[Met_help] [rt.rap.ucar.edu #77555] History for pb2nc run-time speed question
John Halley Gotway via RT
met_help at ucar.edu
Fri Sep 2 12:10:44 MDT 2016
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
Dear MET help,
I have noticed that pb2nc runs *very* slow for us, typically taking at least 2+ hours to process a single day of prepbufr .nr files from the RDA archive server.
Am I not running pb2nc correctly, or does this program typically run very slow like this?
If it's the latter, has the MET team thought about ways to maximize run-time performance with the pb2nc program?
Thanks much,
Jon
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: pb2nc run-time speed question
From: John Halley Gotway
Time: Mon Aug 15 09:50:33 2016
Hi Jon,
Can you point me to the specific RDA dataset you're processing? Also, can
you please send me your PB2NC configuration file?
I'll try running it here and report back to you on runtime.
Thanks,
John
On Mon, Aug 15, 2016 at 7:30 AM, Case, Jonathan[ENSCO INC] via RT <
met_help at ucar.edu> wrote:
>
> Mon Aug 15 07:30:56 2016: Request 77555 was acted upon.
> Transaction: Ticket created by jonathan.case-1 at nasa.gov
> Queue: met_help
> Subject: pb2nc run-time speed question
> Owner: Nobody
> Requestors: jonathan.case-1 at nasa.gov
> Status: new
> Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=77555 >
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #77555] pb2nc run-time speed question
From: Case, Jonathan[ENSCO INC]
Time: Mon Aug 15 10:40:40 2016
Hi John,
I'm still in Africa, so I'll try to get you a sample tomorrow. We're 9
hours ahead of you!
JonC
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #77555] pb2nc run-time speed question
From: Case, Jonathan[ENSCO INC]
Time: Mon Aug 15 10:41:30 2016
I do know that it's the 337.0 prepbufr 4x daily .nr files.
JonC
------------------------------------------------
Subject: pb2nc run-time speed question
From: John Halley Gotway
Time: Mon Aug 15 15:59:58 2016
Jon,
I pulled data for August 7th, 2016 and processed each of the 4 daily files
using the default PB2NC configuration file.
Each one took between 4.5 and 5 minutes to run on my desktop machine.
Granted, I have a pretty beefy machine with 3.5GHz processors and 16GB of
memory. But it sounds like your runs are taking about 6 times longer than
mine.
I do know that PB2NC is memory intensive, so it's likely that you're using
all the available memory and switching over into swap space, which is
extremely slow. The problem is that a NetCDF3 file can only have 1 unlimited
dimension, when we'd really like to have 2: one for the unique headers
and a second for the observations themselves. So PB2NC stores all that
header info in memory until it processes all the observations and then
writes out the headers at the end.
We just released MET version 5.2 today and will begin working on 6.0. One
major 6.0 upgrade will be switching to using NetCDF4, which does allow the
use of multiple unlimited dimensions. Hopefully that will solve this
memory consumption issue and make PB2NC run faster.
I'll make a note about this to make sure we switch to 2 unlimited
dimensions for NetCDF4.
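
The buffering pattern described above can be illustrated with a toy sketch.
This is not PB2NC's actual code: the record layout (station id, lat, lon,
value) and the function name convert are assumptions made for illustration.
It only shows why header storage grows with the number of unique headers
when the headers cannot be written until every observation has been read.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout, not MET's data model:
# (station_id, lat, lon, value)
Obs = Tuple[str, float, float, float]
Header = Tuple[str, float, float]


def convert(obs_stream: Iterable[Obs]) -> Tuple[int, List[Header]]:
    """Count observation rows as they stream by, buffering unique headers."""
    header_index: Dict[Header, int] = {}
    obs_rows = 0
    for sid, lat, lon, value in obs_stream:
        key = (sid, lat, lon)
        # Assign the next header index only if this header is new.
        hdr_id = header_index.setdefault(key, len(header_index))
        # An observation row (hdr_id, value) could be appended to the one
        # unlimited dimension right away, so it need not stay in memory.
        obs_rows += 1
    # The header count is only known here, so every header has been held
    # in memory until the end of the input.
    return obs_rows, list(header_index)


if __name__ == "__main__":
    sample = [
        ("KDEN", 39.8, -104.7, 285.1),
        ("KDEN", 39.8, -104.7, 286.0),
        ("KBOU", 40.0, -105.3, 284.2),
    ]
    rows, headers = convert(sample)
    print(rows, len(headers))
```

With a second unlimited dimension, each new header could in principle be
appended as soon as it is first seen instead of being held until the end.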
Thanks,
John
------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #77555] pb2nc run-time speed question
From: Case, Jonathan[ENSCO INC]
Time: Tue Aug 16 01:06:16 2016
Hi John H-G,
I noticed that on our beefed-up cluster head node, pb2nc took ~21 min
to run 12 hours' worth of data.
I have it set up to output individual hourly times into separate .nc
files (for both sfc and upa), because I've noticed that pointstat runs
much faster if we subset the data into hourly bins, especially for
larger domains with numerous obs.
Anyhow, for the end-users here in Africa, they've been running on
instances or images that they make by specifying system resource
requests off an actual machine or workstation. I'm not savvy as to
how they do this, but I suspect that if you run too many programs on
these instances, then pb2nc will swap memory and run much slower.
For example, yesterday, we were running both the WRF model and pb2nc
at the same time on a 4-proc instance. I imagine that running WRF
opposite pb2nc led to some serious memory swapping!
Anyhow, for the record, I uploaded some sample .nr files, my
PB2NCConfig file, and a sample sfc output file.
The only thing I see that we're doing differently, which could lead to
longer processing time, is that our time window is +/- 3 hours (10800
s) whereas yours is +/- 1.5 hours (5400 s).
I'll do some tests with +/-5400 instead of +/-10800. Since we're
outputting sfc/upa files into hourly files, we shouldn't need such a
large window I figure.
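
As a rough illustration of the window comparison above, the sketch below
counts how many observation times fall inside a +/-10800 s window versus a
+/-5400 s window around one valid time. The function names and the evenly
spaced sample times are assumptions for illustration only; whether the wider
window actually explains the longer run time is untested here.

```python
from typing import Iterable


def in_window(obs_time_s: int, valid_time_s: int, beg_s: int, end_s: int) -> bool:
    """True if obs_time_s lies in [valid_time_s + beg_s, valid_time_s + end_s]."""
    return valid_time_s + beg_s <= obs_time_s <= valid_time_s + end_s


def count_retained(obs_times_s: Iterable[int], valid_time_s: int, half_width_s: int) -> int:
    """Count observations inside a symmetric window around the valid time."""
    return sum(
        in_window(t, valid_time_s, -half_width_s, half_width_s)
        for t in obs_times_s
    )


if __name__ == "__main__":
    valid = 12 * 3600  # 12 UTC, in seconds since the start of the day
    # Made-up sample: one report every 30 minutes from -3 h to +3 h.
    times = [valid + offset for offset in range(-10800, 10801, 1800)]
    print(count_retained(times, valid, 10800))  # wide window
    print(count_retained(times, valid, 5400))   # narrow window
```

For evenly spaced reports, halving the window roughly halves the number of
observations retained, and with it the header info held in memory.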
Thanks for the insight,
JonC
------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #77555] pb2nc run-time speed question
From: Case, Jonathan[ENSCO INC]
Time: Tue Aug 16 01:07:40 2016
(sorry forgot to include link to the ftp folder:
ftp://geo.msfc.nasa.gov/SPoRT/modeling/wrf/met/forJHG/)
-JonC
------------------------------------------------
Subject: pb2nc run-time speed question
From: John Halley Gotway
Time: Tue Aug 16 09:54:17 2016
Jon,
I agree, running PB2NC and WRF on a 4-processor machine will likely be
problematic.
I see that you're already using the mask.poly option to define a smaller
retention area. Can you check to see how many lat/lon points are in that
file:
/raid1/sport/people/casejl/MET/configFiles/USER.poly
Setting mask.poly *should* be speeding things up since PB2NC would have
less header info to store in memory. However, if USER.poly contains many,
many points, checking whether each observation lat/lon is inside that
polyline could be slow. From that perspective, mask.grid would be
faster... it just converts the observation lat/lon to grid x/y and checks
to see if that x/y falls on the grid or not. But if USER.poly only has 4
points in it, then that'd be very similar to using mask.grid.
So if USER.poly has many points, try just using 4 instead or switch to
using mask.grid. But if USER.poly only has a handful of points, then this
likely isn't an issue.
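
The cost difference between the two masking checks can be sketched as
follows. This is not MET's implementation: the ray-casting test, the
function names, and the regular lat/lon grid (real grids generally involve
a map projection) are assumptions chosen to show that the polyline check
scales with the number of vertices while the grid check does not.

```python
from typing import List, Tuple


def point_in_polygon(lat: float, lon: float, poly: List[Tuple[float, float]]) -> bool:
    """Ray-casting test; poly is a list of (lat, lon) vertices.

    Every edge is visited once, so the cost per observation grows
    linearly with the number of polyline points.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        lat1, lon1 = poly[i]
        lat2, lon2 = poly[(i + 1) % n]
        # Does this edge straddle the observation's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside


def point_on_grid(lat: float, lon: float, lat0: float, lon0: float,
                  dlat: float, dlon: float, ny: int, nx: int) -> bool:
    """Constant-time check on an assumed regular lat/lon grid."""
    x = (lon - lon0) / dlon
    y = (lat - lat0) / dlat
    return 0 <= x <= nx - 1 and 0 <= y <= ny - 1


if __name__ == "__main__":
    box = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
    print(point_in_polygon(5.0, 5.0, box))
    print(point_on_grid(5.0, 5.0, 0.0, 0.0, 1.0, 1.0, 11, 11))
```

A 4-point polyline costs about the same as the grid check, which matches
the suggestion above to reduce USER.poly to a handful of points.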
I'll be interested to see how using multiple unlimited dimensions in
NetCDF4 affects all of this.
John
------------------------------------------------