[Met_help] [rt.rap.ucar.edu #85541] History for Thoughts on memory usage while automating verification
John Halley Gotway via RT
met_help at ucar.edu
Mon Jul 9 17:08:56 MDT 2018
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
Good afternoon,
My team and I are trying to figure out a way to optimize our real-time
verification processing. We have run into an issue that I believe results
from the size of the masks combined with how many threads I want to run.
This only comes up when running CONUS verification, because of the number
of masks we have to run (all CWAs, Regions, RFCs, CONUS), which comes to
133 masks at most. It appears that MET takes all of the masks and stores
them in memory for the duration of the computation. Is that correct? If so:
Example CONUS NDFD verification run:
1 lead time verification = 133 masks at ~35 MB each, or roughly 4.5 GB RAM
per lead time
The issue is that our machine only has 16 GB RAM available, so I can only
run 3 threads at a time with this configuration. With verification run
time at approximately 7 minutes per lead time, real-time verification
processing time is going to soar if I can't run more threads at once.
We have 16 cores available, and I'm trying to use everything I can to
speed this up in preparation for expanding this to other weather elements
in the future, when we centralize and modernize MDL's verification efforts.
Is there anything we can do to optimize? Share mask RAM across threads?
Compress the masks? We could also run threads with fewer masks each, but
that doesn't seem like it would help much in terms of total time.
Any thoughts would be super helpful!
Thank you,
Dana
--
Dana Strom
Technical Lead, AceInfo Solutions
Meteorological Development Laboratory (NWS)
Phone: 301-427-9451
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: John Halley Gotway
Time: Thu Jun 14 12:02:06 2018
Dana,
Great question. I don't have an obvious solution for you right now, but I
do have two ideas for how we might enhance MET to better facilitate this.
Here's the first one...
Essentially, you're applying 3 different types of masks: CWA, RFC, and
CONUS. For each of these types, each grid point can be assigned a single
mask value:
- CONUS: 0 or 1
- RFC: 0 through 12
- CWA: 0 through 122 (or so)
For each of these 3, we could run gen_vx_mask iteratively to assign an
integer to each grid point to define where it belongs. That'd give us 3
gridded data files to define the masks.
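For the CWA case, a hypothetical sketch of that iteration might look like
the following. The file names, CWA list, and the way the passes are
combined are illustrative only; check the gen_vx_mask usage statement for
your MET version before relying on the exact flags:

    # Sketch only: fold each CWA polygon into a single gridded file,
    # assigning integer i to the i-th CWA. All names are hypothetical.
    grid_file=ndfd_sample.grb2      # any gridded file defining the grid
    out=cwa_all.nc
    i=0
    for cwa in BOX OKX PHI; do      # ...and so on through all ~122 CWAs
      i=$((i+1))
      if [ $i -eq 1 ]; then
        gen_vx_mask $grid_file poly/$cwa.poly $out \
            -type poly -value $i -name CWA_ID
      else
        # Feed the running mask back in so earlier values persist and
        # points inside the next polygon get the next integer.
        gen_vx_mask $out poly/$cwa.poly tmp.nc \
            -type poly -input_field 'name="CWA_ID"; level="(*,*)";' \
            -value $i -name CWA_ID
        mv tmp.nc $out
      fi
    done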
In the MET config files, right now we can apply "data masking", which
means: read a data field and apply a threshold. We could pass in these 3
files and apply many thresholds like "==1", "==2", and so on.
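In a Grid-Stat config file, that could look roughly like this, reusing the
hypothetical file and variable names from the sketch above (the exact
data-masking syntax should be checked against the config file
documentation for your release):

    mask = {
       grid = [ "FULL" ];
       poly = [ "cwa_all.nc {name=\"CWA_ID\"; level=\"(*,*)\";} ==1",
                "cwa_all.nc {name=\"CWA_ID\"; level=\"(*,*)\";} ==2",
                "cwa_all.nc {name=\"CWA_ID\"; level=\"(*,*)\";} ==3" ];
    }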
As of met-7.0, this would still result in 133 gridded masks being defined
and the same memory usage. The suggestion is that we define a more concise
way of doing this. Ideally, the MET tools would read these 3 fields and
define verification masks for each unique value >0 that they find in the
data.
This would definitely require a change to the code but may be possible.
One important detail is figuring out what name to assign to each mask
value to populate the VX_MASK column in the output.
Here's the second one...
We've been a bit lazy in the code, defining the masking regions using the
DataPlane class in MET, which stores double-precision values. Really, all
we need to store is true/false. Since a double takes 8 bytes and a boolean
only 1, switching the data type used for storing masks would consume much,
much less memory in storing the 133 masks... and you'd keep the exact same
setup you're already using.
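For a rough sense of scale, here's a small stand-alone C++ sketch of that
arithmetic; the grid dimensions and mask count below are plausible
placeholders, not numbers pulled from your actual run:

    #include <cstddef>
    #include <iostream>

    int main() {
        // Placeholder numbers: an NDFD-like CONUS grid and 133 masks.
        const std::size_t nx = 2145, ny = 1377, n_masks = 133;
        const double mb = 1024.0 * 1024.0;
        // sizeof(bool) is typically 1 byte vs. 8 for a double.
        std::cout << "doubles: " << nx * ny * n_masks * sizeof(double) / mb
                  << " MB\n";   // roughly 3000 MB
        std::cout << "bools:   " << nx * ny * n_masks * sizeof(bool) / mb
                  << " MB\n";   // roughly 375 MB
        return 0;
    }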
What do you think of these options?
Thanks,
John
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: dana.strom at noaa.gov
Time: Thu Jun 14 13:30:56 2018
John,
Those sound like two great options! It seems, based on your answers, that
the one that would take less time and development on your end is option 2.
I'm fine with keeping the same setup I have, with non-float values for
each grid point, and it seems like it won't be too hard a change on your
end to look for true/false instead of floats. Am I correct? That being
said, whichever solution you choose will suffice!
Let me know which one you run with, and I can adapt to whatever solution
you need me to.
Thank you as always for your quick response and your help!
Dana
--
Dana Strom
Technical Lead, AceInfo Solutions
Meteorological Development Laboratory (NWS)
Phone: 301-427-9451
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: John Halley Gotway
Time: Thu Jun 14 13:34:27 2018
Agreed... I do think option 2 would be a lot easier to implement, require
no changes to the config file logic, and require no updates to the
documentation or online tutorial. It should give us a good bang for the
buck.
I'll touch base with Tara about this.
Thanks,
John
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: John Halley Gotway
Time: Thu Jun 14 14:23:05 2018
Dana,
I've attached the issue I wrote up in Jira (our software issue tracking
system) describing this functionality.
While you wait for this change, would it be feasible to reduce the number
of masking regions to a much smaller set?
For those on this ticket at NCAR, here's the link:
https://sdg.rap.ucar.edu/jira/browse/MET-1011
Thanks,
John
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: dana.strom at noaa.gov
Time: Fri Jun 15 05:13:51 2018
John,
Awesome, thank you! We can certainly reduce the number of jobs we run and
still get things done; it'll just take a little bit longer until the
solution is implemented. We'll continue to generate test data using a
small subset of data to test our UI.
Thanks again for your help! This is going to do wonders for our
optimization and real-time processing.
Dana
--
Dana Strom
Technical Lead, AceInfo Solutions
Meteorological Development Laboratory (NWS)
Phone: 301-427-9451
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: John Halley Gotway
Time: Thu Jul 05 10:27:41 2018
Hi Dana,
Wanted to give you an update on this issue. In the development version of
the MET code, I modified how the masking regions are stored, using
booleans instead of double precision. Then I ran a test of Grid-Stat to
quantify the impact.
For the HRRR domain (1799x1059), I ran Grid-Stat to compare one field to
itself, computing the stats over 123 County Warning Areas that I generated
using gen_vx_mask.
Here's how the runs compared on my machine:
met-7.0 took 01:12 to run and consumed about 2 GB of memory.
met-7.1 took 00:22 to run and consumed about 0.47 GB of memory.
Definitely headed in the right direction! These changes will be included
in the next beta release for met-7.1 for you to try out.
Thanks,
John
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: dana.strom at noaa.gov
Time: Thu Jul 05 10:42:35 2018
Hi John,
Awesome news! That looks like it's going to help out greatly for memory
usage!
Do you attribute the speed increase to this as well? That's a tremendous
uptick in efficiency (especially if it's solely attributable to the I/O of
the smaller masks). I would be stoked if the grid_stat calculation time
decreased by 70%.
When do you think you'll get the next beta out? I'd love to give it a
spin.
Dana
--
Dana Strom
Technical Lead, AceInfo Solutions
Meteorological Development Laboratory (NWS)
Phone: 301-427-9451
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: John Halley Gotway
Time: Thu Jul 05 11:20:20 2018
Dana,
I don't have a definitive explanation for the increase in speed, but an
educated guess is that it's due to memory allocation. It takes a lot less
time to allocate 2 million 1-byte booleans than 2 million 8-byte doubles
(and the HRRR grid contains about 2 million points).
One additional detail to mention: earlier versions of Grid-Stat were
slowed down by some poor memory allocation logic. The code allocated
memory "on demand", meaning that as arrays grew, they automatically
resized themselves. That was convenient in coding but led to a huge hit in
efficiency; we were allocating and deallocating memory over and over and
over again. Cleaning that up and allocating the memory all at once up
front made the code much faster.
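As a generic illustration of that pattern (not the actual MET code; the
names below are made up), growing an array one element at a time
reallocates and copies the whole buffer on every append, while a single
up-front allocation does the work once:

    #include <cstddef>
    #include <cstring>

    // Sketch only, not MET code: the naive "resize on demand" pattern
    // reallocates and deallocates on every append (quadratic work).
    static void append_on_demand(double *&buf, std::size_t &n, double x) {
        double *bigger = new double[n + 1];       // allocate...
        if (n) std::memcpy(bigger, buf, n * sizeof(double));
        bigger[n] = x;
        delete [] buf;                            // ...and free, every call
        buf = bigger;
        ++n;
    }

    int main() {
        const std::size_t total = 100000;  // kept small; the slow pattern
                                           // is painful at full grid size
        double *a = 0;
        std::size_t n = 0;
        for (std::size_t i = 0; i < total; ++i) append_on_demand(a, n, 0.0);
        delete [] a;

        // Fast pattern: one allocation up front, filled in place.
        double *b = new double[total];
        for (std::size_t i = 0; i < total; ++i) b[i] = 0.0;
        delete [] b;
        return 0;
    }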
... But in my testing this morning, I found and fixed one more instance of
this issue. It was suspicious that adding in the "FULL" model domain made
Grid-Stat run almost twice as slow (22 seconds without it and 44 seconds
with it). After fixing that issue, it takes only 22.3 seconds.
For comparison, I ran met-7.0 and met-7.1 with 123 CWAs + the FULL domain:
met-7.0 took 1:31 and met-7.1 took 22.3 seconds.
I need to test a few more changes first but hope to get the next beta
release out tomorrow or Monday.
Does that work for you?
Thanks,
John
> > >
> > >
> > > --
> > > Dana Strom
> > >
> > > Technical Lead, AceInfo Solutions
> > >
> > > Meteorological Development Laboratory (NWS)
> > >
> > > Phone: 301-427-9451
> > >
> > >
> >
> >
>
>
> --
> Dana Strom
>
> Technical Lead, AceInfo Solutions
>
> Meteorological Development Laboratory (NWS)
>
> Phone: 301-427-9451
>
>
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: dana.strom at noaa.gov
Time: Thu Jul 05 11:37:02 2018
John,
Memory allocation is always a HUGE pain, great find!
Tomorrow or Monday for the release sounds more than awesome.
My team is utilizing not only WCOSS (where I beta test) but a docker container on our local system for real-time processing. Would it be too much to ask to get the beta as a container as well? This fix looks like it's going to be the solution we need to turn on our real-time processing for everything (and for me to beat the crap out of our server :) ).
As always, thank you very much for your efforts!
Dana
On Thu, Jul 5, 2018 at 1:20 PM, John Halley Gotway via RT <met_help at ucar.edu> wrote:
> Dana,
> I don't have a definitive explanation for the increase in speed, but an educated guess is that it's due to memory allocation. It takes a lot less time to allocate 2 million 1-byte booleans than 2 million 8-byte doubles (and the HRRR contains about 2 million points).
> One additional detail to mention. Earlier versions of Grid-Stat were slowed down by some poor memory allocation logic. The code allocated memory "on-demand"... meaning that as arrays grew, they automatically resized themselves. That was convenient in coding but led to a huge hit in efficiency. We were allocating and deallocating memory over-and-over-and-over again. Cleaning that up and allocating the memory all at once up-front made the code much faster.
> ... But in my testing this morning, I found/fixed one more instance of this issue. It was suspicious that adding in the "FULL" model domain made Grid-Stat run almost twice as slow (22 seconds without and 44 seconds with it). After fixing that issue, it takes only 22.3 seconds.
> For comparison, I ran met-7.0 and met-7.1 with 123 CWAs + FULL domain: met-7.0 took 1:31 and met-7.1 took 22.3 seconds.
> I need to test a few more changes first but hope to get the next beta release out tomorrow or Monday.
> Does that work for you?
> Thanks,
> John
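The arithmetic behind the savings John describes above is easy to check. The C++ sketch below is standalone, back-of-the-envelope code, not MET's; the grid dimensions and mask count are the figures quoted in this thread, and the variable names are illustrative.

    #include <cstdio>

    int main() {
        const long nx = 1799;        // HRRR grid dimensions cited in the thread
        const long ny = 1059;
        const long n_pts = nx * ny;  // ~1.9 million grid points
        const int  n_masks = 133;    // max masks in the CONUS runs described

        const double GB = 1024.0 * 1024.0 * 1024.0;

        // Masks stored as double precision (met-7.0 behavior per the thread):
        // 8 bytes per grid point per mask.
        double as_doubles = (double)n_pts * sizeof(double) * n_masks / GB;

        // Masks stored as booleans (met-7.1 behavior per the thread):
        // sizeof(bool) is typically 1 byte.
        double as_bools = (double)n_pts * sizeof(bool) * n_masks / GB;

        printf("133 masks as doubles:  %.2f GB\n", as_doubles);  // ~1.89 GB
        printf("133 masks as booleans: %.2f GB\n", as_bools);    // ~0.24 GB

        return 0;
    }

The ~1.89 GB versus ~0.24 GB estimate lines up with the measured 2 GB versus 0.47 GB, with the remainder going to the data fields and other program state. The on-demand resizing John mentions compounds the cost: each time an array outgrows its capacity it re-allocates and copies, which allocating (or reserving) the full size up front avoids.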
--
Dana Strom
Technical Lead, AceInfo Solutions
Meteorological Development Laboratory (NWS)
Phone: 301-427-9451
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: John Halley Gotway
Time: Thu Jul 05 12:11:17 2018
Dana,
Sure, will do. Building the docker image for the beta release should be pretty straightforward.
Thanks,
John
------------------------------------------------
Subject: Thoughts on memory usage while automating verification
From: dana.strom at noaa.gov
Time: Thu Jul 05 12:28:43 2018
John,
Most exciting! Thank you!
Dana
--
Dana Strom
Technical Lead, AceInfo Solutions
Meteorological Development Laboratory (NWS)
Phone: 301-427-9451
------------------------------------------------
More information about the Met_help mailing list