[Met_help] [rt.rap.ucar.edu #64485] History for Stat Analysis efficiency and grouping issues

John Halley Gotway via RT met_help at ucar.edu
Mon Dec 9 10:41:59 MST 2013


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Good morning,

I am attempting to run Stat Analysis to aggregate hourly Point Stat files into CNT format output files.  I am looking to produce aggregate-stat data in CNT format for four atmospheric variables (TMP, DPT, RH, and SPFH) at four height levels (Z2, P850, P700 and P500) across multiple masks, and I would like to get one line of CNT data for each variable, height level and masking region.  I am trying to get this data on a temporal scale of 24 hours, and I would eventually like to run this for monthly and annual time scales as well.  However, Stat Analysis runs extremely slowly or simply appears to freeze even when trying to produce a daily CNT file, and I cannot get more than one line of data in the resulting "out" file, which leads me to believe that Stat Analysis is grouping all of the above conditions into one line of output, instead of one line for each variable/height level/mask combination as I would like.  I've experimented with command line instructions based on the online tutorial, using SL1L2 data instead of MPR data, but they do not appear to be any quicker, nor are they solving the grouping issue I've described above.

I would greatly appreciate any help on getting the data that I'm looking for, if I am preparing the config scripts correctly, and how long it should take Stat Analysis to produce this data.  I have ftp'd 6 hours of .stat files from Point Stat, as well as the two scripts that I am using to configure and run Stat Analysis for your review.

Thank you for your time,

Elliot Tardif

Elliot Tardif, Meteorologist II
NC DENR, Division of Air Quality
Planning Section, Attainment Planning Branch
1641 Mail Service Center
Raleigh, NC 27699-1641
Phone/Fax:  919-707-8483
Email:  Elliot.Tardif at ncdenr.gov
Web  :  http://www.ncair.org

Email correspondence to and from this address is subject to the North Carolina Public Records Law and may be disclosed to third parties unless the content is exempt by statute or other regulation.



----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: Stat Analysis efficiency and grouping issues
From: John Halley Gotway
Time: Wed Dec 04 14:52:03 2013

Elliot,

Thanks for sending the sample data.  I ran STAT-Analysis on that data
using the config file you sent, and indeed, it produces a single line
of output.  Your config file contains the following setting:
    vx_mask = ["SEMAP", "AL", "FL", "GA", "KY", "MS", "NC", "SC", "TN", "VA", "WV"];

Here's a selection from the file "METv4.1/data/config/README":
   //    Job command FILTERING options to further refine the STAT data:
   //       Each optional argument may be used in the job specification multiple
   //       times unless otherwise indicated. When multiple optional arguments of
   //       the same type are indicated, the analysis will be performed over their
   //       union

So when you list multiple vx_mask values in a single job, that job runs
once over their union, which is why you get a single line of output.
Instead of defining a single job, you could define multiple jobs, one
for each vx_mask of interest.  For example...
   jobs = [
      "-job aggregate_stat -out_line_type CNT -vx_mask SEMAP",
      "-job aggregate_stat -out_line_type CNT -vx_mask AL",
      "-job aggregate_stat -out_line_type CNT -vx_mask FL",
   ... and so on
   ];
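
If you'd rather not type out one job per mask by hand, the list can be
generated.  The following is only a minimal Python sketch, not part of
MET; the helper names build_jobs and format_jobs_entry are made up for
illustration, and the mask names are the ones from your config file.

```python
# Hypothetical helpers (not part of MET): build one aggregate_stat job
# string per masking region and format them as a "jobs" config entry.
masks = ["SEMAP", "AL", "FL", "GA", "KY", "MS", "NC", "SC", "TN", "VA", "WV"]


def build_jobs(mask_names):
    """Return one STAT-Analysis job string per mask name."""
    return [
        f"-job aggregate_stat -out_line_type CNT -vx_mask {name}"
        for name in mask_names
    ]


def format_jobs_entry(jobs):
    """Format the job strings as a 'jobs = [ ... ];' config entry."""
    body = ",\n".join(f'   "{job}"' for job in jobs)
    return "jobs = [\n" + body + "\n];"


jobs = build_jobs(masks)
print(format_jobs_entry(jobs))
```

The printed text can be pasted into the STAT-Analysis config file in
place of the existing "jobs" entry.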

However, we've added a new option in METv4.1 to facilitate this sort
of thing.  You can specify case information for each job using the
"-by" option.  STAT-Analysis will look at the contents of the
column(s) you've specified and will run that same job once for each
unique case value it finds.  It's probably easier to just look at an
example.  Edit the config file you sent to me like this:

   jobs = [
      "-job aggregate_stat -out_line_type CNT -by vx_mask"
   ];

I've attached the resulting output file (stat_analysis.out).  You have
one CNT output line for each unique entry in the vx_mask column.
Notice that I've kept the "vx_mask = [ ... ];" setting at the top.
This restricts the lines that will be considered by STAT-Analysis, but
if you want to compute a CNT line for all of the vx_mask values, you
could just get rid of that setting at the top.

Next, suppose you want to look at continuous stats by masking region,
forecast variable, and forecast level.  You'd just add those to the
"case" information like this:
   -by vx_mask -by fcst_var -by fcst_lev
Then STAT-Analysis will run the same job for each unique combination
of "vx_mask:fcst_var:fcst_lev".  Make sense?
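
Conceptually, the case logic works like a group-by.  The Python sketch
below is only an illustration of that idea and is not how MET is
implemented: the records and their values are invented, and the
function name aggregate_by is hypothetical.  It groups SL1L2-style
partial sums by the case columns, combines them weighted by TOTAL, and
derives ME, MSE and RMSE for each case.

```python
import math
from collections import defaultdict

# Invented SL1L2-style records; every value here is for illustration only.
rows = [
    {"vx_mask": "AL", "fcst_var": "TMP", "fcst_lev": "Z2", "total": 10,
     "fbar": 2.0, "obar": 1.0, "fobar": 2.0, "ffbar": 4.0, "oobar": 1.0},
    {"vx_mask": "AL", "fcst_var": "TMP", "fcst_lev": "Z2", "total": 30,
     "fbar": 4.0, "obar": 1.0, "fobar": 4.0, "ffbar": 16.0, "oobar": 1.0},
    {"vx_mask": "FL", "fcst_var": "TMP", "fcst_lev": "Z2", "total": 5,
     "fbar": 1.0, "obar": 1.0, "fobar": 1.0, "ffbar": 1.0, "oobar": 1.0},
    {"vx_mask": "AL", "fcst_var": "DPT", "fcst_lev": "Z2", "total": 20,
     "fbar": 3.0, "obar": 3.0, "fobar": 9.0, "ffbar": 9.0, "oobar": 9.0},
]

SUMS = ("fbar", "obar", "fobar", "ffbar", "oobar")


def aggregate_by(records, by):
    """Aggregate partial sums once per unique combination of the 'by' columns."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[col] for col in by)].append(rec)

    results = {}
    for case, members in groups.items():
        n = sum(rec["total"] for rec in members)
        # Partial sums are means, so combine them weighted by the pair count.
        mean = {k: sum(rec["total"] * rec[k] for rec in members) / n for k in SUMS}
        me = mean["fbar"] - mean["obar"]
        mse = mean["ffbar"] - 2.0 * mean["fobar"] + mean["oobar"]
        results[case] = {"total": n, "me": me, "mse": mse,
                         "rmse": math.sqrt(max(mse, 0.0))}
    return results


for case, stats in aggregate_by(rows, ["vx_mask", "fcst_var", "fcst_lev"]).items():
    print(":".join(case), stats)
```

Each distinct "vx_mask:fcst_var:fcst_lev" key yields one result, which
mirrors getting one CNT line per case.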

Hopefully that'll help get you going.  As for how it's running on your
machine, it really all depends on the amount of data you're passing to
STAT-Analysis and the size of your machine.  Right now, it
sounds like you want to run STAT-Analysis to compute daily stats.
Most of STAT-Analysis's time is spent just reading and parsing the
.stat files you pass to it.  In your script, you're telling
STAT-Analysis to look in this directory:
    -lookin /opt/storage/high-speed/EMT2/METv4.1/METv4.1/out/point_stat/stats_files

I'd suggest limiting the number of stat files you pass to it as much
as possible.  For example, if you want to compute daily stats, only
pass that day's worth of stat files to it.  You might consider
organizing your stat files by date, like this:
    /opt/storage/high-speed/EMT2/METv4.1/METv4.1/out/point_stat/stats_files/YYYYMMDD

When you call STAT-Analysis, you could just pass it that single date
directory.  The less data you pass it to parse, the quicker it will
run.
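
One way to do that reorganization is sketched below in Python.  Treat
it as a rough example rather than a tested tool: it assumes your
Point-Stat file names end in "_YYYYMMDD_HHMMSSV.stat" (the valid
time), and the function names sort_stat_files_by_date and
stat_analysis_command are made up for illustration.  The second
function only builds the argument list for a single day's run; it does
not execute anything.

```python
import re
import shutil
from pathlib import Path

# Assumption: file names end in "_YYYYMMDD_HHMMSSV.stat" (the valid time).
VALID_TIME = re.compile(r"_(\d{8})_\d{6}V\.stat$")


def sort_stat_files_by_date(stats_dir):
    """Move each matching .stat file in stats_dir into a YYYYMMDD subdirectory.

    Returns a dict mapping each date string to the file names moved there.
    """
    stats_dir = Path(stats_dir)
    moved = {}
    # sorted() materializes the listing before any file is moved.
    for path in sorted(stats_dir.glob("*.stat")):
        match = VALID_TIME.search(path.name)
        if match is None:
            continue  # leave files with unexpected names where they are
        day = match.group(1)
        day_dir = stats_dir / day
        day_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(day_dir / path.name))
        moved.setdefault(day, []).append(path.name)
    return moved


def stat_analysis_command(day_dir, config_file, out_file):
    """Build (but do not run) a stat_analysis call for one date directory."""
    return ["stat_analysis", "-lookin", str(day_dir),
            "-config", str(config_file), "-out", str(out_file)]
```

The returned argument list could be handed to subprocess.run once per
date directory, so each run only parses that day's files.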

Thanks,
John Halley Gotway
met_help at ucar.edu


On 12/04/2013 09:17 AM, Tardif, Elliot M via RT wrote:
>
> Wed Dec 04 09:17:49 2013: Request 64485 was acted upon.
> Transaction: Ticket created by elliot.tardif at ncdenr.gov
>         Queue: met_help
>       Subject: Stat Analysis efficiency and grouping issues
>         Owner: Nobody
>    Requestors: elliot.tardif at ncdenr.gov
>        Status: new
>   Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=64485 >
>

------------------------------------------------
Subject: Stat Analysis efficiency and grouping issues
From: John Halley Gotway
Time: Wed Dec 04 14:52:03 2013

JOB_LIST: -job aggregate_stat -fcst_valid_beg 20110101_000000 -fcst_valid_end 20110101_060000 -obs_valid_beg 20110101_000000 -obs_valid_end 20110101_060000 -fcst_var TMP -obs_var TMP -fcst_lev Z2 -obs_lev Z2 -vx_mask SEMAP -vx_mask AL -vx_mask FL -vx_mask GA -vx_mask KY -vx_mask MS -vx_mask NC -vx_mask SC -vx_mask TN -vx_mask VA -vx_mask WV -line_type SL1L2 -by VX_MASK -out_line_type CNT -out_alpha 0.05 -rank_corr_flag 1
COL_NAME: VX_MASK TOTAL FBAR FBAR_NCL FBAR_NCU FBAR_BCL FBAR_BCU FSTDEV FSTDEV_NCL FSTDEV_NCU FSTDEV_BCL FSTDEV_BCU OBAR OBAR_NCL OBAR_NCU OBAR_BCL OBAR_BCU OSTDEV OSTDEV_NCL OSTDEV_NCU OSTDEV_BCL OSTDEV_BCU PR_CORR PR_CORR_NCL PR_CORR_NCU PR_CORR_BCL PR_CORR_BCU SP_CORR KT_CORR RANKS FRANK_TIES ORANK_TIES ME ME_NCL ME_NCU ME_BCL ME_BCU ESTDEV ESTDEV_NCL ESTDEV_NCU ESTDEV_BCL ESTDEV_BCU MBIAS MBIAS_BCL MBIAS_BCU MAE MAE_BCL MAE_BCU MSE MSE_BCL MSE_BCU BCMSE BCMSE_BCL BCMSE_BCU RMSE RMSE_BCL RMSE_BCU E10 E10_BCL E10_BCU E25 E25_BCL E25_BCU E50 E50_BCL E50_BCU E75 E75_BCL E75_BCU E90 E90_BCL E90_BCU
     CNT: AL 273 288.18901 287.98014 288.39789 NA NA 1.76085 1.62449 1.92239 NA NA 290.61996 290.41890 290.82103 NA NA 1.69499 1.56373 1.85050 NA NA 0.44771 0.34746 0.53784 NA NA NA NA 0 0 0 -2.43095 -2.64647 -2.21543 NA NA 1.81689 1.67619 1.98358 NA NA 0.99164 NA NA NA NA NA 9.19851 NA NA 3.28899 NA NA 3.03290 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: FL 713 289.78342 289.55623 290.01061 NA NA 3.09519 2.94244 3.26478 NA NA 290.86164 290.66483 291.05845 NA NA 2.68132 2.54900 2.82823 NA NA 0.77477 0.74366 0.80254 NA NA NA NA 0 0 0 -1.07822 -1.22336 -0.93309 NA NA 1.97728 1.87971 2.08563 NA NA 0.99629 NA NA NA NA NA 5.06673 NA NA 3.90417 NA NA 2.25094 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: GA 319 284.68134 284.52080 284.84187 NA NA 1.46287 1.35748 1.58615 NA NA 287.56944 287.32554 287.81334 NA NA 2.22259 2.06246 2.40990 NA NA 0.48018 0.39098 0.56044 NA NA NA NA 0 0 0 -2.88810 -3.10640 -2.66980 NA NA 1.98933 1.84601 2.15698 NA NA 0.98996 NA NA NA NA NA 12.28617 NA NA 3.94504 NA NA 3.50516 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: KY 159 285.02011 284.69237 285.34785 NA NA 2.10853 1.89946 2.36977 NA NA 288.56635 288.27634 288.85636 NA NA 1.86579 1.68078 2.09695 NA NA 0.54480 0.42521 0.64569 NA NA NA NA 0 0 0 -3.54624 -3.84282 -3.24967 NA NA 1.90802 1.71882 2.14441 NA NA 0.98771 NA NA NA NA NA 16.19349 NA NA 3.61764 NA NA 4.02411 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: MS 189 289.93735 289.82411 290.05059 NA NA 0.79429 0.72148 0.88360 NA NA 291.19973 290.93044 291.46903 NA NA 1.88889 1.71572 2.10126 NA NA 0.31922 0.18492 0.44182 NA NA NA NA 0 0 0 -1.26239 -1.51904 -1.00573 NA NA 1.80026 1.63522 2.00266 NA NA 0.99566 NA NA NA NA NA 4.81741 NA NA 3.22378 NA NA 2.19486 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: NC 494 278.32981 278.07095 278.58867 NA NA 2.93549 2.76315 3.13095 NA NA 278.99554 278.78731 279.20378 NA NA 2.36135 2.22272 2.51858 NA NA 0.43233 0.35776 0.50143 NA NA NA NA 0 0 0 -0.66573 -0.91824 -0.41322 NA NA 2.86347 2.69535 3.05413 NA NA 0.99761 NA NA NA NA NA 8.62606 NA NA 8.18286 NA NA 2.93701 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: SC 278 281.78971 281.39632 282.18310 NA NA 3.34654 3.08955 3.65054 NA NA 283.13093 282.78956 283.47231 NA NA 2.90406 2.68105 3.16786 NA NA 0.51101 0.41853 0.59300 NA NA NA NA 0 0 0 -1.34122 -1.70734 -0.97511 NA NA 3.11453 2.87536 3.39745 NA NA 0.99526 NA NA NA NA NA 11.46430 NA NA 9.66542 NA NA 3.38590 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: SEMAP 2329 285.37747 285.16334 285.59161 NA NA 5.27256 5.12538 5.42851 NA NA 286.98710 286.77117 287.20302 NA NA 5.31662 5.16821 5.47387 NA NA 0.89376 0.88528 0.90164 NA NA NA NA 0 0 0 -1.60962 -1.70876 -1.51049 NA NA 2.44099 2.37285 2.51319 NA NA 0.99439 NA NA NA NA NA 8.54677 NA NA 5.95588 NA NA 2.92349 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: TN 141 285.76966 285.24729 286.29204 NA NA 3.16479 2.83351 3.58453 NA NA 288.01525 287.51277 288.51773 NA NA 3.04424 2.72558 3.44800 NA NA 0.80640 0.73970 0.85742 NA NA NA NA 0 0 0 -2.24559 -2.56500 -1.92617 NA NA 1.93517 1.73260 2.19183 NA NA 0.99220 NA NA NA NA NA 8.76097 NA NA 3.71832 NA NA 2.95989 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: VA 461 276.55925 276.42432 276.69419 NA NA 1.47820 1.38854 1.58033 NA NA 277.74566 277.47994 278.01137 NA NA 2.91085 2.73430 3.11196 NA NA 0.39013 0.30985 0.46490 NA NA NA NA 0 0 0 -1.18641 -1.43306 -0.93976 NA NA 2.70200 2.53811 2.88868 NA NA 0.99573 NA NA NA NA NA 8.69250 NA NA 7.28494 NA NA 2.94830 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     CNT: WV 120 278.39675 277.99275 278.80074 NA NA 2.25795 2.00389 2.58641 NA NA 280.26500 279.36629 281.16371 NA NA 5.02297 4.45779 5.75365 NA NA 0.73199 0.63622 0.80554 NA NA NA NA 0 0 0 -1.86826 -2.53110 -1.20541 NA NA 3.70469 3.28785 4.24361 NA NA 0.99333 NA NA NA NA NA 17.10074 NA NA 13.61036 NA NA 4.13530 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
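
As a quick sanity check on the output above, several CNT columns can
be recomputed from the others.  The short Python sketch below does
this for the AL line using the standard definitions ME = FBAR - OBAR,
MBIAS = FBAR / OBAR, BCMSE = MSE - ME^2 and RMSE = sqrt(MSE).  The
numbers are copied from that line; the script itself is only an
illustration and is not part of MET.

```python
import math

# Values copied from the AL line of the CNT output above.
fbar, obar = 288.18901, 290.61996
me, mbias = -2.43095, 0.99164
mse, bcmse, rmse = 9.19851, 3.28899, 3.03290

# Each entry pairs a value recomputed here with the value reported in the file.
checks = {
    "ME = FBAR - OBAR": (fbar - obar, me),
    "MBIAS = FBAR / OBAR": (fbar / obar, mbias),
    "BCMSE = MSE - ME^2": (mse - me ** 2, bcmse),
    "RMSE = sqrt(MSE)": (math.sqrt(mse), rmse),
}

for label, (computed, reported) in checks.items():
    print(f"{label}: computed {computed:.5f}, reported {reported:.5f}")
```

The recomputed and reported values should agree to within the
rounding of the five decimal places shown in the file.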


------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #64485] Stat Analysis efficiency and grouping issues
From: Tardif, Elliot M
Time: Fri Dec 06 11:10:55 2013

Good afternoon John,

I'm happy to report that, after reading your reply and doing a little
exploration of my own, I believe I've solved the issues I described
earlier.

Thanks again for your help,

Elliot Tardif, Meteorologist II
NC DENR, Division of Air Quality
Planning Section, Attainment Planning Branch
1641 Mail Service Center
Raleigh, NC 27699-1641
Phone/Fax:  919-707-8483
Email:  Elliot.Tardif at ncdenr.gov
Web  :  http://www.ncair.org

Email correspondence to and from this address is subject to the North
Carolina Public Records Law and may be disclosed to third parties
unless the content is exempt by statute or other regulation.






------------------------------------------------

