[Met_help] [rt.rap.ucar.edu #64485] History for Stat Analysis efficiency and grouping issues
John Halley Gotway via RT
met_help at ucar.edu
Mon Dec 9 10:41:59 MST 2013
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
Good morning,
I am attempting to run Stat Analysis to aggregate hourly Point Stat files into CNT format output files. I am looking to produce aggregate-stat data in CNT format for four atmospheric variables (TMP, DPT, RH, and SPFH) at four height levels (Z2, P850, P700 and P500) across multiple masks, and I would like to get one line of CNT data for each variable, height level and masking region. I am trying to get this data on a temporal scale of 24 hours, and I would eventually like to run this for monthly and annual time scales as well. However, Stat Analysis runs extremely slowly or simply appears to freeze even when trying to produce a daily CNT file, and I cannot get more than one line of data in the resulting "out" file, which leads me to believe that Stat Analysis is grouping all of the above conditions into one line of output, instead of one line for each variable/height level/mask combination as I would like. I've experimented with command line instructions based on the online tutorial, using SL1L2 data instead of MPR data, but they do not appear to be any quicker, nor are they solving the grouping issue I've described above.
I would greatly appreciate any help on getting the data that I'm looking for, on whether I am preparing the config scripts correctly, and on how long it should take Stat Analysis to produce this data. I have ftp'd 6 hours of .stat files from Point Stat, as well as the two scripts that I am using to configure and run Stat Analysis for your review.
Thank you for your time,
Elliot Tardif
Elliot Tardif, Meteorologist II
NC DENR, Division of Air Quality
Planning Section, Attainment Planning Branch
1641 Mail Service Center
Raleigh, NC 27699-1641
Phone/Fax: 919-707-8483
Email: Elliot.Tardif at ncdenr.gov
Web : http://www.ncair.org
Email correspondence to and from this address is subject to the North Carolina Public Records Law and may be disclosed to third parties unless the content is exempt by statute or other regulation.
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: Stat Analysis efficiency and grouping issues
From: John Halley Gotway
Time: Wed Dec 04 14:52:03 2013
Elliot,
Thanks for sending the sample data. I ran STAT-Analysis on that data
using the config file you sent, and indeed, it produces a single line
of output. Your config file contains the following setting:
vx_mask = ["SEMAP", "AL", "FL", "GA", "KY", "MS", "NC", "SC",
"TN", "VA", "WV"];
Here's a selection from the file "METv4.1/data/config/README":
// Job command FILTERING options to further refine the STAT data:
// Each optional argument may be used in the job specification multiple
// times unless otherwise indicated. When multiple optional arguments of
// the same type are indicated, the analysis will be performed over their
// union
So when you list multiple values for the same option, as you did for
vx_mask, the job runs once over their union and writes a single line of
output. Instead of defining a single job, you could define multiple
jobs - one for each vx_mask of interest. For example...
jobs = [
"-job aggregate_stat -out_line_type CNT -vx_mask SEMAP",
"-job aggregate_stat -out_line_type CNT -vx_mask AL",
"-job aggregate_stat -out_line_type CNT -vx_mask FL",
... and so on
];
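As a rough illustration of the one-job-per-mask approach, the job strings could be generated rather than typed by hand. This is only a sketch: the helper names build_jobs and format_jobs_entry are hypothetical and not part of MET, and the mask list is simply the one from the vx_mask setting quoted above.

```python
# Hypothetical helpers (not part of MET): build one aggregate_stat job
# string per masking region and print a "jobs" entry to paste into the
# STAT-Analysis config file.
MASKS = ["SEMAP", "AL", "FL", "GA", "KY", "MS", "NC", "SC", "TN", "VA", "WV"]


def build_jobs(masks, out_line_type="CNT"):
    """Return one job string per mask, mirroring the hand-written example."""
    return [
        f"-job aggregate_stat -out_line_type {out_line_type} -vx_mask {mask}"
        for mask in masks
    ]


def format_jobs_entry(jobs):
    """Format the job strings as a config-file style jobs = [ ... ]; entry."""
    body = ",\n".join(f'   "{job}"' for job in jobs)
    return f"jobs = [\n{body}\n];"


if __name__ == "__main__":
    print(format_jobs_entry(build_jobs(MASKS)))
```

The printed text would replace the existing jobs entry in the config file. The "-by" option described next makes this kind of generation unnecessary in METv4.1.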
However, we've added a new option in METv4.1 to facilitate this sort
of thing. You can specify case information for each job using the
"-by" option. STAT-Analysis will look at the contents of the
column(s) you've specified and will run that same job once for each
unique case value it finds. It's probably easier to just look at an
example. Edit the config file you sent to me like this:
jobs = [
"-job aggregate_stat -out_line_type CNT -by vx_mask"
];
I've attached the resulting output file (stat_analysis.out). You have
one CNT output line for each unique entry in the vx_mask column.
Notice that I've kept the "vx_mask = [ ... ];" setting at the
top. This restricts the lines that will be considered by
STAT-Analysis, but if you want to compute a CNT line for all of the
vx_mask values, you could just get rid of that setting at the top.
Next, suppose you want to look at continuous stats by masking region,
forecast variable, and forecast level. You'd just add those to the
"case" information like this:
-by vx_mask -by fcst_var -by fcst_lev
Then STAT-Analysis will run the same job for each unique combination
of "vx_mask:fcst_var:fcst_lev". Make sense?
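To make the case idea concrete, here is a minimal sketch in plain Python of grouping rows by the unique combination of the "-by" columns. It only illustrates the concept and is not how STAT-Analysis is implemented; the sample rows and the group_by helper are made up for the example.

```python
from collections import defaultdict


def group_by(rows, columns):
    """Group rows by the colon-joined values of the given columns.

    With no columns, every row lands in a single group, which mirrors the
    single line of output produced when no case information is given.
    """
    groups = defaultdict(list)
    for row in rows:
        key = ":".join(row[col] for col in columns)
        groups[key].append(row)
    return dict(groups)


# Made-up stand-ins for the header columns of a few .stat lines.
rows = [
    {"vx_mask": "AL", "fcst_var": "TMP", "fcst_lev": "Z2"},
    {"vx_mask": "AL", "fcst_var": "TMP", "fcst_lev": "Z2"},
    {"vx_mask": "AL", "fcst_var": "DPT", "fcst_lev": "P850"},
    {"vx_mask": "FL", "fcst_var": "TMP", "fcst_lev": "Z2"},
]

cases = group_by(rows, ["vx_mask", "fcst_var", "fcst_lev"])
for key in sorted(cases):
    # Each key corresponds to one aggregated output line.
    print(key, len(cases[key]))
```

Here the four input rows fall into three cases, so the same job would produce three output lines instead of one.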
Hopefully that'll help get you going. As for how it's running on your
machine, it really all depends on the amount of data you're passing to
STAT-Analysis and the size of your machine. Right now, it
sounds like you want to run STAT-Analysis to compute daily stats.
Most of STAT-Analysis's time is spent just reading and parsing the
.stat files you pass to it. In your script, you're telling
STAT-Analysis to look in this directory:
-lookin /opt/storage/high-speed/EMT2/METv4.1/METv4.1/out/point_stat/stats_files
I'd suggest limiting the number of stat files you pass to it as much
as possible. For example, if you want to compute daily stats, only
pass that day's worth of stat files to it. You might consider
organizing your stat files by date, like this:
/opt/storage/high-speed/EMT2/METv4.1/METv4.1/out/point_stat/stats_files/YYYYMMDD
When you call STAT-Analysis, you could just pass it that single date
directory. The less data you pass it to parse, the quicker it will
run.
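One possible way to do that reorganization is sketched below. It assumes the .stat files follow the usual Point-Stat naming pattern ending in _YYYYMMDD_HHMMSSV.stat (the valid time); that pattern, and the helper names valid_date and organize_by_date, are assumptions for illustration, so check them against your own file names before moving anything.

```python
import re
import shutil
from pathlib import Path

# Assumed file name ending: ..._YYYYMMDD_HHMMSSV.stat (valid date and time).
VALID_RE = re.compile(r"_(\d{8})_\d{6}V\.stat$")


def valid_date(filename):
    """Return the YYYYMMDD valid date from a file name, or None if absent."""
    match = VALID_RE.search(filename)
    return match.group(1) if match else None


def organize_by_date(stats_dir):
    """Move each matching .stat file into a YYYYMMDD subdirectory."""
    stats_dir = Path(stats_dir)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(stats_dir.glob("*.stat")):
        date = valid_date(path.name)
        if date is None:
            continue  # leave files with unexpected names where they are
        dest_dir = stats_dir / date
        dest_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(dest_dir / path.name))
        moved.append(dest_dir / path.name)
    return moved
```

After running organize_by_date on the stats_files directory, a daily job would point -lookin at a single date subdirectory instead of the whole tree.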
Thanks,
John Halley Gotway
met_help at ucar.edu
On 12/04/2013 09:17 AM, Tardif, Elliot M via RT wrote:
>
> Wed Dec 04 09:17:49 2013: Request 64485 was acted upon.
> Transaction: Ticket created by elliot.tardif at ncdenr.gov
> Queue: met_help
> Subject: Stat Analysis efficiency and grouping issues
> Owner: Nobody
> Requestors: elliot.tardif at ncdenr.gov
> Status: new
> Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=64485 >
------------------------------------------------
Subject: Stat Analysis efficiency and grouping issues
From: John Halley Gotway
Time: Wed Dec 04 14:52:03 2013
JOB_LIST: -job aggregate_stat -fcst_valid_beg 20110101_000000 -fcst_valid_end 20110101_060000 -obs_valid_beg 20110101_000000 -obs_valid_end 20110101_060000 -fcst_var TMP -obs_var TMP -fcst_lev Z2 -obs_lev Z2 -vx_mask SEMAP -vx_mask AL -vx_mask FL -vx_mask GA -vx_mask KY -vx_mask MS -vx_mask NC -vx_mask SC -vx_mask TN -vx_mask VA -vx_mask WV -line_type SL1L2 -by VX_MASK -out_line_type CNT -out_alpha 0.05 -rank_corr_flag 1
COL_NAME: VX_MASK TOTAL FBAR FBAR_NCL FBAR_NCU FBAR_BCL FBAR_BCU FSTDEV FSTDEV_NCL FSTDEV_NCU FSTDEV_BCL FSTDEV_BCU OBAR OBAR_NCL OBAR_NCU OBAR_BCL OBAR_BCU OSTDEV OSTDEV_NCL OSTDEV_NCU OSTDEV_BCL OSTDEV_BCU PR_CORR PR_CORR_NCL PR_CORR_NCU PR_CORR_BCL PR_CORR_BCU SP_CORR KT_CORR RANKS FRANK_TIES ORANK_TIES ME ME_NCL ME_NCU ME_BCL ME_BCU ESTDEV ESTDEV_NCL ESTDEV_NCU ESTDEV_BCL ESTDEV_BCU MBIAS MBIAS_BCL MBIAS_BCU MAE MAE_BCL MAE_BCU MSE MSE_BCL MSE_BCU BCMSE BCMSE_BCL BCMSE_BCU RMSE RMSE_BCL RMSE_BCU E10 E10_BCL E10_BCU E25 E25_BCL E25_BCU E50 E50_BCL E50_BCU E75 E75_BCL E75_BCU E90 E90_BCL E90_BCU
CNT: AL 273 288.18901 287.98014 288.39789 NA NA 1.76085 1.62449 1.92239 NA NA 290.61996 290.41890 290.82103 NA NA 1.69499 1.56373 1.85050 NA NA 0.44771 0.34746 0.53784 NA NA NA NA 0 0 0 -2.43095 -2.64647 -2.21543 NA NA 1.81689 1.67619 1.98358 NA NA 0.99164 NA NA NA NA NA 9.19851 NA NA 3.28899 NA NA 3.03290 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: FL 713 289.78342 289.55623 290.01061 NA NA 3.09519 2.94244 3.26478 NA NA 290.86164 290.66483 291.05845 NA NA 2.68132 2.54900 2.82823 NA NA 0.77477 0.74366 0.80254 NA NA NA NA 0 0 0 -1.07822 -1.22336 -0.93309 NA NA 1.97728 1.87971 2.08563 NA NA 0.99629 NA NA NA NA NA 5.06673 NA NA 3.90417 NA NA 2.25094 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: GA 319 284.68134 284.52080 284.84187 NA NA 1.46287 1.35748 1.58615 NA NA 287.56944 287.32554 287.81334 NA NA 2.22259 2.06246 2.40990 NA NA 0.48018 0.39098 0.56044 NA NA NA NA 0 0 0 -2.88810 -3.10640 -2.66980 NA NA 1.98933 1.84601 2.15698 NA NA 0.98996 NA NA NA NA NA 12.28617 NA NA 3.94504 NA NA 3.50516 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: KY 159 285.02011 284.69237 285.34785 NA NA 2.10853 1.89946 2.36977 NA NA 288.56635 288.27634 288.85636 NA NA 1.86579 1.68078 2.09695 NA NA 0.54480 0.42521 0.64569 NA NA NA NA 0 0 0 -3.54624 -3.84282 -3.24967 NA NA 1.90802 1.71882 2.14441 NA NA 0.98771 NA NA NA NA NA 16.19349 NA NA 3.61764 NA NA 4.02411 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: MS 189 289.93735 289.82411 290.05059 NA NA 0.79429 0.72148 0.88360 NA NA 291.19973 290.93044 291.46903 NA NA 1.88889 1.71572 2.10126 NA NA 0.31922 0.18492 0.44182 NA NA NA NA 0 0 0 -1.26239 -1.51904 -1.00573 NA NA 1.80026 1.63522 2.00266 NA NA 0.99566 NA NA NA NA NA 4.81741 NA NA 3.22378 NA NA 2.19486 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: NC 494 278.32981 278.07095 278.58867 NA NA 2.93549 2.76315 3.13095 NA NA 278.99554 278.78731 279.20378 NA NA 2.36135 2.22272 2.51858 NA NA 0.43233 0.35776 0.50143 NA NA NA NA 0 0 0 -0.66573 -0.91824 -0.41322 NA NA 2.86347 2.69535 3.05413 NA NA 0.99761 NA NA NA NA NA 8.62606 NA NA 8.18286 NA NA 2.93701 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: SC 278 281.78971 281.39632 282.18310 NA NA 3.34654 3.08955 3.65054 NA NA 283.13093 282.78956 283.47231 NA NA 2.90406 2.68105 3.16786 NA NA 0.51101 0.41853 0.59300 NA NA NA NA 0 0 0 -1.34122 -1.70734 -0.97511 NA NA 3.11453 2.87536 3.39745 NA NA 0.99526 NA NA NA NA NA 11.46430 NA NA 9.66542 NA NA 3.38590 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: SEMAP 2329 285.37747 285.16334 285.59161 NA NA 5.27256 5.12538 5.42851 NA NA 286.98710 286.77117 287.20302 NA NA 5.31662 5.16821 5.47387 NA NA 0.89376 0.88528 0.90164 NA NA NA NA 0 0 0 -1.60962 -1.70876 -1.51049 NA NA 2.44099 2.37285 2.51319 NA NA 0.99439 NA NA NA NA NA 8.54677 NA NA 5.95588 NA NA 2.92349 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: TN 141 285.76966 285.24729 286.29204 NA NA 3.16479 2.83351 3.58453 NA NA 288.01525 287.51277 288.51773 NA NA 3.04424 2.72558 3.44800 NA NA 0.80640 0.73970 0.85742 NA NA NA NA 0 0 0 -2.24559 -2.56500 -1.92617 NA NA 1.93517 1.73260 2.19183 NA NA 0.99220 NA NA NA NA NA 8.76097 NA NA 3.71832 NA NA 2.95989 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: VA 461 276.55925 276.42432 276.69419 NA NA 1.47820 1.38854 1.58033 NA NA 277.74566 277.47994 278.01137 NA NA 2.91085 2.73430 3.11196 NA NA 0.39013 0.30985 0.46490 NA NA NA NA 0 0 0 -1.18641 -1.43306 -0.93976 NA NA 2.70200 2.53811 2.88868 NA NA 0.99573 NA NA NA NA NA 8.69250 NA NA 7.28494 NA NA 2.94830 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CNT: WV 120 278.39675 277.99275 278.80074 NA NA 2.25795 2.00389 2.58641 NA NA 280.26500 279.36629 281.16371 NA NA 5.02297 4.45779 5.75365 NA NA 0.73199 0.63622 0.80554 NA NA NA NA 0 0 0 -1.86826 -2.53110 -1.20541 NA NA 3.70469 3.28785 4.24361 NA NA 0.99333 NA NA NA NA NA 17.10074 NA NA 13.61036 NA NA 4.13530 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
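As a quick sanity check on output like this, a few CNT columns can be cross-checked against each other. The sketch below uses the AL line and assumes the usual definitions (ME = FBAR - OBAR, MBIAS = FBAR / OBAR, BCMSE = MSE - ME^2, RMSE = sqrt(MSE)); the tolerance and variable names are choices made for this example, not anything produced by MET.

```python
import math

# Values copied from the AL line of the CNT output.
al = {
    "FBAR": 288.18901,
    "OBAR": 290.61996,
    "ME": -2.43095,
    "MBIAS": 0.99164,
    "MSE": 9.19851,
    "BCMSE": 3.28899,
    "RMSE": 3.03290,
}

# (recomputed value, reported value) for each assumed relationship.
checks = {
    "ME = FBAR - OBAR": (al["FBAR"] - al["OBAR"], al["ME"]),
    "MBIAS = FBAR / OBAR": (al["FBAR"] / al["OBAR"], al["MBIAS"]),
    "BCMSE = MSE - ME^2": (al["MSE"] - al["ME"] ** 2, al["BCMSE"]),
    "RMSE = sqrt(MSE)": (math.sqrt(al["MSE"]), al["RMSE"]),
}

TOLERANCE = 1e-4  # the output is rounded to five decimal places

for name, (recomputed, reported) in checks.items():
    status = "ok" if abs(recomputed - reported) < TOLERANCE else "MISMATCH"
    print(f"{name}: recomputed={recomputed:.5f} reported={reported:.5f} {status}")
```

The same check could be repeated for the other masks by swapping in their values.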
------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #64485] Stat Analysis efficiency and grouping issues
From: Tardif, Elliot M
Time: Fri Dec 06 11:10:55 2013
Good afternoon John,
I'm happy to report that, after reading your reply and doing a little
exploration of my own, I believe I've solved the issues I
described earlier.
Thanks again for your help,
Elliot Tardif, Meteorologist II
NC DENR, Division of Air Quality
Planning Section, Attainment Planning Branch
1641 Mail Service Center
Raleigh, NC 27699-1641
Phone/Fax: 919-707-8483
Email: Elliot.Tardif at ncdenr.gov
Web : http://www.ncair.org
------------------------------------------------