[Met_help] [rt.rap.ucar.edu #84822] History for question on regenerating data

Julie Prestopnik via RT met_help at ucar.edu
Wed May 2 10:03:44 MDT 2018


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi,

I'm running point-stat using ASCAT and GFS data to verify surface wind
speeds.  I found an error in my ASCAT input data that goes back to Mar 7.
I had switched the input source of the data, and within the new data files,
it was allowing very small values (< 1 m/s) to be used as data points in
the verification.  I imagine that this is an issue, since point-stat is
using these very small values as matched pairs with the GFS, correct?

Is there a way to regenerate the point-stat statistics without using the
original GFS data?  I do have the *stat and the *mpr files, and it is
pretty easy to identify where the bad values are located.

Thanks,
Roz

-- 
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov


----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: question on regenerating data
From: Julie Prestopnik
Time: Thu Apr 19 09:23:32 2018

Hi Roz.  My apologies for the delay in responding.

Unfortunately, John is out of the office this week, and I do not know
the answers to your questions.  As you said, I would also imagine that
point-stat is using those small values as matched pairs.  Also, I do
not believe there is a way to regenerate the point-stat statistics
without using the original GFS data.  I cannot say with certainty,
however.  Thank you for your patience in advance.  We'll get a
definite response to you as soon as we can.

Thanks,
Julie

------------------------------------------------
Subject: question on regenerating data
From: John Halley Gotway
Time: Mon Apr 23 12:18:17 2018

Hi Roz,

I read that you've run Point-Stat and saved off the matched pairs
(MPR) output line type.  And you'd like to (1) filter those MPR lines
to discard some of them and then (2) use the filtered data to
regenerate summary statistics.  Yes, this is easily done using the
STAT-Analysis tool in MET.

You wrote that you're verifying wind speeds against ASCAT and that
you'd like to exclude pairs where the observed wind speed is less than
1 m/s.  I'm just guessing here, but I'll presume that you want to
produce both SL1L2 and CNT output line types.  Here's what the
STAT-Analysis job would look like:

# Filter MPRs and write SL1L2 output lines.
#   -lookin         a .stat filename, or a directory containing them
#   -job            job type is aggregate_stat
#   -line_type      input line type = MPR
#   -out_line_type  output line type = SL1L2 partial sums
#   -fcst_var       only process lines where the FCST_VAR column = WIND
#   -column_thresh  only use MPR lines where the OBS column > 1
#   -by             run this same job for each unique combination of
#                   these columns
stat_analysis \
   -lookin input.stat \
   -job aggregate_stat \
   -line_type MPR \
   -out_line_type SL1L2 \
   -fcst_var WIND \
   -column_thresh OBS gt1 \
   -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
   -out_stat MPR_to_SL1L2.stat

This will produce an output .stat file containing an SL1L2 line for
each unique combination of the header columns listed after the "-by"
option.  To generate CNT output lines instead, you'd run a second job
where you replace SL1L2 with CNT.  You could run these jobs on the
command line or group them together into a STAT-Analysis config file,
if you prefer.  Both would work.

You could run this once for each input .stat file you're processing,
or you could pass many input .stat files to the job.  Since
FCST_INIT_BEG and FCST_LEAD are included in the "-by" option, you'll
get separate output lines for each unique time.
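
As a rough sketch of the "second job" idea, the loop below builds one
aggregate_stat command per output line type.  It only prints the
commands (a dry run) so they can be reviewed first.  The build_job
helper, the input path input.stat, and the MPR_to_*.stat output names
are placeholders for illustration, not part of MET.

```shell
#!/bin/bash
# Dry-run sketch: build one aggregate_stat job per output line type.
# input.stat and the MPR_to_*.stat names are placeholder paths.

by_cols="MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS"

build_job() {
  # $1 = output line type (e.g. SL1L2 or CNT)
  local out_type="$1"
  echo "stat_analysis -lookin input.stat" \
       "-job aggregate_stat -line_type MPR -out_line_type ${out_type}" \
       "-fcst_var WIND -column_thresh OBS gt1" \
       "-by ${by_cols} -out_stat MPR_to_${out_type}.stat"
}

for out_type in SL1L2 CNT; do
  # Prints the command only; nothing is executed here.
  build_job "${out_type}"
done
```

Once the printed commands look right, something like
"build_job CNT | sh" would execute one of them.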

Hope that helps get you going.

Thanks,
John


------------------------------------------------
Subject: question on regenerating data
From: Rosalyn MacCracken - NOAA Affiliate
Time: Mon Apr 23 13:01:45 2018

Hi John,

That's actually only partially correct.  It's not that I want to use
part of the MPR lines and discard the rest, but I do need to
regenerate statistics.  Let me try to re-explain.

Back in early March we switched from getting our ASCAT obs from the
prepbufr data to getting them from the MGDRLITE data.  So, processing
didn't change.  I was producing statistics at certain threshold levels
for both GFS and ASCAT.  I had this set with the cat_thresh list, at
levels of 0, 6, 17, etc.  We found out after processing for a couple
of weeks that the ASCAT data included these really small values,
<1.0 m/s, and that these small wind speeds were being included in the
statistics processing.

So, a couple of questions.
1) Do I have to regenerate all of my statistics (*.cts, *.cnt and
*.ctc files) because of this error?  Or, since I have threshold levels
set, will those small values be among the statistics in the lowest
thresholds?
2) I have the *.stat files, but they are spread out into separate
directories like:
/GFS/data/hourly/${YYYYMMDDHH}/*.stat
Can I tell stat-analysis to "lookin" directories with a wildcard (like
201803*)?  If so, how?  Or, if I tell it to look in /GFS/data/hourly,
will it look in all the directories recursively under hourly?  And, if
that's the case, can I give it a date range, so that it only processes
data from March?

Roz


--
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov

------------------------------------------------
Subject: question on regenerating data
From: John Halley Gotway
Time: Mon Apr 23 14:01:57 2018

Roz,

It is ultimately up to you to decide which matched pairs you want to
include in your processing.  Do you consider those small (<1.0 m/s)
observation values to be corrupt and incorrect in some way or just not
very interesting?  If they really are BAD data values, I agree that
you should exclude them from your analysis.  But if they're just
uninteresting values of low wind speed, then there's no reason why you
should exclude them.  For example, *most* of the time it isn't
raining, but we often include observations of 0 precip.

There are three configurable options in Point-Stat that may be useful
here:
(1) You already know and use the "cat_thresh" option.  This threshold
defines the events and non-events for a 2x2 contingency table.  This
threshold affects the contents of the FHO, CTC, CTS, MCTC, and MCTS
line types that Point-Stat writes.
(2) The "cnt_thresh" option is a more recent addition.  Perhaps this
was a poor name choice, but instead of defining categories, it's
really a *filtering* threshold.  This threshold affects the contents
of the SL1L2, SAL1L2, and CNT line types that Point-Stat writes.  For
example, setting "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT and
2 SL1L2 output lines containing only those points where the wind
speed was >=6 and >=17, respectively.
(3) The "wind_thresh" option is very similar to the "cnt_thresh"
option but affects the contents of the VL1L2, VAL1L2, and VCNT (new in
met-7.0) line types.  Only those U/V pairs that meet the specified
wind speed threshold are included in the output.

For both "cnt_thresh" and "wind_thresh", the default value in the
config file is "NA", meaning do not apply any filtering threshold
criteria.
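
To make the difference between a category threshold and a filtering
threshold concrete, here is a toy sketch using awk on six made-up
forecast/observation pairs.  It does not call MET, and it simplifies
the filtering case by testing the observation only; the numbers and
variable names are purely illustrative.

```shell
#!/bin/bash
# Toy illustration (made-up numbers): category vs. filtering threshold.
# Columns: forecast wind speed, observed wind speed (m/s).
pairs="7.2 0.4
5.1 6.3
8.0 9.5
3.0 0.8
6.5 4.9
18.2 17.6"

# Category threshold ge6: every pair lands in exactly one cell of the
# 2x2 table, so suspect observations still contribute to the counts.
ctc=$(echo "$pairs" | awk -v t=6 '
  { f = ($1 >= t); o = ($2 >= t)
    if (f && o) fy_oy++; else if (f && !o) fy_on++
    else if (!f && o) fn_oy++; else fn_on++ }
  END { printf "FY_OY=%d FY_ON=%d FN_OY=%d FN_ON=%d", fy_oy, fy_on, fn_oy, fn_on }')

# Filtering threshold ge6 (observation only in this sketch): pairs that
# fail the test are dropped before the mean error is computed.
cnt=$(echo "$pairs" | awk -v t=6 '
  $2 >= t { n++; sum += $1 - $2 }
  END { printf "N=%d ME=%.2f", n, sum / n }')

echo "$ctc"
echo "$cnt"
```

In this toy data the two observations below 1 m/s are counted in the
contingency table (one as a false alarm, one as a correct negative),
while the filtering threshold removes them before the mean error is
computed.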

You have the flexibility to run STAT-Analysis on the MPR output lines
to recompute any of these output line types, applying whatever
filtering criteria you'd like.
Here's the MET user's guide:
https://dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v7.0.pdf
Look on page 98 for the job command options for the "aggregate_stat"
job type when the input line type is "MPR".

For your second question, the "-lookin PATH" option is *VERY*
flexible.  You can set PATH to either a single value or multiple
values.  If you use wildcards, then the shell expands those wildcards
to multiple values.  Each value you pass in can either be a filename
or a directory name.  If you pass in a filename, STAT-Analysis will
read it *REGARDLESS* of the file extension.  If you pass in a
directory name, STAT-Analysis will search that directory *RECURSIVELY*
for files ending in ".stat".  For example, either of the following
settings would tell STAT-Analysis to read the same list of files:
   -lookin /GFS/data/hourly/*/*.stat
   ... or ...
   -lookin /GFS/data/hourly
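
The wildcard expansion happens in the shell before STAT-Analysis
starts, which the sketch below tries to show on a throwaway directory
tree.  The temporary directories stand in for /GFS/data/hourly, the
file name point_stat.stat is made up, and the final command is only
printed, not run.

```shell
#!/bin/bash
# Sketch: how the shell expands a wildcard into multiple -lookin values.
# The temporary tree below is a stand-in for /GFS/data/hourly.
root="$(mktemp -d)"
for cycle in 2018022818 2018030100 2018030106 2018040100; do
  mkdir -p "${root}/${cycle}"
  touch "${root}/${cycle}/point_stat.stat"
done

# Only the March directories match this pattern.
march_dirs=( "${root}"/201803* )
echo "matched ${#march_dirs[@]} directories"

# Each matched directory becomes one value after -lookin (printed only).
echo stat_analysis -lookin "${march_dirs[@]}" \
     -job aggregate_stat -line_type MPR -out_line_type CNT

# Clean up the throwaway tree.
rm -rf "${root}"
```

A pattern such as /GFS/data/hourly/201803* would be the equivalent on
the real directory layout described in the question.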

Be aware though that the more data you pass to STAT-Analysis, the
longer it'll take for it to process it.  You can decide how much data
you pass it for each job.  I'd suggest starting with what is most
convenient for you.  If it's too slow, change the logic to pass it
less data (e.g. only 1 day of data rather than 1 month of data).

Yes, you can give it a date range.  Use -fcst_init_beg and
-fcst_init_end to specify beginning/ending model initialization times
or -fcst_valid_beg and -fcst_valid_end to specify beginning/ending
valid times.
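
For instance, a job limited to March 2018 initialization times might
look roughly like the sketch below.  It assumes the YYYYMMDD_HHMMSS
time string format, which should be checked against the user's guide
for your MET version, and it only prints the command.  The begin and
end values are illustrative.

```shell
#!/bin/bash
# Dry-run sketch: restrict a job to March 2018 initialization times.
# Assumes MET's YYYYMMDD_HHMMSS time strings; verify for your version.
init_beg="20180301_000000"
init_end="20180331_180000"

cmd="stat_analysis -lookin /GFS/data/hourly \
-job aggregate_stat -line_type MPR -out_line_type CNT \
-fcst_init_beg ${init_beg} -fcst_init_end ${init_end} \
-fcst_var WIND -column_thresh OBS gt1"

# Print the assembled command for review instead of running it.
echo "${cmd}"
```

Without a "-by" option, this job would aggregate all of the matching
MPR lines into a single CNT line.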

If you find that you're running multiple jobs on the same subset of
data (e.g. process MPR to CNT, MPR to SL1L2, MPR to CTC, MPR to CTS),
it'd be more efficient to group those jobs into a config file.
That'll do the filtering ONCE and write the filtered data to a temp
file.  Then all the jobs read data from the temp file instead of
starting over from scratch.
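
A minimal sketch of that grouping follows, writing a small config file
from a shell script.  The file location, the output names, and the
exact set of config entries are assumptions for illustration; compare
them with the default STATAnalysisConfig file shipped with your MET
release before relying on them.  The stat_analysis call is printed
rather than executed.

```shell
#!/bin/bash
# Sketch: group several jobs into one STAT-Analysis config file.
# Entries and file names are assumptions; check them against the
# default STATAnalysisConfig for your MET release.
config_file="$(mktemp)"

cat > "${config_file}" <<'EOF'
// Shared filtering, applied once before any job runs
line_type = [ "MPR" ];
fcst_var  = [ "WIND" ];

// Each entry is one job command
jobs = [
  "-job aggregate_stat -line_type MPR -out_line_type SL1L2 -column_thresh OBS gt1 -out_stat MPR_to_SL1L2.stat",
  "-job aggregate_stat -line_type MPR -out_line_type CNT -column_thresh OBS gt1 -out_stat MPR_to_CNT.stat"
];
EOF

# Count the job entries that were written.
n_jobs=$(grep -c -- '-job aggregate_stat' "${config_file}")
echo "wrote ${n_jobs} jobs to ${config_file}"

# Printed rather than executed in this sketch.
echo stat_analysis -lookin /GFS/data/hourly -config "${config_file}"
```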

Make sense?

John



------------------------------------------------
Subject: question on regenerating data
From: Rosalyn MacCracken - NOAA Affiliate
Time: Tue Apr 24 07:48:43 2018

Hi John,

Yes, that makes sense.  Those very small values (<1.0 m/s) are bad
values.  That's why they shouldn't be included in the processing.

So, I need to just regenerate hourly data, one hour at a time.  Would
it make sense to use a shell script and loop stat-analysis?  Something
like:

for day in 11 12
do
  for cycle in 00 06 12 18
  do
    stat_analysis \
      -lookin /GFS/data/hourly/201803${day}${cycle}/*.stat \
      -job aggregate_stat \
      -line_type MPR \
      -out_line_type CTC,CTS,CNT \
      -fcst_var WIND \
      -column_thresh OBS gt1 \
      -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
      -out_stat /new_rerun_stat_files/MPR_to_CTC_CTS_CNT.stat
  done
done

or, something like that?  And, will this regenerate the hourly
forecasts, at each forecast and lead hour?  I guess it will see the
forecast and lead hour from the *.stat file, and whatever *.stat file
is in the directory, it will regenerate those hours, right?

So, I need to regenerate the CTC, CNT and CTS files.  That's why I
did:
 -out_line_type CTC,CTS,CNT
but will that make 3 separate files, or just another *.stat file?

Roz


On Mon, Apr 23, 2018 at 4:01 PM, John Halley Gotway via RT <
met_help at ucar.edu> wrote:

> Roz,
>
> It is ultimately up to you to decide which matched pairs you want to
> include in your processing.  Do you consider those small (<1.0 m/s)
> observation values to be corrupt and incorrect in some way or just
not very
> interesting?  If they really are BAD data values, I agree that you
should
> exclude them from your analysis.  But if they're just uninteresting
values
> of low wind speed, then there's no reason why you should exclude
them.  For
> example, *most* of the time it ins't raining, but we often included
> observations of 0 precip.
>
> There are three configurable options in Point-Stat that may be
useful here:
> (1) You already know and use the "cat_thresh" option.  This
threshold
> defines the events and non-events for a 2x2 contingency table.  This
> threshold affects the contents of FHO, CTC, CTS, MCTC, and MCTS line
types
> that Point-Stat writes.
> (2) The "cnt_thresh" option is a more recent addition.  Perhaps this
was a
> poor name choice, but instead of defining categories, it's really a
> *filtering* threshold.  This threshold affects the contents of the
SL1L2,
> SAL1L2, and CNT line types that Point-Stat writes.  For example,
setting
> "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT and 2 SL1L2 output
lines
> containing only those points where the wind speed was >=6 and >=17,
> respectively.
> (3) The "wind_thresh" option is very similar to the "cnt_thresh"
option but
> affects the contents of teh VL1L2, VAL1L2, and VCNT (new in met-7.0)
line
> types.  Only those U/V pairs that meet the specified wind speed
threshold
> are included in the output.
>
> For both "cnt_thresh" and "wind_thresh", the default value in the
config
> file is "NA", meaning, do not apply any filtering threshold
criteria.
>
> You have the flexibility to run STAT-Analysis on the MPR output
lines to
> recompute any of these output line types applying whatever filtering
> criteria you'd like.
> Here's the MET user's guide:
>
https://dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v7.0.pdf
> Look on page 98 for the job command options for the "aggregate_stat"
line
> type when the input line type is "MPR".
>
> For your second question, the "-lookin PATH" option is *VERY*
flexible.
> You can set PATH to either a single value or multiple values.  If
you use
> wildcards, then the shell expands those wildcards to multiple
values.  Each
> value you pass in can either be a filename or a directory name.  If
you
> pass in a filename, STAT-Analysis will read it *REGARDLESS* of the
file
> extension.  If you pass in a directory name, STAT-Analysis will
search that
> directory *RECURSIVELY* for files ending in ".stat".  For example,
either
> of the following settings would tell STAT-Analysis to read the same
list of
> files:
>    -lookin /GFS/data/hourly/*/*.stat
>    ... or ...
>    -lookin /GFS/data/hourly
>
> Be aware though that the more data you pass to STAT-Analysis, the
longer
> it'll take for it to process it.  You can decide how much data you
pass it
> for each job.  I'd suggest starting with what is most convenient for
you.
> If it's too slow, change the logic to pass it less data (e.g. only 1
day of
> data rather than 1 month of data).
>
> Yes, you can give it a date range.  Use -fcst_init_beg and
-fcst_init_end
> to specify beginning/ending model initialization times or
-fcst_valid_beg
> and -fcst_valid_end to specify beginning/ending valid times.
>
> If you find that you're running multiple jobs on the same subset of
data
> (e.g. process MPR to CNT, MPR to SL1L2, MPR to CTC, MPR to CTS),
it'd be
> more efficient to group those jobs into a config file.  That'll do
the
> filtering ONCE and write the filtered data to a temp file.  Then all
the
> jobs read data from the temp instead of starting over from scratch.
>
> Make sense?
>
> John
>
>
>
> On Mon, Apr 23, 2018 at 1:01 PM, Rosalyn MacCracken - NOAA Affiliate
via RT
> <met_help at ucar.edu> wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> >
> > Hi John,
> >
> > That's actually only partially correct.  It's not that I want to
use part
> > of the MPR lines and discard the rest, and I do need to regenerate
> > statistics.  Let me try to re-explain.
> >
> > Back in early March we switched from getting our ASCAT obs from
the
> > prepbufr data, to getting it from the MGDRLITE data. So,
processing
> didn't
> > change.  I was producing statistics at certain threshold levels
for both
> > GFS and ASCAT.  I had this set with the cat_thresh list, at levels
of
> > 0,6,17, etc.  We found out after processing for a couple of weeks
that
> the
> > ASCAT data included these really small values, <1.0 m/s, and that
these
> > small wind speeds were being included into the statistics
processing.
> >
> > So, a couple of questions.
> > 1) Do I have to regenerate all of my statistics (*.cts, *.cnt and
*ctc
> > files) because of this error? Or, since I have threshold levels
set, will
> > those small values be amoung the statistics in the lowest
thresholds?
> > 2) I have the *.stat files, but, they are spread out into separate
> > directories like:
> > /GFS/data/hourly/${YYYYMMDDHH}/*.stat
> > Can I tell stat-analysis to "lookin" directories with a wildcard
(like
> > 201803*)?  If so, how?  Or, is I tell it to look in
/GFS/data/hourly,
> will
> > it look in all the directories recursively under hourly?  And, it
that's
> > the case, can I give it a date range, so, that it only processes
data
> from
> > March?
> >
> > Roz
> >


--
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov

------------------------------------------------
Subject: question on regenerating data
From: John Halley Gotway
Time: Tue Apr 24 09:42:06 2018

Roz,

Each "-job aggregate_stat" only generates a single output line type, so
using "-out_line_type CTC,CTS,CNT" will not work.

You'll need to run separate jobs for each output line type you want to
generate.  That's why I'd recommend grouping those multiple jobs together
into a single STAT-Analysis config file.  Then you'd call STAT-Analysis
once using the "-config" command line option.
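
As a rough illustration of that grouping, the shell sketch below writes a
small config file with one job per output line type and prints the single
stat_analysis call that would use it.  The config file name, the input
directory, and the output file names are placeholders, and it assumes that
any entries left out of the config fall back to MET's defaults.  It covers
only SL1L2 and CNT; categorical line types such as CTC and CTS derived from
MPR lines also need an output threshold, so check the User's Guide for the
option names in your MET version.

```shell
#!/bin/sh
# Sketch: write a STAT-Analysis config file holding one job per output
# line type, then print the single command that would run them all.
CONFIG="${CONFIG:-./STATAnalysisConfig_rerun}"   # hypothetical file name
BY="MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS"
# Options shared by every job (taken from the command-line example).
COMMON="-job aggregate_stat -line_type MPR -fcst_var WIND -column_thresh OBS gt1 -by $BY"

cat > "$CONFIG" <<EOF
// Entries not set here are assumed to fall back to MET's defaults.
jobs = [
   "$COMMON -out_line_type SL1L2 -out_stat MPR_to_SL1L2.stat",
   "$COMMON -out_line_type CNT -out_stat MPR_to_CNT.stat"
];
EOF

# Printed rather than executed, so this runs without MET installed.
# The -lookin path is a placeholder for one hourly directory.
echo "stat_analysis -lookin /GFS/data/hourly/2018031100 -config $CONFIG"
```

Each job string carries its own "-out_stat" name, so the jobs do not
write to the same file.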

Another issue is that if you set "-out_stat" to the same filename for
every job, each job will overwrite it.  STAT-Analysis overwrites that
output file rather than appending to it, so only the last job's output
would survive.
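
One way to avoid that, sketched in shell: build the output file name from
the output line type so that each job writes its own file.  The helper
name build_jobs and the IN_DIR and OUT_DIR paths are made up for
illustration, and the commands are only printed (a dry run), not executed.

```shell
#!/bin/sh
# Dry-run sketch: print one stat_analysis command per output line type,
# each with its own -out_stat file so no job overwrites another.
# IN_DIR and OUT_DIR are placeholder paths (assumptions).
IN_DIR="${IN_DIR:-/GFS/data/hourly/2018031100}"
OUT_DIR="${OUT_DIR:-/new_rerun_stat_files}"
BY="MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS"

# build_jobs is a hypothetical helper: one printed command per argument.
build_jobs() {
  for lt in "$@"; do
    # NOTE: categorical types (CTC, CTS) derived from MPR lines also need
    # an output threshold option; check the User's Guide for your version.
    echo "stat_analysis -lookin $IN_DIR -job aggregate_stat" \
         "-line_type MPR -out_line_type $lt -fcst_var WIND" \
         "-column_thresh OBS gt1 -by $BY" \
         "-out_stat $OUT_DIR/MPR_to_${lt}.stat"
  done
}

build_jobs CNT SL1L2
```

Once the printed commands look right, they can be run directly or pasted
into a loop over the hourly directories.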

You could send me a day's worth of .stat output files
(/GFS/data/hourly/20180305*) and I could send you some suggestions.  Or, if
you have access to theia, you could copy them up there and point me to
them.

Thanks,
John

------------------------------------------------
Subject: question on regenerating data
From: Rosalyn MacCracken - NOAA Affiliate
Time: Tue Apr 24 11:57:43 2018

Hi John,

Yes, it does seem that the -config option is the way to go to recreate
those 3 files.  I'll be sure to use a unique file name, or mv the output
file to a different name before running the command again.  Thanks for
pointing that out.

I'm teleworking for the next couple of weeks, so I can't download and send
you *.stat files like I can when I'm at my computer at work.  I don't have
access to theia or wcoss anymore.  You have an ftp server that I can upload
data to, right?  If not, I can try to fiddle around with this tomorrow and
see if I can't get this to work the way I want to.

Roz

> > --
> > Rosalyn MacCracken
> > Support Scientist
> >
> > Ocean Applications Branch
> > NOAA/NWS Ocean Prediction Center
> > NCWCP
> > 5830 University Research Ct
> > College Park, MD  20740-3818
> >
> > (p) 301-683-1551
> > rosalyn.maccracken at noaa.gov
> >
> >
>
>


--
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov

------------------------------------------------
Subject: question on regenerating data
From: John Halley Gotway
Time: Tue Apr 24 12:49:47 2018

Roz,

Yes, we do.  Follow the instructions here:
   https://dtcenter.org/met/users/support/met_help.php#ftp

I'd suggest making a tar file of one day's files and posting it to the ftp
site:
   tar -cvzf sample.tar.gz /GFS/data/hourly/20180305*

Thanks,
John

On Tue, Apr 24, 2018 at 11:57 AM, Rosalyn MacCracken - NOAA Affiliate via RT
<met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
>
> Hi John,
>
> Yes, it does seem that the -config option is the way to go to recreate
> those 3 files.  I'll be sure to use a unique file name, or mv the output
> file to a different name before running the command again.  Thanks for
> pointing that out.
>
> I'm teleworking for the next couple of weeks, so I can't download and send
> you *.stat files like I can when I'm at my computer at work.  I don't have
> access to theia or wcoss anymore.  You have an ftp server that I can upload
> data to, right?  If not, I can try and fiddle around with this tomorrow and
> see if I can't get this to work the way I want it to.
>
> Roz
>
> On Tue, Apr 24, 2018 at 11:42 AM, John Halley Gotway via RT <
> met_help at ucar.edu> wrote:
>
> > Roz,
> >
> > Each "-job aggregate_stat" only generates a single output line type.  So
> > using "-out_line_type CTC,CTS,CNT" will not work.
> >
> > You'll need to run separate jobs for each output line type you want to
> > generate.  That's why I'd recommend grouping those multiple jobs together
> > into a single STAT-Analysis config file.  Then you'd call STAT-Analysis
> > once using the "-config" command line option.
> >
> > Another issue is that if you set "-out_stat" to the same filename, it'll
> > get overwritten by each job.  STAT-Analysis will overwrite that output
> > file rather than appending to it.
> >
> > You could send me a day's worth of .stat output files
> > (/GFS/data/hourly/20180305*) and I could send you some suggestions.  Or if
> > you have access to theia you could copy them up there and point me to it.
> >
> > Thanks,
> > John
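
The two points in the message above (one output line type per job, and a
distinct -out_stat file per job) can be sketched as a small shell loop.  This
is only an illustration under stated assumptions, not a tested MET workflow:
the output directory, the file naming scheme, and the "ge6" event threshold
(borrowed from the cat_thresh levels mentioned in this thread) are
placeholders, and the use of -out_fcst_thresh / -out_obs_thresh for the CTC
and CTS jobs should be checked against the "aggregate_stat" section of the
MET user's guide.  By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: one aggregate_stat job per output line type, each writing to its
# own -out_stat file so that no job overwrites another job's output.

LOOKIN_ROOT="/GFS/data/hourly"       # directory layout taken from the thread
OUT_DIR="./new_rerun_stat_files"     # hypothetical output directory
BY_COLS="MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS"
EVENT_THRESH="ge6"                   # assumption: one of the cat_thresh levels

# Print the stat_analysis command for one YYYYMMDDHH directory ($1) and one
# output line type ($2).
build_job() {
  ymdh="$1"
  out_type="$2"
  cmd="stat_analysis -lookin ${LOOKIN_ROOT}/${ymdh}"
  cmd="${cmd} -job aggregate_stat -line_type MPR -out_line_type ${out_type}"
  cmd="${cmd} -fcst_var WIND -column_thresh OBS gt1"
  # Assumption: CTC and CTS need an event threshold to rebuild the 2x2 table.
  if [ "${out_type}" != "CNT" ]; then
    cmd="${cmd} -out_fcst_thresh ${EVENT_THRESH} -out_obs_thresh ${EVENT_THRESH}"
  fi
  cmd="${cmd} -by ${BY_COLS}"
  cmd="${cmd} -out_stat ${OUT_DIR}/MPR_to_${out_type}_${ymdh}.stat"
  printf '%s\n' "${cmd}"
}

# Loop over days, cycles, and line types.  With DRY_RUN=1 (the default) the
# commands are printed; with DRY_RUN=0 they are executed.
run_all() {
  for day in 11 12; do
    for cycle in 00 06 12 18; do
      for line_type in CTC CTS CNT; do
        job="$(build_job "201803${day}${cycle}" "${line_type}")"
        if [ "${DRY_RUN:-1}" = "1" ]; then
          printf '%s\n' "${job}"
        else
          mkdir -p "${OUT_DIR}" && eval "${job}"
        fi
      done
    done
  done
}

run_all
```

Passing the directory itself to -lookin relies on the recursive search for
".stat" files described elsewhere in this thread.  The same job strings could
presumably also be listed in the "jobs" entry of a STAT-Analysis config file
and run once with -config, as recommended above.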
> >
> > On Tue, Apr 24, 2018 at 7:48 AM, Rosalyn MacCracken - NOAA Affiliate
> > via RT <met_help at ucar.edu> wrote:
> >
> > >
> > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > >
> > > Hi John,
> > >
> > > Yes, that makes sense.  Those very small values (<1.0 m/s) are bad
> > > values.  That's why they shouldn't be included in the processing.
> > >
> > > So, I need to just regenerate hourly data, one hour at a time.  Would it
> > > make sense to use a shell script and loop stat-analysis?  Something like:
> > >
> > > for day in 11 12
> > > do
> > >   for cycle in 00 06 12 18
> > >   do
> > >     stat_analysis -lookin /GFS/data/hourly/201803${day}${cycle}/*.stat \
> > >       -job aggregate_stat \
> > >       -line_type MPR \
> > >       -out_line_type CTC,CTS,CNT \
> > >       -fcst_var WIND \
> > >       -column_thresh OBS gt1 \
> > >       -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > >       -out_stat /new_rerun_stat_files/MPR_to_CTC_CTS_CNT.stat
> > >   done
> > > done
> > >
> > > or, something like that?  And, will this regenerate the hourly forecasts,
> > > at each forecast and lead hour?  I guess it will see the forecast and
> > > lead hour from the *.stat file, and whatever *stat file is in the
> > > directory, it will regenerate those hours, right?
> > >
> > > So, I need to regenerate the CTC, CNT and CTS files.  That's why I did:
> > >  -out_line_type CTC,CTS,CNT
> > > but, will that make 3 separate files, or just another *.stat file?
> > >
> > > Roz
> > >
> > >
> > > On Mon, Apr 23, 2018 at 4:01 PM, John Halley Gotway via RT <
> > > met_help at ucar.edu> wrote:
> > >
> > > > Roz,
> > > >
> > > > It is ultimately up to you to decide which matched pairs you want to
> > > > include in your processing.  Do you consider those small (<1.0 m/s)
> > > > observation values to be corrupt and incorrect in some way or just not
> > > > very interesting?  If they really are BAD data values, I agree that you
> > > > should exclude them from your analysis.  But if they're just
> > > > uninteresting values of low wind speed, then there's no reason why you
> > > > should exclude them.  For example, *most* of the time it isn't raining,
> > > > but we often include observations of 0 precip.
> > > >
> > > > There are three configurable options in Point-Stat that may be useful
> > > > here:
> > > > (1) You already know and use the "cat_thresh" option.  This threshold
> > > > defines the events and non-events for a 2x2 contingency table.  This
> > > > threshold affects the contents of FHO, CTC, CTS, MCTC, and MCTS line
> > > > types that Point-Stat writes.
> > > > (2) The "cnt_thresh" option is a more recent addition.  Perhaps this
> > > > was a poor name choice, but instead of defining categories, it's really
> > > > a *filtering* threshold.  This threshold affects the contents of the
> > > > SL1L2, SAL1L2, and CNT line types that Point-Stat writes.  For example,
> > > > setting "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT and 2 SL1L2
> > > > output lines containing only those points where the wind speed was >=6
> > > > and >=17, respectively.
> > > > (3) The "wind_thresh" option is very similar to the "cnt_thresh" option
> > > > but affects the contents of the VL1L2, VAL1L2, and VCNT (new in
> > > > met-7.0) line types.  Only those U/V pairs that meet the specified wind
> > > > speed threshold are included in the output.
> > > >
> > > > For both "cnt_thresh" and "wind_thresh", the default value in the
> > > > config file is "NA", meaning, do not apply any filtering threshold
> > > > criteria.
> > > >
> > > > You have the flexibility to run STAT-Analysis on the MPR output lines
> > > > to recompute any of these output line types applying whatever filtering
> > > > criteria you'd like.
> > > > Here's the MET user's guide:
> > > > https://dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v7.0.pdf
> > > > Look on page 98 for the job command options for the "aggregate_stat"
> > > > job type when the input line type is "MPR".
> > > >
> > > > For your second question, the "-lookin PATH" option is *VERY* flexible.
> > > > You can set PATH to either a single value or multiple values.  If you
> > > > use wildcards, then the shell expands those wildcards to multiple
> > > > values.  Each value you pass in can either be a filename or a directory
> > > > name.  If you pass in a filename, STAT-Analysis will read it
> > > > *REGARDLESS* of the file extension.  If you pass in a directory name,
> > > > STAT-Analysis will search that directory *RECURSIVELY* for files ending
> > > > in ".stat".  For example, either of the following settings would tell
> > > > STAT-Analysis to read the same list of files:
> > > >    -lookin /GFS/data/hourly/*/*.stat
> > > >    ... or ...
> > > >    -lookin /GFS/data/hourly
> > > >
> > > > Be aware though that the more data you pass to STAT-Analysis, the
> > > > longer it'll take to process it.  You can decide how much data you pass
> > > > it for each job.  I'd suggest starting with what is most convenient for
> > > > you.  If it's too slow, change the logic to pass it less data (e.g.
> > > > only 1 day of data rather than 1 month of data).
> > > >
> > > > Yes, you can give it a date range.  Use -fcst_init_beg and
> > > > -fcst_init_end to specify beginning/ending model initialization times
> > > > or -fcst_valid_beg and -fcst_valid_end to specify beginning/ending
> > > > valid times.
> > > >
> > > > If you find that you're running multiple jobs on the same subset of
> > > > data (e.g. process MPR to CNT, MPR to SL1L2, MPR to CTC, MPR to CTS),
> > > > it'd be more efficient to group those jobs into a config file.  That'll
> > > > do the filtering ONCE and write the filtered data to a temp file.  Then
> > > > all the jobs read data from the temp file instead of starting over from
> > > > scratch.
> > > >
> > > > Make sense?
> > > >
> > > > John
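
As a rough illustration of the date-range advice in the message above, the
sketch below builds one aggregate_stat command that searches
/GFS/data/hourly recursively but keeps only March 2018 initializations.  The
specific dates, the YYYYMMDD_HH time strings, and the output file name are
assumptions made for the example; the remaining options are copied from the
job discussed in this thread.  It is meant for output line types such as CNT
or SL1L2 that need no event threshold, and it only prints the command rather
than running it.

```shell
#!/bin/sh
# Sketch: limit a recursive -lookin to a range of model initialization times.

# Print an aggregate_stat command for init times from $1 through $2
# (assumed format YYYYMMDD_HH), producing output line type $3.
date_range_job() {
  init_beg="$1"
  init_end="$2"
  out_type="$3"
  cmd="stat_analysis -lookin /GFS/data/hourly"
  cmd="${cmd} -job aggregate_stat -line_type MPR -out_line_type ${out_type}"
  cmd="${cmd} -fcst_var WIND -column_thresh OBS gt1"
  cmd="${cmd} -fcst_init_beg ${init_beg} -fcst_init_end ${init_end}"
  cmd="${cmd} -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS"
  # Hypothetical naming scheme: one output file per line type and date range.
  cmd="${cmd} -out_stat MPR_to_${out_type}_${init_beg}_${init_end}.stat"
  printf '%s\n' "${cmd}"
}

# Example: CNT lines for all of March 2018.
date_range_job 20180301_00 20180331_18 CNT
```

Swapping in -fcst_valid_beg and -fcst_valid_end would select by valid time
instead of initialization time.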
> > > >
> > > >
> > > >
> > > > On Mon, Apr 23, 2018 at 1:01 PM, Rosalyn MacCracken - NOAA Affiliate
> > > > via RT <met_help at ucar.edu> wrote:
> > > >
> > > > >
> > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > >
> > > > > Hi John,
> > > > >
> > > > > That's actually only partially correct.  It's not that I want to use
> > > > > part of the MPR lines and discard the rest, and I do need to
> > > > > regenerate statistics.  Let me try to re-explain.
> > > > >
> > > > > Back in early March we switched from getting our ASCAT obs from the
> > > > > prepbufr data, to getting it from the MGDRLITE data.  So, processing
> > > > > didn't change.  I was producing statistics at certain threshold
> > > > > levels for both GFS and ASCAT.  I had this set with the cat_thresh
> > > > > list, at levels of 0,6,17, etc.  We found out after processing for a
> > > > > couple of weeks that the ASCAT data included these really small
> > > > > values, <1.0 m/s, and that these small wind speeds were being
> > > > > included in the statistics processing.
> > > > >
> > > > > So, a couple of questions.
> > > > > 1) Do I have to regenerate all of my statistics (*.cts, *.cnt and
> > > > > *ctc files) because of this error?  Or, since I have threshold levels
> > > > > set, will those small values be among the statistics in the lowest
> > > > > thresholds?
> > > > > 2) I have the *.stat files, but they are spread out into separate
> > > > > directories like:
> > > > > /GFS/data/hourly/${YYYYMMDDHH}/*.stat
> > > > > Can I tell stat-analysis to "lookin" directories with a wildcard
> > > > > (like 201803*)?  If so, how?  Or, if I tell it to look in
> > > > > /GFS/data/hourly, will it look in all the directories recursively
> > > > > under hourly?  And, if that's the case, can I give it a date range,
> > > > > so that it only processes data from March?
> > > > >
> > > > > Roz
> > > > >
> > > > > On Mon, Apr 23, 2018 at 2:18 PM, John Halley Gotway via RT <
> > > > > met_help at ucar.edu> wrote:
> > > > >
> > > > > > Hi Roz,
> > > > > >
> > > > > > I read that you've run Point-Stat and saved off the matched pairs
> > > > > > (MPR) output line type.  And you'd like to (1) filter those MPR
> > > > > > lines to discard some of them and then (2) use the filtered data
> > > > > > to regenerate summary statistics.  Yes, this is easily done using
> > > > > > the STAT-Analysis tool in MET.
> > > > > >
> > > > > > You wrote that you're verifying wind speeds against ASCAT and that
> > > > > > you'd like to exclude pairs where the observed wind speed is less
> > > > > > than 1 m/s.  I'm just guessing here, but I'll presume that you
> > > > > > want to produce both SL1L2 and CNT output line types.  Here's what
> > > > > > the STAT-Analysis job would look like:
> > > > > >
> > > > > > # Filter MPR lines and write SL1L2 output lines.
> > > > > > #   -lookin         a .stat filename or a directory containing them
> > > > > > #   -job            job type is aggregate_stat
> > > > > > #   -line_type      input line type = MPR
> > > > > > #   -out_line_type  output line type = SL1L2 partial sums
> > > > > > #   -fcst_var       only process lines where FCST_VAR column = WIND
> > > > > > #   -column_thresh  only use MPR lines where OBS column > 1
> > > > > > #   -by             run this same job for each unique combination
> > > > > > #                   of these columns
> > > > > > stat_analysis \
> > > > > >    -lookin input.stat \
> > > > > >    -job aggregate_stat \
> > > > > >    -line_type MPR \
> > > > > >    -out_line_type SL1L2 \
> > > > > >    -fcst_var WIND \
> > > > > >    -column_thresh OBS gt1 \
> > > > > >    -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > > > > >    -out_stat MPR_to_SL1L2.stat
> > > > > >
> > > > > > This will produce an output .stat file containing an SL1L2 line
> > > > > > for each unique combination of the header columns listed after the
> > > > > > "-by" option.  To generate CNT output lines instead, you'd run a
> > > > > > second job where you replace SL1L2 with CNT.  You could run these
> > > > > > jobs on the command line or group them together into a
> > > > > > STAT-Analysis config file, if you prefer.  Both would work.
> > > > > >
> > > > > > You could run this once for each input .stat file you're
> > > > > > processing... or you could pass many input .stat files to the job.
> > > > > > Since FCST_INIT_BEG and FCST_LEAD are included in the "-by"
> > > > > > option, you'll get separate output lines for each unique time.
> > > > > >
> > > > > > Hope that helps get you going.
> > > > > >
> > > > > > Thanks,
> > > > > > John
> > > > > >
> > > > > >

------------------------------------------------
Subject: question on regenerating data
From: Rosalyn MacCracken - NOAA Affiliate
Time: Tue Apr 24 12:53:08 2018

Ok, I'll get that over to the ftp site.  I have to make sure that I find a
day that has all the data in it.  Sometimes the data isn't available when
the script runs.  A little annoying, but, that's operations...

I'll let you know when I get the file to the ftp site.

Thanks!

Roz

On Tue, Apr 24, 2018 at 2:49 PM, John Halley Gotway via RT <
met_help at ucar.edu> wrote:

> Roz,
>
> Yes, we do.  Follow the instructions here:
>    https://dtcenter.org/met/users/support/met_help.php#ftp
>
> I'd suggest making a tar file of one day's files and posting it to the ftp
> site:
>    tar -cvzf sample.tar.gz /GFS/data/hourly/20180305*
>
> Thanks,
> John
>
> On Tue, Apr 24, 2018 at 11:57 AM, Rosalyn MacCracken - NOAA
Affiliate via
> RT <met_help at ucar.edu> wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> >
> > HI John,
> >
> > Yes, it does seem that the -config option is the way to go to
recreate
> > those 3 files. I'll be sure to have a unique file name, or, mv the
output
> > file to a different name before running the command again.  Thanks
for
> > pointing that out.
> >
> > I'm teleworking for the next couple of weeks, so, download and
send you
> > *.stat files like I can when I'm at my computer at work.  I don't
have
> > access to theia or wcoss anymore.  You have an ftp server that I
can
> upload
> > data to, right?  If not, I can try and fiddle around with this
tomorrow
> and
> > see if I can't get this to work the way I want to.
> >
> > Roz
> >
> > On Tue, Apr 24, 2018 at 11:42 AM, John Halley Gotway via RT <
> > met_help at ucar.edu> wrote:
> >
> > > Roz,
> > >
> > > Each "-job aggregate_stat" only generates a single output line
type.
> So
> > > using "-out_line_type CTC,CTS,CNT" will not work.
> > >
> > > You'll need to run separate jobs for each output line type you
want to
> > > generate.  That's why I'd recommend grouping those multiple jobs
> together
> > > into a single STAT-Analysis config file.  Then you'd call STAT-
Analysis
> > > once using the "-config" command line option.
> > >
> > > Another issue is that if you set "-out_stat" to the same
filename,
> it'll
> > > get overridden by each job.  STAT-Analysis will overwrite that
output
> > file
> > > rather than appending to it.
> > >
> > > You could send me a day's worth of .stat output files
> > > (/GFS/data/hourly/20180305*) and I could send you some
suggestions.  Or
> > if
> > > you have access to theia you could copy them up there and point
me to
> it.
> > >
> > > Thanks,
> > > John
> > >
> > > On Tue, Apr 24, 2018 at 7:48 AM, Rosalyn MacCracken - NOAA
Affiliate
> via
> > RT
> > > <met_help at ucar.edu> wrote:
> > >
> > > >
> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822
>
> > > >
> > > > Hi John,
> > > >
> > > > Yes, that makes sense.  Those very small values (<1.0 m/s),
are bad
> > > > values.  That's why they shouldn't be included in the
processing.
> > > >
> > > > So, I need to just regenerate hourly data, one hour at a time.
Would
> > it
> > > > make sense to use a shell script and loop stat-analysis?
Something
> > like:
> > > >
> > > > for day in 11 12
> > > > do
> > > >   for cycle in 00 06 12 18
> > > >   do
> > > > stat_analysis -lookin
/GFS/data/hourly/201803${day}${hour}/*.stat \
> > > > -job aggregate_stat \
> > > >    -line_type MPR \
> > > >    -out_line_type CTC,CTS,CNT \
> > > >   -fcst_var WIND \
> > > > -column_thresh OBS gt1 \
> > > >  -by
> > > > MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,
> INTERP_PNTS
> > > > -out_stat /new_rerun_stat_files/MPR_to_CTC_CTS_CNT.stat
> > > >   done
> > > > done
> > > >
> > > > or, something like that?  And, will this regenerate hour
forecasts,
> at
> > > each
> > > > forecast and lead hour?  I guess it will see the forecast and
lead
> hour
> > > > from the *.stat file, and whatever *stat file is in the
directory, it
> > > will
> > > > regenerate those hours, right?
> > > >
> > > > So, I need to regenerate the CTC, CNT and CTS files.  That's
why I
> did:
> > > >  -out_line_type CTC,CTS,CNT
> > > > but, will that make 3 separate files, or just another *.stat
file?
> > > >
> > > > Roz
> > > >
> > > >
> > > > On Mon, Apr 23, 2018 at 4:01 PM, John Halley Gotway via RT <
> > > > met_help at ucar.edu> wrote:
> > > >
> > > > > Roz,
> > > > >
> > > > > It is ultimately up to you to decide which matched pairs you
want
> to
> > > > > include in your processing.  Do you consider those small
(<1.0 m/s)
> > > > > observation values to be corrupt and incorrect in some way
or just
> > not
> > > > very
> > > > > interesting?  If they really are BAD data values, I agree
that you
> > > should
> > > > > exclude them from your analysis.  But if they're just
uninteresting
> > > > values
> > > > > of low wind speed, then there's no reason why you should
exclude
> > them.
> > > > For
> > > > > example, *most* of the time it ins't raining, but we often
included
> > > > > observations of 0 precip.
> > > > >
> > > > > There are three configurable options in Point-Stat that may
be
> useful
> > > > here:
> > > > > (1) You already know and use the "cat_thresh" option.  This
> threshold
> > > > > defines the events and non-events for a 2x2 contingency
table.
> This
> > > > > threshold affects the contents of FHO, CTC, CTS, MCTC, and
MCTS
> line
> > > > types
> > > > > that Point-Stat writes.
> > > > > (2) The "cnt_thresh" option is a more recent addition.
Perhaps
> this
> > > was
> > > > a
> > > > > poor name choice, but instead of defining categories, it's
really a
> > > > > *filtering* threshold.  This threshold affects the contents
of the
> > > SL1L2,
> > > > > SAL1L2, and CNT line types that Point-Stat writes.  For
example,
> > > setting
> > > > > "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT and 2 SL1L2
output
> > > lines
> > > > > containing only those points where the wind speed was >=6
and >=17,
> > > > > respectively.
> > > > > (3) The "wind_thresh" option is very similar to the
"cnt_thresh"
> > option
> > > > but
> > > > > affects the contents of teh VL1L2, VAL1L2, and VCNT (new in
> met-7.0)
> > > line
> > > > > types.  Only those U/V pairs that meet the specified wind
speed
> > > threshold
> > > > > are included in the output.
> > > > >
> > > > > For both "cnt_thresh" and "wind_thresh", the default value
in the
> > > config
> > > > > file is "NA", meaning, do not apply any filtering threshold
> criteria.
> > > > >
> > > > > You have the flexibility to run STAT-Analysis on the MPR
output
> lines
> > > to
> > > > > recompute any of these output line types applying whatever
> filtering
> > > > > criteria you'd like.
> > > > > Here's the MET user's guide:
> > > > > https://dtcenter.org/met/users/docs/users_guide/MET_
> > > Users_Guide_v7.0.pdf
> > > > > Look on page 98 for the job command options for the
> "aggregate_stat"
> > > line
> > > > > type when the input line type is "MPR".
> > > > >
> > > > > For your second question, the "-lookin PATH" option is
*VERY*
> > flexible.
> > > > > You can set PATH to either a single value or multiple
values.  If
> you
> > > use
> > > > > wildcards, then the shell expands those wildcards to
multiple
> values.
> > > > Each
> > > > > value you pass in can either be a filename or a directory
name.  If
> > you
> > > > > pass in a filename, STAT-Analysis will read it *REGARDLESS*
of the
> > file
> > > > > extension.  If you pass in a directory name, STAT-Analysis
will
> > search
> > > > that
> > > > > directory *RECURSIVELY* for files ending in ".stat".  For
example,
> > > either
> > > > > of the following settings would tell STAT-Analysis to read
the same
> > > list
> > > > of
> > > > > files:
> > > > >    -lookin /GFS/data/hourly/*/*.stat
> > > > >    ... or ...
> > > > >    -lookin /GFS/data/hourly
> > > > >
> > > > > Be aware though that the more data you pass to STAT-
Analysis, the
> > > longer
> > > > > it'll take for it to process it.  You can decide how much
data you
> > pass
> > > > it
> > > > > for each job.  I'd suggest starting with what is most
convenient
> for
> > > you.
> > > > > If it's too slow, change the logic to pass it less data
(e.g. only
> 1
> > > day
> > > > of
> > > > > data rather than 1 month of data).
> > > > >
> > > > > Yes, you can give it a date range.  Use -fcst_init_beg and
> > > -fcst_init_end
> > > > > to specify beginning/ending model initialization times or
> > > -fcst_valid_beg
> > > > > and -fcst_valid_end to specify beginning/ending valid times.
> > > > >
> > > > > If you find that you're running multiple jobs on the same
subset of
> > > data
> > > > > (e.g. process MPR to CNT, MPR to SL1L2, MPR to CTC, MPR to
CTS),
> it'd
> > > be
> > > > > more efficient to group those jobs into a config file.
That'll do
> > the
> > > > > filtering ONCE and write the filtered data to a temp file.
Then
> all
> > > the
> > > > > jobs read data from the temp instead of starting over from
scratch.
> > > > >
> > > > > Make sense?
> > > > >
> > > > > John
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Apr 23, 2018 at 1:01 PM, Rosalyn MacCracken - NOAA
> Affiliate
> > > via
> > > > RT
> > > > > <met_help at ucar.edu> wrote:
> > > > >
> > > > > >
> > > > > > <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > > >
> > > > > > Hi John,
> > > > > >
> > > > > > That's actually only partially correct.  It's not that I
want to
> > use
> > > > part
> > > > > > of the MPR lines and discard the rest, and I do need to
> regenerate
> > > > > > statistics.  Let me try to re-explain.
> > > > > >
> > > > > > Back in early March we switched from getting our ASCAT obs
from
> the
> > > > > > prepbufr data, to getting it from the MGDRLITE data. So,
> processing
> > > > > didn't
> > > > > > change.  I was producing statistics at certain threshold
levels
> for
> > > > both
> > > > > > GFS and ASCAT.  I had this set with the cat_thresh list,
at
> levels
> > of
> > > > > > 0,6,17, etc.  We found out after processing for a couple
of weeks
> > > that
> > > > > the
> > > > > > ASCAT data included these really small values, <1.0 m/s,
and that
> > > these
> > > > > > small wind speeds were being included into the statistics
> > processing.
> > > > > >
> > > > > > So, a couple of questions.
> > > > > > 1) Do I have to regenerate all of my statistics (*.cts,
*.cnt and
> > > *ctc
> > > > > > files) because of this error? Or, since I have threshold
levels
> > set,
> > > > will
> > > > > > those small values be amoung the statistics in the lowest
> > thresholds?
> > > > > > 2) I have the *.stat files, but, they are spread out into
> separate
> > > > > > directories like:
> > > > > > /GFS/data/hourly/${YYYYMMDDHH}/*.stat
> > > > > > Can I tell stat-analysis to "lookin" directories with a
wildcard
> > > (like
> > > > > > 201803*)?  If so, how?  Or, is I tell it to look in
> > /GFS/data/hourly,
> > > > > will
> > > > > > it look in all the directories recursively under hourly?
And, it
> > > > that's
> > > > > > the case, can I give it a date range, so, that it only
processes
> > data
> > > > > from
> > > > > > March?
> > > > > >
> > > > > > Roz
> > > > > >
> > > > > > On Mon, Apr 23, 2018 at 2:18 PM, John Halley Gotway via RT
<
> > > > > > met_help at ucar.edu> wrote:
> > > > > >
> > > > > > > Hi Roz,
> > > > > > >
> > > > > > > I read that you've run Point-Stat and saved off the
matched
> pairs
> > > > (MPR)
> > > > > > > output line type.  And you'd like to (1) filter those
MPR lines
> > to
> > > > > > discard
> > > > > > > some of them and then (2) use the filtered data to
regenerate
> > > summary
> > > > > > > statistics.  Yes, this is easily done using the STAT-
Analysis
> > tool
> > > in
> > > > > > MET.
> > > > > > >
> > > > > > > You wrote that you're verifying wind speeds against
ASCAT and
> > that
> > > > > you'd
> > > > > > > like to exclude pairs where the observed wind speed is
less
> than
> > 1
> > > > m/s.
> > > > > > > I'm just guessing here, but I'll presume that you want
to
> produce
> > > > both
> > > > > > > SL1L2 and CNT output line types.  Here's what the STAT-
Analysis
> > job
> > > > > would
> > > > > > > look like:
> > > > > > >
> > > > > > > # Filter MPR lines and write SL1L2 output lines.
> > > > > > > #   -lookin         a .stat filename or a directory containing them
> > > > > > > #   -job            job type is aggregate_stat
> > > > > > > #   -line_type      input line type = MPR
> > > > > > > #   -out_line_type  output line type = SL1L2 partial sums
> > > > > > > #   -fcst_var       only process lines where the FCST_VAR column = WIND
> > > > > > > #   -column_thresh  only use MPR lines where the OBS column > 1
> > > > > > > #   -by             run this same job for each unique combination of these columns
> > > > > > > stat_analysis \
> > > > > > >    -lookin input.stat \
> > > > > > >    -job aggregate_stat \
> > > > > > >    -line_type MPR \
> > > > > > >    -out_line_type SL1L2 \
> > > > > > >    -fcst_var WIND \
> > > > > > >    -column_thresh OBS gt1 \
> > > > > > >    -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > > > > > >    -out_stat MPR_to_SL1L2.stat
> > > > > > >
> > > > > > > This will produce an output .stat file containing an SL1L2 line for
> > > > > > > each unique combination of the header columns listed after the "-by"
> > > > > > > option.  To generate CNT output lines instead, you'd run a second job where
> > > > > > > you replace SL1L2 with CNT.  You could run these jobs on the command line
> > > > > > > or group them together into a STAT-Analysis config file, if you prefer.
> > > > > > > Both would work.
> > > > > > >
> > > > > > > You could run this once for each input .stat file you're processing... or
> > > > > > > you could pass many input .stat files to the job.  Since FCST_INIT_BEG and
> > > > > > > FCST_LEAD are included in the "-by" option, you'll get separate output
> > > > > > > lines for each unique time.
> > > > > > >
> > > > > > > Hope that helps get you going.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > John
> > > > > > >
> > > > > > >


--
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov

------------------------------------------------
Subject: question on regenerating data
From: Rosalyn MacCracken - NOAA Affiliate
Time: Tue Apr 24 14:09:37 2018

Hi John,

I put my file on the ftp site.  Let me know what you find.  You'll see
those really low OBS values (0.01, 0.02, and so on).

Thanks!

Roz




--
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov

------------------------------------------------
Subject: question on regenerating data
From: John Halley Gotway
Time: Tue Apr 24 17:06:57 2018

Hi Roz,

Thanks for sending the sample data.  I grabbed it and used it to run some
sample jobs:

time /d1/johnhg/MET/MET_releases/met-6.0/bin/stat_analysis \
-lookin /d1/johnhg/MET/MET_Help/maccracken_data_20180424/opc_test/home/opc_test/data/met_verif/GFS/data/hourly \
-config STATAnalysisConfig \
-log run_sa.log -v 3

I used the "-lookin" option to point to all the data you sent.

I've attached the...
(1) config file I used
(2) log file that was generated
(3) output .stat files

Looking at the jobs, you'll see that I've included 5 of them...
- Generate CNT output
- Generate CTC >= 0.0 output
- Generate CTS >= 0.0 output
- Generate CTC >= 5.5689 output
- Generate CTS >= 5.5689 output

Unfortunately, you'll need to define a separate job for each threshold you'd
like to use.  You shouldn't use >=0.0, though, since wind speeds are never
negative and that condition is always true.
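
Because every threshold needs its own CTC job and its own CTS job, the "jobs" entry of the config file gets repetitive.  Here is a minimal sketch of a shell helper that prints such an entry.  It is not the attached config file: the function name make_jobs and the output file names are made up for illustration, and the use of -out_fcst_thresh and -out_obs_thresh to set the contingency-table threshold is an assumption to check against the STAT-Analysis chapter of the MET User's Guide for your version.

```shell
#!/bin/sh
# Sketch only: prints a "jobs" entry for a STAT-Analysis config file.
# make_jobs and the output file names are hypothetical, not from the attachment.

# Columns used to split the output, as in the -by list earlier in this thread.
BY="MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS"

# Options shared by every job: read MPR lines for WIND, keep only OBS > 1 m/s.
COMMON="-job aggregate_stat -line_type MPR -fcst_var WIND -column_thresh OBS gt1 -by ${BY}"

make_jobs() {
  printf 'jobs = [\n'
  # One CNT job; continuous statistics take no categorical threshold here.
  printf '   "%s -out_line_type CNT -out_stat mpr_to_cnt.stat"' "$COMMON"
  # One CTC job and one CTS job for each threshold passed as an argument.
  for thresh in "$@"; do
    for lt in CTC CTS; do
      printf ',\n   "%s -out_line_type %s -out_fcst_thresh %s -out_obs_thresh %s -out_stat mpr_to_%s_%s.stat"' \
        "$COMMON" "$lt" "$thresh" "$thresh" "$lt" "$thresh"
    done
  done
  printf '\n];\n'
}

# ge5.5689 is the threshold used above; pass more (e.g. ge17) as extra arguments.
make_jobs ge5.5689
```

Each job writes to its own -out_stat file, so no job overwrites the output of another.  The printed text would still need to be pasted into a full config file alongside the other entries.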

Also unfortunately, this is pretty slow.  On my machine, it took about 18
minutes to run these 5 jobs!
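
One way to keep each run short, in line with the earlier suggestion in this thread to pass STAT-Analysis less data at a time, is to run the config file once per hourly directory.  The sketch below only prints the commands (a dry run) so they can be reviewed first.  The function name plan_runs and the one-output-directory-per-run layout are assumptions for illustration, and it assumes each /GFS/data/hourly/YYYYMMDDHH directory holds a single initialization time.

```shell
#!/bin/sh
# Sketch only: prints one stat_analysis command per init directory (dry run).
# plan_runs and the per-run output layout are hypothetical.

BASE=/GFS/data/hourly              # input layout described earlier in the thread
OUT=/new_rerun_stat_files          # output root used earlier in the thread
CONFIG="$PWD/STATAnalysisConfig"   # absolute path, since each run changes directory

plan_runs() {
  for day in "$@"; do
    for cycle in 00 06 12 18; do
      init="201803${day}${cycle}"
      # A separate working directory per run keeps relative -out_stat
      # names in the config file from overwriting each other.
      printf 'mkdir -p %s/%s && (cd %s/%s && stat_analysis -lookin %s/%s -config %s -log run_sa.log -v 3)\n' \
        "$OUT" "$init" "$OUT" "$init" "$BASE" "$init" "$CONFIG"
    done
  done
}

# Days 11 and 12 of March 2018, as in the loop earlier in the thread.
plan_runs 11 12
```

Piping the output to sh would execute the runs one after another.  The loop uses a single variable name, cycle, in both the "for" line and the directory path.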

Thanks,
John


On Tue, Apr 24, 2018 at 2:09 PM, Rosalyn MacCracken - NOAA Affiliate
via RT
<met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
>
> Hi John,
>
> I put my file on the ftp site.  Let me know what you find.  You'll
see
> those really low OBS values (0.01, 0.02, and so on).
>
> Thanks!
>
> Roz
>
> On Tue, Apr 24, 2018 at 2:53 PM, Rosalyn MacCracken - NOAA Affiliate
<
> rosalyn.maccracken at noaa.gov> wrote:
>
> > Ok, I'll get that over to the ftp site.  I have to make sure that
I find
> a
> > day that has all the data in it.  Sometimes the data isn't
available when
> > the script runs.  A little annoying, but, that's operations...
> >
> > I'll let you know when I get the file to the ftp site.
> >
> > Thanks!
> >
> > Roz
> >
> > On Tue, Apr 24, 2018 at 2:49 PM, John Halley Gotway via RT <
> > met_help at ucar.edu> wrote:
> >
> >> Roz,
> >>
> >> Yes, we do.  Follow the instructions here:
> >>    https://dtcenter.org/met/users/support/met_help.php#ftp
> >>
> >> I'd suggest making a tar file for one day and posting it to the ftp
> >> site:
> >>    tar -cvzf sample.tar.gz /GFS/data/hourly/20180305*
> >>
> >> Thanks,
> >> John
> >>
> >> On Tue, Apr 24, 2018 at 11:57 AM, Rosalyn MacCracken - NOAA
Affiliate
> via
> >> RT <met_help at ucar.edu> wrote:
> >>
> >> >
> >> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> >> >
> >> > Hi John,
> >> >
> >> > Yes, it does seem that the -config option is the way to go to
recreate
> >> > those 3 files. I'll be sure to have a unique file name, or, mv
the
> >> output
> >> > file to a different name before running the command again.
Thanks for
> >> > pointing that out.
> >> >
> >> > I'm teleworking for the next couple of weeks, so I can't download and send
> >> > you *.stat files like I can when I'm at my computer at work.  I don't have
> >> > access to theia or wcoss anymore.  You have an ftp server that I can upload
> >> > data to, right?  If not, I can try and fiddle around with this tomorrow and
> >> > see if I can't get this to work the way I want to.
> >> >
> >> > Roz
> >> >
> >> > On Tue, Apr 24, 2018 at 11:42 AM, John Halley Gotway via RT <
> >> > met_help at ucar.edu> wrote:
> >> >
> >> > > Roz,
> >> > >
> >> > > Each "-job aggregate_stat" only generates a single output
line type.
> >> So
> >> > > using "-out_line_type CTC,CTS,CNT" will not work.
> >> > >
> >> > > You'll need to run separate jobs for each output line type
you want
> to
> >> > > generate.  That's why I'd recommend grouping those multiple
jobs
> >> together
> >> > > into a single STAT-Analysis config file.  Then you'd call
> >> STAT-Analysis
> >> > > once using the "-config" command line option.
> >> > >
> >> > > Another issue is that if you set "-out_stat" to the same filename, it'll
> >> > > get overwritten by each job.  STAT-Analysis will overwrite that output file
> >> > > rather than appending to it.
> >> > >
> >> > > You could send me a day's worth of .stat output files
> >> > > (/GFS/data/hourly/20180305*) and I could send you some
suggestions.
> >> Or
> >> > if
> >> > > you have access to theia you could copy them up there and
point me
> to
> >> it.
> >> > >
> >> > > Thanks,
> >> > > John
> >> > >
> >> > > On Tue, Apr 24, 2018 at 7:48 AM, Rosalyn MacCracken - NOAA
Affiliate
> >> via
> >> > RT
> >> > > <met_help at ucar.edu> wrote:
> >> > >
> >> > > >
> >> > > > <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> >> > > >
> >> > > > Hi John,
> >> > > >
> >> > > > Yes, that makes sense.  Those very small values (<1.0 m/s),
are
> bad
> >> > > > values.  That's why they shouldn't be included in the
processing.
> >> > > >
> >> > > > So, I need to just regenerate hourly data, one hour at a
time.
> >> Would
> >> > it
> >> > > > make sense to use a shell script and loop stat-analysis?
> Something
> >> > like:
> >> > > >
> >> > > > for day in 11 12
> >> > > > do
> >> > > >   for cycle in 00 06 12 18
> >> > > >   do
> >> > > >     stat_analysis \
> >> > > >       -lookin /GFS/data/hourly/201803${day}${cycle}/*.stat \
> >> > > >       -job aggregate_stat \
> >> > > >       -line_type MPR \
> >> > > >       -out_line_type CTC,CTS,CNT \
> >> > > >       -fcst_var WIND \
> >> > > >       -column_thresh OBS gt1 \
> >> > > >       -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> >> > > >       -out_stat /new_rerun_stat_files/MPR_to_CTC_CTS_CNT.stat
> >> > > >   done
> >> > > > done
> >> > > >
> >> > > > or, something like that?  And, will this regenerate hour
> forecasts,
> >> at
> >> > > each
> >> > > > forecast and lead hour?  I guess it will see the forecast
and lead
> >> hour
> >> > > > from the *.stat file, and whatever *stat file is in the
directory,
> >> it
> >> > > will
> >> > > > regenerate those hours, right?
> >> > > >
> >> > > > So, I need to regenerate the CTC, CNT and CTS files.
That's why I
> >> did:
> >> > > >  -out_line_type CTC,CTS,CNT
> >> > > > but, will that make 3 separate files, or just another
*.stat file?
> >> > > >
> >> > > > Roz
> >> > > >
> >> > > >
> >> > > > On Mon, Apr 23, 2018 at 4:01 PM, John Halley Gotway via RT
<
> >> > > > met_help at ucar.edu> wrote:
> >> > > >
> >> > > > > Roz,
> >> > > > >
> >> > > > > It is ultimately up to you to decide which matched pairs
you
> want
> >> to
> >> > > > > include in your processing.  Do you consider those small
(<1.0
> >> m/s)
> >> > > > > observation values to be corrupt and incorrect in some
way or
> just
> >> > not
> >> > > > very
> >> > > > > interesting?  If they really are BAD data values, I agree
that
> you
> >> > > should
> >> > > > > exclude them from your analysis.  But if they're just
> >> uninteresting
> >> > > > values
> >> > > > > of low wind speed, then there's no reason why you should
exclude
> >> > them.
> >> > > > For
> >> > > > > example, *most* of the time it ins't raining, but we
often
> >> included
> >> > > > > observations of 0 precip.
> >> > > > >
> >> > > > > There are three configurable options in Point-Stat that
may be
> >> useful
> >> > > > here:
> >> > > > > (1) You already know and use the "cat_thresh" option.
This
> >> threshold
> >> > > > > defines the events and non-events for a 2x2 contingency
table.
> >> This
> >> > > > > threshold affects the contents of FHO, CTC, CTS, MCTC,
and MCTS
> >> line
> >> > > > types
> >> > > > > that Point-Stat writes.
> >> > > > > (2) The "cnt_thresh" option is a more recent addition.
Perhaps
> >> this
> >> > > was
> >> > > > a
> >> > > > > poor name choice, but instead of defining categories,
it's
> really
> >> a
> >> > > > > *filtering* threshold.  This threshold affects the
contents of
> the
> >> > > SL1L2,
> >> > > > > SAL1L2, and CNT line types that Point-Stat writes.  For
example,
> >> > > setting
> >> > > > > "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT and 2
SL1L2
> >> output
> >> > > lines
> >> > > > > containing only those points where the wind speed was >=6
and
> >> >=17,
> >> > > > > respectively.
> >> > > > > (3) The "wind_thresh" option is very similar to the
"cnt_thresh"
> >> > option
> >> > > > but
> >> > > > > affects the contents of teh VL1L2, VAL1L2, and VCNT (new
in
> >> met-7.0)
> >> > > line
> >> > > > > types.  Only those U/V pairs that meet the specified wind
speed
> >> > > threshold
> >> > > > > are included in the output.
> >> > > > >
> >> > > > > For both "cnt_thresh" and "wind_thresh", the default
value in
> the
> >> > > config
> >> > > > > file is "NA", meaning, do not apply any filtering
threshold
> >> criteria.
> >> > > > >
> >> > > > > You have the flexibility to run STAT-Analysis on the MPR
output
> >> lines
> >> > > to
> >> > > > > recompute any of these output line types applying
whatever
> >> filtering
> >> > > > > criteria you'd like.
> >> > > > > Here's the MET user's guide:
> >> > > > > https://dtcenter.org/met/users/docs/users_guide/MET_
> >> > > Users_Guide_v7.0.pdf
> >> > > > > Look on page 98 for the job command options for the
> >> "aggregate_stat"
> >> > > line
> >> > > > > type when the input line type is "MPR".
> >> > > > >
> >> > > > > For your second question, the "-lookin PATH" option is
*VERY*
> >> > flexible.
> >> > > > > You can set PATH to either a single value or multiple
values.
> If
> >> you
> >> > > use
> >> > > > > wildcards, then the shell expands those wildcards to
multiple
> >> values.
> >> > > > Each
> >> > > > > value you pass in can either be a filename or a directory
name.
> >> If
> >> > you
> >> > > > > pass in a filename, STAT-Analysis will read it
*REGARDLESS* of
> the
> >> > file
> >> > > > > extension.  If you pass in a directory name, STAT-
Analysis will
> >> > search
> >> > > > that
> >> > > > > directory *RECURSIVELY* for files ending in ".stat".  For
> example,
> >> > > either
> >> > > > > of the following settings would tell STAT-Analysis to
read the
> >> same
> >> > > list
> >> > > > of
> >> > > > > files:
> >> > > > >    -lookin /GFS/data/hourly/*/*.stat
> >> > > > >    ... or ...
> >> > > > >    -lookin /GFS/data/hourly
> >> > > > >
> >> > > > > Be aware though that the more data you pass to STAT-
Analysis,
> the
> >> > > longer
> >> > > > > it'll take for it to process it.  You can decide how much
data
> you
> >> > pass
> >> > > > it
> >> > > > > for each job.  I'd suggest starting with what is most
convenient
> >> for
> >> > > you.
> >> > > > > If it's too slow, change the logic to pass it less data
(e.g.
> >> only 1
> >> > > day
> >> > > > of
> >> > > > > data rather than 1 month of data).
> >> > > > >
> >> > > > > Yes, you can give it a date range.  Use -fcst_init_beg
and
> >> > > -fcst_init_end
> >> > > > > to specify beginning/ending model initialization times or
> >> > > -fcst_valid_beg
> >> > > > > and -fcst_valid_end to specify beginning/ending valid
times.
> >> > > > >
> >> > > > > If you find that you're running multiple jobs on the same
subset
> >> of
> >> > > data
> >> > > > > (e.g. process MPR to CNT, MPR to SL1L2, MPR to CTC, MPR
to CTS),
> >> it'd
> >> > > be
> >> > > > > more efficient to group those jobs into a config file.
That'll
> do
> >> > the
> >> > > > > filtering ONCE and write the filtered data to a temp
file.  Then
> >> all
> >> > > the
> >> > > > > jobs read data from the temp instead of starting over
from
> >> scratch.
> >> > > > >
> >> > > > > Make sense?
> >> > > > >
> >> > > > > John
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Apr 23, 2018 at 1:01 PM, Rosalyn MacCracken -
NOAA
> >> Affiliate
> >> > > via
> >> > > > RT
> >> > > > > <met_help at ucar.edu> wrote:
> >> > > > >
> >> > > > > >
> >> > > > > > <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822
> >
> >> > > > > >
> >> > > > > > Hi John,
> >> > > > > >
> >> > > > > > That's actually only partially correct.  It's not that
I want
> to
> >> > use
> >> > > > part
> >> > > > > > of the MPR lines and discard the rest, and I do need to
> >> regenerate
> >> > > > > > statistics.  Let me try to re-explain.
> >> > > > > >
> >> > > > > > Back in early March we switched from getting our ASCAT
obs
> from
> >> the
> >> > > > > > prepbufr data, to getting it from the MGDRLITE data.
So,
> >> processing
> >> > > > > didn't
> >> > > > > > change.  I was producing statistics at certain
threshold
> levels
> >> for
> >> > > > both
> >> > > > > > GFS and ASCAT.  I had this set with the cat_thresh
list, at
> >> levels
> >> > of
> >> > > > > > 0,6,17, etc.  We found out after processing for a
couple of
> >> weeks
> >> > > that
> >> > > > > the
> >> > > > > > ASCAT data included these really small values, <1.0
m/s, and
> >> that
> >> > > these
> >> > > > > > small wind speeds were being included into the
statistics
> >> > processing.
> >> > > > > >
> >> > > > > > So, a couple of questions.
> >> > > > > > 1) Do I have to regenerate all of my statistics (*.cts,
*.cnt
> >> and
> >> > > *ctc
> >> > > > > > files) because of this error? Or, since I have
threshold
> levels
> >> > set,
> >> > > > will
> >> > > > > > those small values be amoung the statistics in the
lowest
> >> > thresholds?
> >> > > > > > 2) I have the *.stat files, but, they are spread out
into
> >> separate
> >> > > > > > directories like:
> >> > > > > > /GFS/data/hourly/${YYYYMMDDHH}/*.stat
> >> > > > > > Can I tell stat-analysis to "lookin" directories with a
> wildcard
> >> > > (like
> >> > > > > > 201803*)?  If so, how?  Or, is I tell it to look in
> >> > /GFS/data/hourly,
> >> > > > > will
> >> > > > > > it look in all the directories recursively under
hourly?  And,
> >> it
> >> > > > that's
> >> > > > > > the case, can I give it a date range, so, that it only
> processes
> >> > data
> >> > > > > from
> >> > > > > > March?
> >> > > > > >
> >> > > > > > Roz
> >> > > > > >
> >> > > > > > On Mon, Apr 23, 2018 at 2:18 PM, John Halley Gotway via
RT <
> >> > > > > > met_help at ucar.edu> wrote:
> >> > > > > >
> >> > > > > > > Hi Roz,
> >> > > > > > >
> >> > > > > > > I read that you've run Point-Stat and saved off the
matched
> >> pairs
> >> > > > (MPR)
> >> > > > > > > output line type.  And you'd like to (1) filter those
MPR
> >> lines
> >> > to
> >> > > > > > discard
> >> > > > > > > some of them and then (2) use the filtered data to
> regenerate
> >> > > summary
> >> > > > > > > statistics.  Yes, this is easily done using the
> STAT-Analysis
> >> > tool
> >> > > in
> >> > > > > > MET.
> >> > > > > > >
> >> > > > > > > You wrote that you're verifying wind speeds against
ASCAT
> and
> >> > that
> >> > > > > you'd
> >> > > > > > > like to exclude pairs where the observed wind speed
is less
> >> than
> >> > 1
> >> > > > m/s.
> >> > > > > > > I'm just guessing here, but I'll presume that you
want to
> >> produce
> >> > > > both
> >> > > > > > > SL1L2 and CNT output line types.  Here's what the
> >> STAT-Analysis
> >> > job
> >> > > > > would
> >> > > > > > > look like:
> >> > > > > > >
> >> > > > > > > # Filter MPR's and write SL1L2 output line
> >> > > > > > > stat_analysis \
> >> > > > > > >    -lookin input.stat \            # List a .stat
filename
> or
> >> > > > directory
> >> > > > > > > containing them
> >> > > > > > >    -job aggregate_stat \        # Job type is
aggregate_stat
> >> > > > > > >    -line_type MPR \              # Input line type =
MPR
> >> > > > > > >    -out_line_type SL1L2 \      # Output line type =
SL1L2
> >> partial
> >> > > > sums
> >> > > > > > >    -fcst_var WIND \               # Only process
lines where
> >> > > FCST_VAR
> >> > > > > > > column = WIND
> >> > > > > > >    -column_thresh OBS gt1 \ # Only use MPR lines
where OBS
> >> column
> >> > > > 1
> >> > > > > > >    -by
> >> > > > > > >
MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,
> >> > > > INTERP_PNTS
> >> > > > > #
> >> > > > > > > Run this same job for each unique combination of
these
> columns
> >> > > > > > >    -out_stat MPR_to_SL1L2.stat
> >> > > > > > >
> >> > > > > > > This will read produce an output .stat file
containing an
> >> SL1L2
> >> > > line
> >> > > > > for
> >> > > > > > > each unique combination of the header columns listed
after
> the
> >> > > "-by"
> >> > > > > > > option.  To generate CNT output lines instead, you'd
run a
> >> second
> >> > > job
> >> > > > > > where
> >> > > > > > > you replace SL1L2 with CNT.  You could run these jobs
on the
> >> > > command
> >> > > > > line
> >> > > > > > > or group them together into a STAT-Analysis config
file, if
> >> you
> >> > > > prefer.
> >> > > > > > > Both would work.
> >> > > > > > >
> >> > > > > > > You could run this once for each input .stat file
you're
> >> > > > processing...
> >> > > > > or
> >> > > > > > > you could pass many input .stat files to the job.
Since
> >> > > > FCST_INIT_BEG
> >> > > > > > and
> >> > > > > > > FCST_LEAD are included in the "-by" option, you'll
get
> >> separate
> >> > > > output
> >> > > > > > > lines for each unique time.
> >> > > > > > >
> >> > > > > > > Hope that helps get you going.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > John
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Thu, Apr 19, 2018 at 9:23 AM, Julie Prestopnik via
RT <
> >> > > > > > > met_help at ucar.edu>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > <URL: https://rt.rap.ucar.edu/rt/Tic
> >> ket/Display.html?id=84822
> >> > >
> >> > > > > > > >
> >> > > > > > > > Hi Roz.  My apologies for the delay in responding.
> >> > > > > > > >
> >> > > > > > > > Unfortunately, John is out of the office this week,
and I
> do
> >> > not
> >> > > > know
> >> > > > > > the
> >> > > > > > > > answers to your questions.  As you said, I would
also
> >> imagine
> >> > > that
> >> > > > > > > > point-stat is using those small values as matched
pairs.
> >> > Also, I
> >> > > > do
> >> > > > > > not
> >> > > > > > > > believe there is a way to regenerate the point-stat
> >> statistics
> >> > > > > without
> >> > > > > > > > using the original GFS data.  I cannot say with
certainty,
> >> > > however.
> >> > > > > > > Thank
> >> > > > > > > > you for your patience in advance.  We'll get a
definite
> >> > response
> >> > > to
> >> > > > > you
> >> > > > > > > as
> >> > > > > > > > soon as we can.
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > > Julie
> >> > > > > > > >
> >> > > > > > > > On Wed, Apr 18, 2018 at 6:31 AM, Rosalyn MacCracken
- NOAA
> >> > > > Affiliate
> >> > > > > > via
> >> > > > > > > RT
> >> > > > > > > > <met_help at ucar.edu> wrote:
> >> > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Wed Apr 18 06:31:39 2018: Request 84822 was acted
upon.
> >> > > > > > > > > Transaction: Ticket created by
> >> rosalyn.maccracken at noaa.gov
> >> > > > > > > > >        Queue: met_help
> >> > > > > > > > >      Subject: question on regenerating data
> >> > > > > > > > >        Owner: Nobody
> >> > > > > > > > >   Requestors: rosalyn.maccracken at noaa.gov
> >> > > > > > > > >       Status: new
> >> > > > > > > > >  Ticket <URL: https://rt.rap.ucar.edu/rt/
> >> > > > > > Ticket/Display.html?id=84822
> >> > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Hi,
> >> > > > > > > > >
> >> > > > > > > > > I'm running point-stat using ASCAT and GFS data
to
> verify
> >> > > surface
> >> > > > > > wind
> >> > > > > > > > > speeds.  I found an error in my ASCAT input data
that
> goes
> >> > back
> >> > > > to
> >> > > > > > Mar
> >> > > > > > > 7.
> >> > > > > > > > > I had switched the input source of the data, and
within
> >> the
> >> > new
> >> > > > > data
> >> > > > > > > > files,
> >> > > > > > > > > it was allowing very small values (< 1 m/s) to be
used
> as
> >> > data
> >> > > > > points
> >> > > > > > > in
> >> > > > > > > > > the verification.  I imagine that this is an
issue,
> since
> >> > > > > point-stat
> >> > > > > > is
> >> > > > > > > > > using these very small values as matched pairs
with the
> >> GFS,
> >> > > > > correct?
> >> > > > > > > > >
> >> > > > > > > > > Is there a way to regenerate the point-stat
statistics
> >> > without
> >> > > > > using
> >> > > > > > > the
> >> > > > > > > > > original GFS data?  I do have the *stat and the
*mpr
> >> files,
> >> > and
> >> > > > it
> >> > > > > is
> >> > > > > > > > > pretty easy to identify where the bad values are
> located.
> >> > > > > > > > >
> >> > > > > > > > > Thanks,
> >> > > > > > > > > Roz
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > > Rosalyn MacCracken
> >> > > > > > > > > Support Scientist
> >> > > > > > > > >
> >> > > > > > > > > Ocean Applications Branch
> >> > > > > > > > > NOAA/NWS Ocean Prediction Center
> >> > > > > > > > > NCWCP
> >> > > > > > > > > 5830 University Research Ct
> >> > > > > > > > > College Park, MD  20740-3818
> >> > > > > > > > >
> >> > > > > > > > > (p) 301-683-1551
> >> > > > > > > > > rosalyn.maccracken at noaa.gov
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Rosalyn MacCracken
> >> > > > > > Support Scientist
> >> > > > > >
> >> > > > > > Ocean Applications Branch
> >> > > > > > NOAA/NWS Ocean Prediction Center
> >> > > > > > NCWCP
> >> > > > > > 5830 University Research Ct
> >> > > > > > College Park, MD  20740-3818
> >> > > > > >
> >> > > > > > (p) 301-683-1551
> >> > > > > > rosalyn.maccracken at noaa.gov
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Rosalyn MacCracken
> >> > > > Support Scientist
> >> > > >
> >> > > > Ocean Applications Branch
> >> > > > NOAA/NWS Ocean Prediction Center
> >> > > > NCWCP
> >> > > > 5830 University Research Ct
> >> > > > College Park, MD  20740-3818
> >> > > >
> >> > > > (p) 301-683-1551
> >> > > > rosalyn.maccracken at noaa.gov
> >> > > >
> >> > > >
> >> > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Rosalyn MacCracken
> >> > Support Scientist
> >> >
> >> > Ocean Applications Branch
> >> > NOAA/NWS Ocean Prediction Center
> >> > NCWCP
> >> > 5830 University Research Ct
> >> > College Park, MD  20740-3818
> >> >
> >> > (p) 301-683-1551
> >> > rosalyn.maccracken at noaa.gov
> >> >
> >> >
> >>
> >>
> >
> >
> > --
> > Rosalyn MacCracken
> > Support Scientist
> >
> > Ocean Applications Branch
> > NOAA/NWS Ocean Prediction Center
> > NCWCP
> > 5830 University Research Ct
> > College Park, MD  20740-3818
> >
> > (p) 301-683-1551
> > rosalyn.maccracken at noaa.gov
> >
>
>
>
> --
> Rosalyn MacCracken
> Support Scientist
>
> Ocean Applications Branch
> NOAA/NWS Ocean Prediction Center
> NCWCP
> 5830 University Research Ct
> College Park, MD  20740-3818
>
> (p) 301-683-1551
> rosalyn.maccracken at noaa.gov
>
>

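A minimal sketch of how the day/cycle loop proposed earlier in this thread might be restructured, given the two points raised above: each aggregate_stat job writes a single output line type, and a shared "-out_stat" file is overwritten rather than appended to. It only prints the commands (a dry run) instead of running them. The input/output paths and day/cycle lists are copied from the thread. The "ge6" event threshold and the "-out_fcst_thresh"/"-out_obs_thresh" options for the CTC and CTS jobs are assumptions that should be checked against the MET User's Guide for your version.

```shell
#!/bin/sh
# Dry-run sketch: print one stat_analysis command per cycle directory and
# per output line type, each with its own -out_stat file.
# Paths, day/cycle lists and the ge6 threshold are illustrative assumptions.
IN_BASE=/GFS/data/hourly
OUT_DIR=/new_rerun_stat_files
BY=MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS

build_jobs() {
   for day in 11 12; do
      for cycle in 00 06 12 18; do
         stamp="201803${day}${cycle}"
         for type in CNT CTC CTS; do
            # CTC/CTS need an event threshold; CNT does not.
            thresh=""
            [ "$type" != CNT ] && thresh=" -out_fcst_thresh ge6 -out_obs_thresh ge6"
            echo "stat_analysis -lookin ${IN_BASE}/${stamp}" \
                 "-job aggregate_stat -line_type MPR -out_line_type ${type}${thresh}" \
                 "-fcst_var WIND -column_thresh OBS gt1 -by ${BY}" \
                 "-out_stat ${OUT_DIR}/${stamp}_MPR_to_${type}.stat"
         done
      done
   done
}

build_jobs
```

Piping the printed lines to "sh" (for example "build_jobs | sh") would execute them once they look right. Passing the directory to "-lookin" relies on the recursive search for ".stat" files described in this thread.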
------------------------------------------------
Subject: question on regenerating data
From: Rosalyn MacCracken - NOAA Affiliate
Time: Wed Apr 25 09:18:42 2018

Hi John,

Thanks for doing that for me.  I'll take a look at the info you sent me
this afternoon.  I'm in the middle of doing something right now...trying
to make a different program work.  ;-/

I wonder if it will be quicker than 18 minutes for some of the thresholds
that have higher wind speeds, and not as many instances (or 0 instances).
Or, will it take just as long, since it still needs to read through the
entire *.stat file anyway?

Roz

On Tue, Apr 24, 2018 at 7:06 PM, John Halley Gotway via RT <
met_help at ucar.edu> wrote:

> Hi Roz,
>
> Thanks for sending the sample data.  I grabbed it and used it to run some
> sample jobs:
>
> time /d1/johnhg/MET/MET_releases/met-6.0/bin/stat_analysis \
>    -lookin /d1/johnhg/MET/MET_Help/maccracken_data_20180424/opc_test/home/opc_test/data/met_verif/GFS/data/hourly \
>    -config STATAnalysisConfig \
>    -log run_sa.log -v 3
>
> I used the "-lookin" option to point to all the data you sent.
>
> I've attached the...
> (1) config file I used
> (2) log file that was generated
> (3) output .stat files
>
> Looking at the jobs, you'll see that I've included 5 of them...
> - Generate CNT output
> - Generate CTC >= 0.0 output
> - Generate CTS >= 0.0 output
> - Generate CTC >= 5.5689 output
> - Generate CTS >= 5.5689 output
>
> Unfortunately, you'll need to define a separate job for each threshold
> you'd like to use.  Note that you shouldn't use >=0.0, though, since that
> condition is always true for wind speed.
>
> Also unfortunately, this is pretty slow.  On my machine, it took about 18
> minutes to run these 5 jobs!
>
> Thanks,
> John
>
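The config file attached to the message above is not reproduced in this archive. As a rough illustration only, the sketch below writes a small STAT-Analysis config with one job per output line type and prints the matching command. The file name "STATAnalysisConfig_sketch", the output file names, the "-column_thresh OBS gt1" filter and the "-out_fcst_thresh"/"-out_obs_thresh" options are assumptions, not the contents of the attached file. It also assumes that a config containing only "jobs" is merged with the tool's defaults.

```shell
#!/bin/sh
# Sketch only: write a STAT-Analysis config with one job per output line type.
# File names and option choices are illustrative assumptions, not the
# attachment from the thread.
BY="MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS"
COMMON="-job aggregate_stat -line_type MPR -fcst_var WIND -column_thresh OBS gt1 -by ${BY}"
THRESH="-out_fcst_thresh ge5.5689 -out_obs_thresh ge5.5689"

# Each job gets its own -out_stat file so that no job overwrites another.
cat > STATAnalysisConfig_sketch <<EOF
jobs = [
   "${COMMON} -out_line_type CNT -out_stat mpr_to_cnt.stat",
   "${COMMON} -out_line_type CTC ${THRESH} -out_stat mpr_to_ctc_ge5.5689.stat",
   "${COMMON} -out_line_type CTS ${THRESH} -out_stat mpr_to_cts_ge5.5689.stat"
];
EOF

# Printed rather than executed; drop the echo to run it where MET is installed.
echo "stat_analysis -lookin /GFS/data/hourly -config STATAnalysisConfig_sketch -log run_sa.log -v 3"
```

Adding another threshold would mean adding another pair of CTC/CTS job lines with their own output file names.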
>
> > >> > > > > > On Mon, Apr 23, 2018 at 2:18 PM, John Halley Gotway
via RT <
> > >> > > > > > met_help at ucar.edu> wrote:
> > >> > > > > >
> > >> > > > > > > Hi Roz,
> > >> > > > > > >
> > >> > > > > > > I read that you've run Point-Stat and saved off the
> matched
> > >> pairs
> > >> > > > (MPR)
> > >> > > > > > > output line type.  And you'd like to (1) filter
those MPR
> > >> lines
> > >> > to
> > >> > > > > > discard
> > >> > > > > > > some of them and then (2) use the filtered data to
> > regenerate
> > >> > > summary
> > >> > > > > > > statistics.  Yes, this is easily done using the
> > STAT-Analysis
> > >> > tool
> > >> > > in
> > >> > > > > > MET.
> > >> > > > > > >
> > >> > > > > > > You wrote that you're verifying wind speeds against
ASCAT
> > and
> > >> > that
> > >> > > > > you'd
> > >> > > > > > > like to exclude pairs where the observed wind speed
is
> less
> > >> than
> > >> > 1
> > >> > > > m/s.
> > >> > > > > > > I'm just guessing here, but I'll presume that you
want to
> > >> produce
> > >> > > > both
> > >> > > > > > > SL1L2 and CNT output line types.  Here's what the
> > >> STAT-Analysis
> > >> > job
> > >> > > > > would
> > >> > > > > > > look like:
> > >> > > > > > >
> > >> > > > > > > # Filter MPRs and write SL1L2 output lines.
> > >> > > > > > > #   -lookin          a .stat filename or a directory containing them
> > >> > > > > > > #   -job             job type is aggregate_stat
> > >> > > > > > > #   -line_type       input line type = MPR
> > >> > > > > > > #   -out_line_type   output line type = SL1L2 partial sums
> > >> > > > > > > #   -fcst_var        only process lines where FCST_VAR column = WIND
> > >> > > > > > > #   -column_thresh   only use MPR lines where OBS column > 1
> > >> > > > > > > #   -by              run this same job for each unique combination of
> > >> > > > > > > #                    these columns
> > >> > > > > > > stat_analysis \
> > >> > > > > > >    -lookin input.stat \
> > >> > > > > > >    -job aggregate_stat \
> > >> > > > > > >    -line_type MPR \
> > >> > > > > > >    -out_line_type SL1L2 \
> > >> > > > > > >    -fcst_var WIND \
> > >> > > > > > >    -column_thresh OBS gt1 \
> > >> > > > > > >    -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > >> > > > > > >    -out_stat MPR_to_SL1L2.stat
> > >> > > > > > >
> > >> > > > > > > This will produce an output .stat file containing an SL1L2 line for
> > >> > > > > > > each unique combination of the header columns listed after the "-by"
> > >> > > > > > > option.  To generate CNT output lines instead, you'd run a second job
> > >> > > > > > > where you replace SL1L2 with CNT.  You could run these jobs on the
> > >> > > > > > > command line or group them together into a STAT-Analysis config file,
> > >> > > > > > > if you prefer.  Both would work.
> > >> > > > > > >
> > >> > > > > > > You could run this once for each input .stat file you're
> > >> > > > > > > processing... or you could pass many input .stat files to the job.
> > >> > > > > > > Since FCST_INIT_BEG and FCST_LEAD are included in the "-by" option,
> > >> > > > > > > you'll get separate output lines for each unique time.
> > >> > > > > > >
> > >> > > > > > > Hope that helps get you going.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > John
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Thu, Apr 19, 2018 at 9:23 AM, Julie Prestopnik
via RT <
> > >> > > > > > > met_help at ucar.edu>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > <URL: https://rt.rap.ucar.edu/rt/Tic
> > >> ket/Display.html?id=84822
> > >> > >
> > >> > > > > > > >
> > >> > > > > > > > Hi Roz.  My apologies for the delay in
responding.
> > >> > > > > > > >
> > >> > > > > > > > Unfortunately, John is out of the office this
week, and
> I
> > do
> > >> > not
> > >> > > > know
> > >> > > > > > the
> > >> > > > > > > > answers to your questions.  As you said, I would
also
> > >> imagine
> > >> > > that
> > >> > > > > > > > point-stat is using those small values as matched
pairs.
> > >> > Also, I
> > >> > > > do
> > >> > > > > > not
> > >> > > > > > > > believe there is a way to regenerate the point-
stat
> > >> statistics
> > >> > > > > without
> > >> > > > > > > > using the original GFS data.  I cannot say with
> certainty,
> > >> > > however.
> > >> > > > > > > Thank
> > >> > > > > > > > you for your patience in advance.  We'll get a
definite
> > >> > response
> > >> > > to
> > >> > > > > you
> > >> > > > > > > as
> > >> > > > > > > > soon as we can.
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Julie
> > >> > > > > > > >
> > >> > > > > > > > On Wed, Apr 18, 2018 at 6:31 AM, Rosalyn
MacCracken -
> NOAA
> > >> > > > Affiliate
> > >> > > > > > via
> > >> > > > > > > RT
> > >> > > > > > > > <met_help at ucar.edu> wrote:
> > >> > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > Wed Apr 18 06:31:39 2018: Request 84822 was
acted
> upon.
> > >> > > > > > > > > Transaction: Ticket created by
> > >> rosalyn.maccracken at noaa.gov
> > >> > > > > > > > >        Queue: met_help
> > >> > > > > > > > >      Subject: question on regenerating data
> > >> > > > > > > > >        Owner: Nobody
> > >> > > > > > > > >   Requestors: rosalyn.maccracken at noaa.gov
> > >> > > > > > > > >       Status: new
> > >> > > > > > > > >  Ticket <URL: https://rt.rap.ucar.edu/rt/
> > >> > > > > > Ticket/Display.html?id=84822
> > >> > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > Hi,
> > >> > > > > > > > >
> > >> > > > > > > > > I'm running point-stat using ASCAT and GFS data
to
> > verify
> > >> > > surface
> > >> > > > > > wind
> > >> > > > > > > > > speeds.  I found an error in my ASCAT input
data that
> > goes
> > >> > back
> > >> > > > to
> > >> > > > > > Mar
> > >> > > > > > > 7.
> > >> > > > > > > > > I had switched the input source of the data,
and
> within
> > >> the
> > >> > new
> > >> > > > > data
> > >> > > > > > > > files,
> > >> > > > > > > > > it was allowing very small values (< 1 m/s) to
be used
> > as
> > >> > data
> > >> > > > > points
> > >> > > > > > > in
> > >> > > > > > > > > the verification.  I imagine that this is an
issue,
> > since
> > >> > > > > point-stat
> > >> > > > > > is
> > >> > > > > > > > > using these very small values as matched pairs
with
> the
> > >> GFS,
> > >> > > > > correct?
> > >> > > > > > > > >
> > >> > > > > > > > > Is there a way to regenerate the point-stat
statistics
> > >> > without
> > >> > > > > using
> > >> > > > > > > the
> > >> > > > > > > > > original GFS data?  I do have the *stat and the
*mpr
> > >> files,
> > >> > and
> > >> > > > it
> > >> > > > > is
> > >> > > > > > > > > pretty easy to identify where the bad values
are
> > located.
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > > Roz
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > > Rosalyn MacCracken
> > >> > > > > > > > > Support Scientist
> > >> > > > > > > > >
> > >> > > > > > > > > Ocean Applications Branch
> > >> > > > > > > > > NOAA/NWS Ocean Prediction Center
> > >> > > > > > > > > NCWCP
> > >> > > > > > > > > 5830 University Research Ct
> > >> > > > > > > > > College Park, MD  20740-3818
> > >> > > > > > > > >
> > >> > > > > > > > > (p) 301-683-1551
> > >> > > > > > > > > rosalyn.maccracken at noaa.gov
> > >> > > > > > > > >
> > >> > > > > > > > >


--
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov

------------------------------------------------
Subject: question on regenerating data
From: John Halley Gotway
Time: Wed Apr 25 09:40:38 2018

Roz,

I think it'd take just as long.  The slow part is reading the data... not
applying a threshold.

John
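
Since reading the input dominates the run time, one way to keep each run
short is to hand STAT-Analysis one day of .stat files at a time, as
suggested elsewhere in this thread.  A minimal dry-run sketch, assuming the
/GFS/data/hourly/${YYYYMMDDHH} layout described in this thread; the output
directory and the per-day output file name are placeholders, and the
commands are only printed, not executed:

```shell
#!/bin/sh
# Build (but do not run) one stat_analysis command per day, so each run
# reads only that day's .stat files and writes its own output file.
# base_dir, out_dir, and the output naming scheme are placeholders.
build_cmd() {
  day=$1; base_dir=$2; out_dir=$3
  # The glob stays quoted here; the shell expands it when the printed
  # command is actually executed.
  printf '%s ' stat_analysis \
    -lookin "${base_dir}/${day}*" \
    -job aggregate_stat -line_type MPR -out_line_type CNT \
    -fcst_var WIND -column_thresh OBS gt1 \
    -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
    -out_stat "${out_dir}/mpr_to_CNT_${day}.stat"
  printf '\n'
}

# Example days only; a unique -out_stat per day avoids overwriting output.
for d in 20180311 20180312; do
  build_cmd "$d" /GFS/data/hourly /new_rerun_stat_files
done
```

Once the printed commands look right, they can be piped to sh to run them.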

On Wed, Apr 25, 2018 at 9:18 AM, Rosalyn MacCracken - NOAA Affiliate
via RT
<met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
>
> Hi John,
>
> Thanks for doing that for me.  I'll take a look at the info you sent me
> this afternoon.  I'm in the middle of doing something right now...trying to
> make a different program work.  ;-/
>
> I wonder if it will be quicker than 18 minutes for some of the thresholds
> that have higher wind speeds, and not as many instances (or 0 instances).
> Or, will it take just as long, since it still needs to read through the
> entire *.stat file anyway?
>
> Roz
>
> On Tue, Apr 24, 2018 at 7:06 PM, John Halley Gotway via RT <
> met_help at ucar.edu> wrote:
>
> > Hi Roz,
> >
> > Thanks for sending the sample data.  I grabbed it and used it to run some
> > sample jobs:
> >
> > time /d1/johnhg/MET/MET_releases/met-6.0/bin/stat_analysis \
> >   -lookin /d1/johnhg/MET/MET_Help/maccracken_data_20180424/opc_test/home/opc_test/data/met_verif/GFS/data/hourly \
> >   -config STATAnalysisConfig \
> >   -log run_sa.log -v 3
> >
> > I used the "-lookin" option to point to all the data you sent.
> >
> > I've attached the...
> > (1) config file I used
> > (2) log file that was generated
> > (3) output .stat files
> >
> > Looking at the jobs, you'll see that I've included 5 of them...
> > - Generate CNT output
> > - Generate CTC >= 0.0 output
> > - Generate CTS >= 0.0 output
> > - Generate CTC >= 5.5689 output
> > - Generate CTS >= 5.5689 output
> >
> > Unfortunately, you'll need to define separate jobs for each threshold you'd
> > like to use.  Although, you shouldn't use >=0.0 since that's always true
> > for wind speeds.
> >
> > Also unfortunately, this is pretty slow.  On my machine, it took about 18
> > minutes for these 5 jobs!
> >
> > Thanks,
> > John
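
The five jobs described above could be written into a config file by hand;
for a longer threshold list, a small script can generate the "jobs" entries.
This is only a sketch under assumptions: the thresholds ge6 and ge17 and the
output file names are placeholders, and -out_fcst_thresh / -out_obs_thresh
are assumed to be the job options that define the contingency table when
aggregating MPR lines (check the aggregate_stat section of the user's guide
for your MET version).  The attached STATAnalysisConfig is not reproduced
here.

```shell
#!/bin/sh
# Print one aggregate_stat job string per (line type, threshold) pair.
# Thresholds and output names are placeholders; -out_fcst_thresh and
# -out_obs_thresh are assumed to be the options that define the 2x2 table.
make_jobs() {
  common='-job aggregate_stat -line_type MPR -fcst_var WIND -column_thresh OBS gt1'
  by='-by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS'
  # Continuous statistics need no categorical threshold.
  printf '   "%s -out_line_type CNT %s -out_stat mpr_to_CNT.stat",\n' "$common" "$by"
  for thresh in "$@"; do
    for type in CTC CTS; do
      # Each job gets its own -out_stat so no job overwrites another.
      printf '   "%s -out_line_type %s -out_fcst_thresh %s -out_obs_thresh %s %s -out_stat mpr_to_%s_%s.stat",\n' \
        "$common" "$type" "$thresh" "$thresh" "$by" "$type" "$thresh"
    done
  done
}

# Wrap the job strings in a jobs = [ ... ]; list, dropping the final comma.
make_config() {
  printf 'jobs = [\n'
  make_jobs "$@" | sed '$ s/,$//'
  printf '];\n'
}

make_config ge6 ge17
```

The printed list is a fragment meant to be pasted into a full
STAT-Analysis config file and passed with "-config", so that the input is
read and filtered once for all of the jobs.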
> >
> >
> > On Tue, Apr 24, 2018 at 2:09 PM, Rosalyn MacCracken - NOAA
Affiliate via
> RT
> > <met_help at ucar.edu> wrote:
> >
> > >
> > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > >
> > > Hi John,
> > >
> > > I put my file on the ftp site.  Let me know what you find.  You'll see
> > > those really low OBS values (0.01, 0.02, and so on).
> > >
> > > Thanks!
> > >
> > > Roz
> > >
> > > On Tue, Apr 24, 2018 at 2:53 PM, Rosalyn MacCracken - NOAA
Affiliate <
> > > rosalyn.maccracken at noaa.gov> wrote:
> > >
> > > > > Ok, I'll get that over to the ftp site.  I have to make sure that I
> > > > > find a day that has all the data in it.  Sometimes the data isn't
> > > > > available when the script runs.  A little annoying, but, that's
> > > > > operations...
> > > >
> > > > I'll let you know when I get the file to the ftp site.
> > > >
> > > > Thanks!
> > > >
> > > > Roz
> > > >
> > > > On Tue, Apr 24, 2018 at 2:49 PM, John Halley Gotway via RT <
> > > > met_help at ucar.edu> wrote:
> > > >
> > > >> Roz,
> > > >>
> > > >> Yes, we do.  Follow the instructions here:
> > > >>    https://dtcenter.org/met/users/support/met_help.php#ftp
> > > >>
> > > > >> I'd suggest making a tar file for one day and posting it to the ftp
> > > > >> site:
> > > > >>    tar -cvzf sample.tar.gz /GFS/data/hourly/20180305*
> > > >>
> > > >> Thanks,
> > > >> John
> > > >>
> > > >> On Tue, Apr 24, 2018 at 11:57 AM, Rosalyn MacCracken - NOAA
> Affiliate
> > > via
> > > >> RT <met_help at ucar.edu> wrote:
> > > >>
> > > >> >
> > > >> > <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > >> >
> > > >> > HI John,
> > > >> >
> > > > >> > Yes, it does seem that the -config option is the way to go to recreate
> > > > >> > those 3 files. I'll be sure to have a unique file name, or mv the
> > > > >> > output file to a different name before running the command again.
> > > > >> > Thanks for pointing that out.
> > > > >> >
> > > > >> > I'm teleworking for the next couple of weeks, so I can't download and
> > > > >> > send you *.stat files like I can when I'm at my computer at work.  I
> > > > >> > don't have access to theia or wcoss anymore.  You have an ftp server
> > > > >> > that I can upload data to, right?  If not, I can try and fiddle around
> > > > >> > with this tomorrow and see if I can't get this to work the way I want
> > > > >> > to.
> > > >> >
> > > >> > Roz
> > > >> >
> > > >> > On Tue, Apr 24, 2018 at 11:42 AM, John Halley Gotway via RT
<
> > > >> > met_help at ucar.edu> wrote:
> > > >> >
> > > >> > > Roz,
> > > >> > >
> > > > >> > > Each "-job aggregate_stat" only generates a single output line type.
> > > > >> > > So using "-out_line_type CTC,CTS,CNT" will not work.
> > > > >> > >
> > > > >> > > You'll need to run separate jobs for each output line type you want to
> > > > >> > > generate.  That's why I'd recommend grouping those multiple jobs
> > > > >> > > together into a single STAT-Analysis config file.  Then you'd call
> > > > >> > > STAT-Analysis once using the "-config" command line option.
> > > > >> > >
> > > > >> > > Another issue is that if you set "-out_stat" to the same filename,
> > > > >> > > it'll get overwritten by each job.  STAT-Analysis will overwrite that
> > > > >> > > output file rather than appending to it.
> > > > >> > >
> > > > >> > > You could send me a day's worth of .stat output files
> > > > >> > > (/GFS/data/hourly/20180305*) and I could send you some suggestions.
> > > > >> > > Or if you have access to theia you could copy them up there and point
> > > > >> > > me to it.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > John
> > > >> > >
> > > >> > > On Tue, Apr 24, 2018 at 7:48 AM, Rosalyn MacCracken -
NOAA
> > Affiliate
> > > >> via
> > > >> > RT
> > > >> > > <met_help at ucar.edu> wrote:
> > > >> > >
> > > >> > > >
> > > >> > > > <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822
> >
> > > >> > > >
> > > >> > > > Hi John,
> > > >> > > >
> > > > >> > > > Yes, that makes sense.  Those very small values (<1.0 m/s) are bad
> > > > >> > > > values.  That's why they shouldn't be included in the processing.
> > > > >> > > >
> > > > >> > > > So, I need to just regenerate hourly data, one hour at a time.  Would
> > > > >> > > > it make sense to use a shell script and loop stat-analysis?  Something
> > > > >> > > > like:
> > > > >> > > >
> > > > >> > > > for day in 11 12
> > > > >> > > > do
> > > > >> > > >   for cycle in 00 06 12 18
> > > > >> > > >   do
> > > > >> > > >     stat_analysis -lookin /GFS/data/hourly/201803${day}${cycle}/*.stat \
> > > > >> > > >       -job aggregate_stat \
> > > > >> > > >       -line_type MPR \
> > > > >> > > >       -out_line_type CTC,CTS,CNT \
> > > > >> > > >       -fcst_var WIND \
> > > > >> > > >       -column_thresh OBS gt1 \
> > > > >> > > >       -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > > > >> > > >       -out_stat /new_rerun_stat_files/MPR_to_CTC_CTS_CNT.stat
> > > > >> > > >   done
> > > > >> > > > done
> > > > >> > > >
> > > > >> > > > or, something like that?  And, will this regenerate hourly forecasts, at
> > > > >> > > > each forecast and lead hour?  I guess it will see the forecast and lead
> > > > >> > > > hour from the *.stat file, and whatever *stat file is in the directory,
> > > > >> > > > it will regenerate those hours, right?
> > > > >> > > >
> > > > >> > > > So, I need to regenerate the CTC, CNT and CTS files.  That's why I did:
> > > > >> > > >  -out_line_type CTC,CTS,CNT
> > > > >> > > > but, will that make 3 separate files, or just another *.stat file?
> > > >> > > >
> > > >> > > > Roz
> > > >> > > >
> > > >> > > >
> > > >> > > > On Mon, Apr 23, 2018 at 4:01 PM, John Halley Gotway via
RT <
> > > >> > > > met_help at ucar.edu> wrote:
> > > >> > > >
> > > >> > > > > Roz,
> > > >> > > > >
> > > > >> > > > > It is ultimately up to you to decide which matched pairs you want to
> > > > >> > > > > include in your processing.  Do you consider those small (<1.0 m/s)
> > > > >> > > > > observation values to be corrupt and incorrect in some way or just not
> > > > >> > > > > very interesting?  If they really are BAD data values, I agree that
> > > > >> > > > > you should exclude them from your analysis.  But if they're just
> > > > >> > > > > uninteresting values of low wind speed, then there's no reason why you
> > > > >> > > > > should exclude them.  For example, *most* of the time it isn't
> > > > >> > > > > raining, but we often include observations of 0 precip.
> > > >> > > > >
> > > > >> > > > > There are three configurable options in Point-Stat that may be useful
> > > > >> > > > > here:
> > > > >> > > > > (1) You already know and use the "cat_thresh" option.  This threshold
> > > > >> > > > > defines the events and non-events for a 2x2 contingency table.  This
> > > > >> > > > > threshold affects the contents of the FHO, CTC, CTS, MCTC, and MCTS
> > > > >> > > > > line types that Point-Stat writes.
> > > > >> > > > > (2) The "cnt_thresh" option is a more recent addition.  Perhaps this
> > > > >> > > > > was a poor name choice, but instead of defining categories, it's
> > > > >> > > > > really a *filtering* threshold.  This threshold affects the contents
> > > > >> > > > > of the SL1L2, SAL1L2, and CNT line types that Point-Stat writes.  For
> > > > >> > > > > example, setting "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT and
> > > > >> > > > > 2 SL1L2 output lines containing only those points where the wind speed
> > > > >> > > > > was >=6 and >=17, respectively.
> > > > >> > > > > (3) The "wind_thresh" option is very similar to the "cnt_thresh"
> > > > >> > > > > option but affects the contents of the VL1L2, VAL1L2, and VCNT (new in
> > > > >> > > > > met-7.0) line types.  Only those U/V pairs that meet the specified
> > > > >> > > > > wind speed threshold are included in the output.
> > > > >> > > > >
> > > > >> > > > > For both "cnt_thresh" and "wind_thresh", the default value in the
> > > > >> > > > > config file is "NA", meaning, do not apply any filtering threshold
> > > > >> > > > > criteria.
> > > >> > > > >
> > > > >> > > > > You have the flexibility to run STAT-Analysis on the MPR output lines
> > > > >> > > > > to recompute any of these output line types applying whatever
> > > > >> > > > > filtering criteria you'd like.
> > > > >> > > > > Here's the MET user's guide:
> > > > >> > > > > https://dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v7.0.pdf
> > > > >> > > > > Look on page 98 for the job command options for the "aggregate_stat"
> > > > >> > > > > job type when the input line type is "MPR".
> > > >> > > > >
> > > > >> > > > > For your second question, the "-lookin PATH" option is *VERY* flexible.
> > > > >> > > > > You can set PATH to either a single value or multiple values.  If you
> > > > >> > > > > use wildcards, then the shell expands those wildcards to multiple
> > > > >> > > > > values.  Each value you pass in can either be a filename or a directory
> > > > >> > > > > name.  If you pass in a filename, STAT-Analysis will read it
> > > > >> > > > > *REGARDLESS* of the file extension.  If you pass in a directory name,
> > > > >> > > > > STAT-Analysis will search that directory *RECURSIVELY* for files ending
> > > > >> > > > > in ".stat".  For example, either of the following settings would tell
> > > > >> > > > > STAT-Analysis to read the same list of files:
> > > > >> > > > >    -lookin /GFS/data/hourly/*/*.stat
> > > > >> > > > >    ... or ...
> > > > >> > > > >    -lookin /GFS/data/hourly
> > > > >> > > > >
> > > > >> > > > > Be aware though that the more data you pass to STAT-Analysis, the
> > > > >> > > > > longer it'll take to process it.  You can decide how much data you
> > > > >> > > > > pass it for each job.  I'd suggest starting with what is most
> > > > >> > > > > convenient for you.  If it's too slow, change the logic to pass it
> > > > >> > > > > less data (e.g. only 1 day of data rather than 1 month of data).
> > > > >> > > > >
> > > > >> > > > > Yes, you can give it a date range.  Use -fcst_init_beg and
> > > > >> > > > > -fcst_init_end to specify beginning/ending model initialization times
> > > > >> > > > > or -fcst_valid_beg and -fcst_valid_end to specify beginning/ending
> > > > >> > > > > valid times.
> > > > >> > > > >
> > > > >> > > > > If you find that you're running multiple jobs on the same subset of
> > > > >> > > > > data (e.g. process MPR to CNT, MPR to SL1L2, MPR to CTC, MPR to CTS),
> > > > >> > > > > it'd be more efficient to group those jobs into a config file.
> > > > >> > > > > That'll do the filtering ONCE and write the filtered data to a temp
> > > > >> > > > > file.  Then all the jobs read data from the temp file instead of
> > > > >> > > > > starting over from scratch.
> > > >> > > > >
> > > >> > > > > Make sense?
> > > >> > > > >
> > > >> > > > > John
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Apr 23, 2018 at 1:01 PM, Rosalyn MacCracken -
NOAA
> > > >> Affiliate
> > > >> > > via
> > > >> > > > RT
> > > >> > > > > <met_help at ucar.edu> wrote:
> > > >> > > > >
> > > >> > > > > >
> > > >> > > > > > <URL: https://rt.rap.ucar.edu/rt/
> > Ticket/Display.html?id=84822
> > > >
> > > >> > > > > >
> > > >> > > > > > Hi John,
> > > >> > > > > >
> > > >> > > > > > That's actually only partially correct.  It's not
that I
> > want
> > > to
> > > >> > use
> > > >> > > > part
> > > >> > > > > > of the MPR lines and discard the rest, and I do
need to
> > > >> regenerate
> > > >> > > > > > statistics.  Let me try to re-explain.
> > > >> > > > > >
> > > >> > > > > > Back in early March we switched from getting our
ASCAT obs
> > > from
> > > >> the
> > > >> > > > > > prepbufr data, to getting it from the MGDRLITE
data. So,
> > > >> processing
> > > >> > > > > didn't
> > > >> > > > > > change.  I was producing statistics at certain
threshold
> > > levels
> > > >> for
> > > >> > > > both
> > > >> > > > > > GFS and ASCAT.  I had this set with the cat_thresh
list,
> at
> > > >> levels
> > > >> > of
> > > >> > > > > > 0,6,17, etc.  We found out after processing for a
couple
> of
> > > >> weeks
> > > >> > > that
> > > >> > > > > the
> > > >> > > > > > ASCAT data included these really small values, <1.0
m/s,
> and
> > > >> that
> > > >> > > these
> > > >> > > > > > small wind speeds were being included into the
statistics
> > > >> > processing.
> > > >> > > > > >
> > > >> > > > > > So, a couple of questions.
> > > > >> > > > > > 1) Do I have to regenerate all of my statistics (*.cts, *.cnt and
> > > > >> > > > > > *.ctc files) because of this error? Or, since I have threshold levels
> > > > >> > > > > > set, will those small values be among the statistics in the lowest
> > > > >> > > > > > thresholds?
> > > > >> > > > > > 2) I have the *.stat files, but they are spread out into separate
> > > > >> > > > > > directories like:
> > > > >> > > > > > /GFS/data/hourly/${YYYYMMDDHH}/*.stat
> > > > >> > > > > > Can I tell stat-analysis to "lookin" directories with a wildcard (like
> > > > >> > > > > > 201803*)?  If so, how?  Or, if I tell it to look in /GFS/data/hourly,
> > > > >> > > > > > will it look in all the directories recursively under hourly?  And, if
> > > > >> > > > > > that's the case, can I give it a date range, so that it only processes
> > > > >> > > > > > data from March?
> > > >> > > > > >
> > > >> > > > > > Roz
> > > >> > > > > >
> > > >> > > > > > On Mon, Apr 23, 2018 at 2:18 PM, John Halley Gotway
via
> RT <
> > > >> > > > > > met_help at ucar.edu> wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Roz,
> > > >> > > > > > >
> > > >> > > > > > > I read that you've run Point-Stat and saved off
the
> > matched
> > > >> pairs
> > > >> > > > (MPR)
> > > >> > > > > > > output line type.  And you'd like to (1) filter
those
> MPR
> > > >> lines
> > > >> > to
> > > >> > > > > > discard
> > > >> > > > > > > some of them and then (2) use the filtered data
to
> > > regenerate
> > > >> > > summary
> > > >> > > > > > > statistics.  Yes, this is easily done using the
> > > STAT-Analysis
> > > >> > tool
> > > >> > > in
> > > >> > > > > > MET.
> > > >> > > > > > >
> > > >> > > > > > > You wrote that you're verifying wind speeds
against
> ASCAT
> > > and
> > > >> > that
> > > >> > > > > you'd
> > > >> > > > > > > like to exclude pairs where the observed wind
speed is
> > less
> > > >> than
> > > >> > 1
> > > >> > > > m/s.
> > > >> > > > > > > I'm just guessing here, but I'll presume that you
want
> to
> > > >> produce
> > > >> > > > both
> > > >> > > > > > > SL1L2 and CNT output line types.  Here's what the
> > > >> STAT-Analysis
> > > >> > job
> > > >> > > > > would
> > > >> > > > > > > look like:
> > > >> > > > > > >
> > > > >> > > > > > > # Filter MPRs and write SL1L2 output lines.
> > > > >> > > > > > > #   -lookin          a .stat filename or a directory containing them
> > > > >> > > > > > > #   -job             job type is aggregate_stat
> > > > >> > > > > > > #   -line_type       input line type = MPR
> > > > >> > > > > > > #   -out_line_type   output line type = SL1L2 partial sums
> > > > >> > > > > > > #   -fcst_var        only process lines where FCST_VAR column = WIND
> > > > >> > > > > > > #   -column_thresh   only use MPR lines where OBS column > 1
> > > > >> > > > > > > #   -by              run this same job for each unique combination of
> > > > >> > > > > > > #                    these columns
> > > > >> > > > > > > stat_analysis \
> > > > >> > > > > > >    -lookin input.stat \
> > > > >> > > > > > >    -job aggregate_stat \
> > > > >> > > > > > >    -line_type MPR \
> > > > >> > > > > > >    -out_line_type SL1L2 \
> > > > >> > > > > > >    -fcst_var WIND \
> > > > >> > > > > > >    -column_thresh OBS gt1 \
> > > > >> > > > > > >    -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > > > >> > > > > > >    -out_stat MPR_to_SL1L2.stat
> > > > >> > > > > > >
> > > > >> > > > > > > This will produce an output .stat file containing an SL1L2 line for
> > > > >> > > > > > > each unique combination of the header columns listed after the "-by"
> > > > >> > > > > > > option.  To generate CNT output lines instead, you'd run a second job
> > > > >> > > > > > > where you replace SL1L2 with CNT.  You could run these jobs on the
> > > > >> > > > > > > command line or group them together into a STAT-Analysis config file,
> > > > >> > > > > > > if you prefer.  Both would work.
> > > > >> > > > > > >
> > > > >> > > > > > > You could run this once for each input .stat file you're
> > > > >> > > > > > > processing... or you could pass many input .stat files to the job.
> > > > >> > > > > > > Since FCST_INIT_BEG and FCST_LEAD are included in the "-by" option,
> > > > >> > > > > > > you'll get separate output lines for each unique time.
> > > >> > > > > > >
> > > >> > > > > > > Hope that helps get you going.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > John

------------------------------------------------
Subject: question on regenerating data
From: Rosalyn MacCracken - NOAA Affiliate
Time: Wed Apr 25 10:08:49 2018

Figures.  I just calculated how long it will take me to regenerate data for
03072018 - 04122018.  It will take me 912 hours.  ;-(

Ok, I know I asked this, but, if I had an OBS value of 0.01 and a matched
GFS point of 10 m/s, and I had thresholds of 0-5 m/s, 6-10 m/s and
10-15 m/s, and say, CSI was calculated.  Which threshold would be used for
the output, the 0-5 or 6-10?  And, would the 10-15 threshold even be
affected?

Roz
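
For a question like the one above, it can help to see how a single matched
pair is binned.  A minimal sketch, assuming each category threshold is a
one-sided ">=" test applied to both the forecast and the observation (the
usual 2x2 contingency table setup), and assuming the ge0, ge6 and ge17
levels mentioned earlier in this thread rather than the ranges quoted above.
classify_pair is a hypothetical helper, not part of MET:

```shell
#!/bin/sh
# classify_pair FCST OBS THRESH
# Print the 2x2 contingency table cell for one matched pair, where the
# event is "value >= THRESH" for both the forecast and the observation.
# (Hypothetical helper for illustration only; not a MET command.)
classify_pair() {
  awk -v f="$1" -v o="$2" -v t="$3" 'BEGIN {
    fcst_event = (f >= t)
    obs_event  = (o >= t)
    if (fcst_event && obs_event)        print "hit"
    else if (fcst_event && !obs_event)  print "false_alarm"
    else if (!fcst_event && obs_event)  print "miss"
    else                                print "correct_negative"
  }'
}

# The pair from the question: GFS = 10 m/s, OBS = 0.01 m/s.
for t in 0 6 17; do
  echo "ge$t: $(classify_pair 10 0.01 "$t")"
done
```

CSI is hits / (hits + misses + false alarms).  Under these assumptions the
pair is a hit for ge0, a false alarm for ge6 (so it lowers that CSI), and a
correct negative for ge17, which does not enter the CSI formula at all.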

On Wed, Apr 25, 2018 at 11:40 AM, John Halley Gotway via RT <
met_help at ucar.edu> wrote:

> Roz,
>
> I think it'd take just as long.  The slow part is reading the data... not
> applying a threshold.
>
> John
>
> On Wed, Apr 25, 2018 at 9:18 AM, Rosalyn MacCracken - NOAA Affiliate via RT
> <met_help at ucar.edu> wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> >
> > Hi John,
> >
> > Thanks for doing that for me.  I'll take a look at the info you sent me
> > this afternoon.  I'm in the middle of doing something right now...trying
> > to make a different program work.  ;-/
> >
> > I wonder if it will be quicker than 18 minutes for some of the thresholds
> > that have higher wind speeds, and not as many instances (or 0 instances).
> > Or, will it take just as long, since it still needs to read through the
> > entire *.stat file anyway?
> >
> > Roz
> >
> > On Tue, Apr 24, 2018 at 7:06 PM, John Halley Gotway via RT <
> > met_help at ucar.edu> wrote:
> >
> > > Hi Roz,
> > >
> > > Thanks for sending the sample data.  I grabbed it and used it to run
> > > some sample jobs:
> > >
> > > time /d1/johnhg/MET/MET_releases/met-6.0/bin/stat_analysis \
> > > -lookin /d1/johnhg/MET/MET_Help/maccracken_data_20180424/opc_test/home/opc_test/data/met_verif/GFS/data/hourly \
> > > -config STATAnalysisConfig \
> > > -log run_sa.log -v 3
> > >
> > > I used the "-lookin" option to point to all the data you sent.
> > >
> > > I've attached the...
> > > (1) config file I used
> > > (2) log file that was generated
> > > (3) output .stat files
> > >
> > > Looking at the jobs, you'll see that I've included 5 of them...
> > > - Generate CNT output
> > > - Generate CTC >= 0.0 output
> > > - Generate CTS >= 0.0 output
> > > - Generate CTC >= 5.5689 output
> > > - Generate CTS >= 5.5689 output
> > >
> > > Unfortunately, you'll need to define separate jobs for each threshold
> > > you'd like to use.  Although, you shouldn't use >=0.0 since that's always
> > > true.
> > >
> > > Also unfortunately, this is pretty slow.  On my machine, it took like 18
> > > minutes for these 5 jobs!
> > >
> > > Thanks,
> > > John
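
Since each threshold needs its own job, the job list can be generated rather
than typed by hand.  A rough sketch, assuming the STAT-Analysis config file
takes a "jobs = [ ... ];" array of job strings and that "-out_fcst_thresh"
and "-out_obs_thresh" define the event when deriving CTC/CTS from MPR lines
(check both against the user's guide for your MET version).  The attached
config is not reproduced in this thread, so the job strings, the make_jobs
helper, and the output filenames here are illustrative only:

```shell
#!/bin/sh
# make_jobs THRESH...
# Print a "jobs" array with one CTC job and one CTS job per threshold,
# each writing to its own -out_stat file so no job overwrites another.
# (Hypothetical helper; option names are assumptions to verify in the guide.)
make_jobs() {
  echo 'jobs = ['
  first=1
  for thresh in "$@"; do
    for type in CTC CTS; do
      # Separate entries with a comma, but do not print one before the first.
      if [ "$first" -eq 0 ]; then printf ',\n'; fi
      first=0
      printf '  "-job aggregate_stat -line_type MPR -out_line_type %s -column_thresh OBS gt1 -out_fcst_thresh %s -out_obs_thresh %s -out_stat MPR_to_%s_%s.stat"' \
        "$type" "$thresh" "$thresh" "$type" "$thresh"
    done
  done
  printf '\n];\n'
}

# Example: the 5.5689 threshold from the sample run plus ge17.
make_jobs ge5.5689 ge17
```

The printed block would be pasted into (or redirected to) a config file and
passed with "-config", so the input data is only filtered once for all jobs.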
> > >
> > >
> > > On Tue, Apr 24, 2018 at 2:09 PM, Rosalyn MacCracken - NOAA Affiliate via RT
> > > <met_help at ucar.edu> wrote:
> > >
> > > >
> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > >
> > > > Hi John,
> > > >
> > > > I put my file on the ftp site.  Let me know what you find.  You'll see
> > > > those really low OBS values (0.01, 0.02, and so on).
> > > >
> > > > Thanks!
> > > >
> > > > Roz
> > > >
> > > > On Tue, Apr 24, 2018 at 2:53 PM, Rosalyn MacCracken - NOAA Affiliate <
> > > > rosalyn.maccracken at noaa.gov> wrote:
> > > >
> > > > > Ok, I'll get that over to the ftp site.  I have to make sure that I
> > > > > find a day that has all the data in it.  Sometimes the data isn't
> > > > > available when the script runs.  A little annoying, but, that's
> > > > > operations...
> > > > >
> > > > > I'll let you know when I get the file to the ftp site.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Roz
> > > > >
> > > > > On Tue, Apr 24, 2018 at 2:49 PM, John Halley Gotway via RT <
> > > > > met_help at ucar.edu> wrote:
> > > > >
> > > > >> Roz,
> > > > >>
> > > > >> Yes, we do.  Follow the instructions here:
> > > > >>    https://dtcenter.org/met/users/support/met_help.php#ftp
> > > > >>
> > > > > >> I'd suggest making a tar file for one day and posting it to the ftp
> > > > > >> site:
> > > > >>    tar -cvzf sample.tar.gz /GFS/data/hourly/20180305*
> > > > >>
> > > > >> Thanks,
> > > > >> John
> > > > >>
> > > > > >> On Tue, Apr 24, 2018 at 11:57 AM, Rosalyn MacCracken - NOAA Affiliate
> > > > > >> via RT <met_help at ucar.edu> wrote:
> > > > > >>
> > > > > >> >
> > > > > >> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > > >> >
> > > > > >> > Hi John,
> > > > > >> >
> > > > > >> > Yes, it does seem that the -config option is the way to go to recreate
> > > > > >> > those 3 files.  I'll be sure to have a unique file name, or, mv the
> > > > > >> > output file to a different name before running the command again.
> > > > > >> > Thanks for pointing that out.
> > > > > >> >
> > > > > >> > I'm teleworking for the next couple of weeks, so I can't download and
> > > > > >> > send you *.stat files like I can when I'm at my computer at work.  I
> > > > > >> > don't have access to theia or wcoss anymore.  You have an ftp server
> > > > > >> > that I can upload data to, right?  If not, I can try and fiddle around
> > > > > >> > with this tomorrow and see if I can't get this to work the way I want to.
> > > > >> >
> > > > >> > Roz
> > > > >> >
> > > > > >> > On Tue, Apr 24, 2018 at 11:42 AM, John Halley Gotway via RT <
> > > > > >> > met_help at ucar.edu> wrote:
> > > > > >> >
> > > > > >> > > Roz,
> > > > > >> > >
> > > > > >> > > Each "-job aggregate_stat" only generates a single output line type.
> > > > > >> > > So using "-out_line_type CTC,CTS,CNT" will not work.
> > > > > >> > >
> > > > > >> > > You'll need to run separate jobs for each output line type you want
> > > > > >> > > to generate.  That's why I'd recommend grouping those multiple jobs
> > > > > >> > > together into a single STAT-Analysis config file.  Then you'd call
> > > > > >> > > STAT-Analysis once using the "-config" command line option.
> > > > > >> > >
> > > > > >> > > Another issue is that if you set "-out_stat" to the same filename,
> > > > > >> > > it'll get overwritten by each job.  STAT-Analysis will overwrite that
> > > > > >> > > output file rather than appending to it.
> > > > > >> > >
> > > > > >> > > You could send me a day's worth of .stat output files
> > > > > >> > > (/GFS/data/hourly/20180305*) and I could send you some suggestions.
> > > > > >> > > Or if you have access to theia you could copy them up there and point
> > > > > >> > > me to it.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > John
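
The two points above (one output line type per job, and a unique "-out_stat"
name per job) can be folded back into the proposed loop.  A minimal sketch
that only prints the commands as a dry run.  The input and output paths come
from this thread; the ge6 threshold for the CTC/CTS jobs is a placeholder,
and "-out_fcst_thresh" / "-out_obs_thresh" are assumed to be the options
that define the event when deriving contingency tables from MPR lines:

```shell
#!/bin/sh
# Dry run: print one stat_analysis command per day, cycle and output line
# type.  Remove the leading "echo" to actually run the commands.
in_root=/GFS/data/hourly            # input layout described in the thread
out_root=/new_rerun_stat_files      # output directory used in the thread

print_cmds() {
  for day in "$@"; do
    for cycle in 00 06 12 18; do
      for type in CTC CTS CNT; do
        # CTC and CTS need an event threshold; ge6 is only a placeholder.
        extra=''
        [ "$type" = CNT ] || extra='-out_fcst_thresh ge6 -out_obs_thresh ge6'
        # $extra is intentionally unquoted so it splits into separate options.
        echo stat_analysis \
          -lookin "$in_root/201803${day}${cycle}" \
          -job aggregate_stat -line_type MPR -out_line_type "$type" \
          -fcst_var WIND -column_thresh OBS gt1 $extra \
          -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
          -out_stat "$out_root/201803${day}${cycle}_MPR_to_${type}.stat"
      done
    done
  done
}

print_cmds 11 12
```

Each printed command names a different output file, so nothing is
overwritten between jobs.  Grouping the three jobs into one config file per
directory, as suggested above, would avoid reading the same input three
times.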
> > > > >> > >
> > > > > >> > > On Tue, Apr 24, 2018 at 7:48 AM, Rosalyn MacCracken - NOAA Affiliate
> > > > > >> > > via RT <met_help at ucar.edu> wrote:
> > > > > >> > >
> > > > > >> > > >
> > > > > >> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > > >> > > >
> > > > > >> > > > Hi John,
> > > > > >> > > >
> > > > > >> > > > Yes, that makes sense.  Those very small values (<1.0 m/s), are bad
> > > > > >> > > > values.  That's why they shouldn't be included in the processing.
> > > > > >> > > >
> > > > > >> > > > So, I need to just regenerate hourly data, one hour at a time.  Would
> > > > > >> > > > it make sense to use a shell script and loop stat-analysis?
> > > > > >> > > > Something like:
> > > > > >> > > >
> > > > > >> > > > for day in 11 12
> > > > > >> > > > do
> > > > > >> > > >   for cycle in 00 06 12 18
> > > > > >> > > >   do
> > > > > >> > > > stat_analysis -lookin /GFS/data/hourly/201803${day}${hour}/*.stat \
> > > > > >> > > > -job aggregate_stat \
> > > > > >> > > >    -line_type MPR \
> > > > > >> > > >    -out_line_type CTC,CTS,CNT \
> > > > > >> > > >   -fcst_var WIND \
> > > > > >> > > > -column_thresh OBS gt1 \
> > > > > >> > > >  -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS
> > > > > >> > > > -out_stat /new_rerun_stat_files/MPR_to_CTC_CTS_CNT.stat
> > > > > >> > > >   done
> > > > > >> > > > done
> > > > > >> > > >
> > > > > >> > > > or, something like that?  And, will this regenerate hour forecasts,
> > > > > >> > > > at each forecast and lead hour?  I guess it will see the forecast and
> > > > > >> > > > lead hour from the *.stat file, and whatever *stat file is in the
> > > > > >> > > > directory, it will regenerate those hours, right?
> > > > > >> > > >
> > > > > >> > > > So, I need to regenerate the CTC, CNT and CTS files.  That's why I did:
> > > > > >> > > >  -out_line_type CTC,CTS,CNT
> > > > > >> > > > but, will that make 3 separate files, or just another *.stat file?
> > > > >> > > >
> > > > >> > > > Roz
> > > > >> > > >
> > > > >> > > >
> > > > > >> > > > On Mon, Apr 23, 2018 at 4:01 PM, John Halley Gotway via RT <
> > > > > >> > > > met_help at ucar.edu> wrote:
> > > > > >> > > >
> > > > > >> > > > > Roz,
> > > > > >> > > > >
> > > > > >> > > > > It is ultimately up to you to decide which matched pairs you want to
> > > > > >> > > > > include in your processing.  Do you consider those small (<1.0 m/s)
> > > > > >> > > > > observation values to be corrupt and incorrect in some way or just
> > > > > >> > > > > not very interesting?  If they really are BAD data values, I agree
> > > > > >> > > > > that you should exclude them from your analysis.  But if they're just
> > > > > >> > > > > uninteresting values of low wind speed, then there's no reason why
> > > > > >> > > > > you should exclude them.  For example, *most* of the time it isn't
> > > > > >> > > > > raining, but we often include observations of 0 precip.
> > > > > >> > > > >
> > > > > >> > > > > There are three configurable options in Point-Stat that may be useful
> > > > > >> > > > > here:
> > > > > >> > > > > (1) You already know and use the "cat_thresh" option.  This threshold
> > > > > >> > > > > defines the events and non-events for a 2x2 contingency table.  This
> > > > > >> > > > > threshold affects the contents of FHO, CTC, CTS, MCTC, and MCTS line
> > > > > >> > > > > types that Point-Stat writes.
> > > > > >> > > > > (2) The "cnt_thresh" option is a more recent addition.  Perhaps this
> > > > > >> > > > > was a poor name choice, but instead of defining categories, it's
> > > > > >> > > > > really a *filtering* threshold.  This threshold affects the contents
> > > > > >> > > > > of the SL1L2, SAL1L2, and CNT line types that Point-Stat writes.  For
> > > > > >> > > > > example, setting "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT and
> > > > > >> > > > > 2 SL1L2 output lines containing only those points where the wind
> > > > > >> > > > > speed was >=6 and >=17, respectively.
> > > > > >> > > > > (3) The "wind_thresh" option is very similar to the "cnt_thresh"
> > > > > >> > > > > option but affects the contents of the VL1L2, VAL1L2, and VCNT (new
> > > > > >> > > > > in met-7.0) line types.  Only those U/V pairs that meet the specified
> > > > > >> > > > > wind speed threshold are included in the output.
> > > > >> > > > >
> > > > > >> > > > > For both "cnt_thresh" and "wind_thresh", the default value in the
> > > > > >> > > > > config file is "NA", meaning, do not apply any filtering threshold
> > > > > >> > > > > criteria.
> > > > > >> > > > >
> > > > > >> > > > > You have the flexibility to run STAT-Analysis on the MPR output lines
> > > > > >> > > > > to recompute any of these output line types applying whatever
> > > > > >> > > > > filtering criteria you'd like.
> > > > > >> > > > > Here's the MET user's guide:
> > > > > >> > > > > https://dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v7.0.pdf
> > > > > >> > > > > Look on page 98 for the job command options for the "aggregate_stat"
> > > > > >> > > > > line type when the input line type is "MPR".
> > > > > >> > > > >
> > > > > >> > > > > For your second question, the "-lookin PATH" option is *VERY*
> > > > > >> > > > > flexible.  You can set PATH to either a single value or multiple
> > > > > >> > > > > values.  If you use wildcards, then the shell expands those wildcards
> > > > > >> > > > > to multiple values.  Each value you pass in can either be a filename
> > > > > >> > > > > or a directory name.  If you pass in a filename, STAT-Analysis will
> > > > > >> > > > > read it *REGARDLESS* of the file extension.  If you pass in a
> > > > > >> > > > > directory name, STAT-Analysis will search that directory
> > > > > >> > > > > *RECURSIVELY* for files ending in ".stat".  For example, either of
> > > > > >> > > > > the following settings would tell STAT-Analysis to read the same list
> > > > > >> > > > > of files:
> > > > > >> > > > >    -lookin /GFS/data/hourly/*/*.stat
> > > > > >> > > > >    ... or ...
> > > > > >> > > > >    -lookin /GFS/data/hourly
> > > > > >> > > > >
> > > > > >> > > > > Be aware though that the more data you pass to STAT-Analysis, the
> > > > > >> > > > > longer it'll take for it to process it.  You can decide how much data
> > > > > >> > > > > you pass it for each job.  I'd suggest starting with what is most
> > > > > >> > > > > convenient for you.  If it's too slow, change the logic to pass it
> > > > > >> > > > > less data (e.g. only 1 day of data rather than 1 month of data).
> > > > >> > > > >
> > > > > >> > > > > Yes, you can give it a date range.  Use -fcst_init_beg and
> > > > > >> > > > > -fcst_init_end to specify beginning/ending model initialization times
> > > > > >> > > > > or -fcst_valid_beg and -fcst_valid_end to specify beginning/ending
> > > > > >> > > > > valid times.
> > > > > >> > > > >
> > > > > >> > > > > If you find that you're running multiple jobs on the same subset of
> > > > > >> > > > > data (e.g. process MPR to CNT, MPR to SL1L2, MPR to CTC, MPR to CTS),
> > > > > >> > > > > it'd be more efficient to group those jobs into a config file.
> > > > > >> > > > > That'll do the filtering ONCE and write the filtered data to a temp
> > > > > >> > > > > file.  Then all the jobs read data from the temp file instead of
> > > > > >> > > > > starting over from scratch.
> > > > >> > > > >
> > > > >> > > > > Make sense?
> > > > >> > > > >
> > > > >> > > > > John
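
The date-range options described above can be combined with the advice to
pass less data per run.  A minimal dry-run sketch that prints one command
per initialization day, assuming the time strings take the form
YYYYMMDD_HHMMSS.  The print_daily_cmds helper and the output filenames are
hypothetical:

```shell
#!/bin/sh
# print_daily_cmds YYYYMM FIRST_DAY LAST_DAY
# Dry run: print one stat_analysis command per initialization day, each
# restricted to that day with -fcst_init_beg / -fcst_init_end.
# Remove the leading "echo" to actually run the commands.
print_daily_cmds() {
  ym=$1
  day=$2
  last=$3
  while [ "$day" -le "$last" ]; do
    # Zero-pad the day so 7 becomes 07.
    ymd=$(printf '%s%02d' "$ym" "$day")
    echo stat_analysis \
      -lookin /GFS/data/hourly \
      -job aggregate_stat -line_type MPR -out_line_type CNT \
      -fcst_init_beg "${ymd}_000000" -fcst_init_end "${ymd}_235959" \
      -column_thresh OBS gt1 \
      -out_stat "MPR_to_CNT_${ymd}.stat"
    day=$((day + 1))
  done
}

# Example: every initialization day from March 7 through March 31, 2018.
print_daily_cmds 201803 7 31
```

Note that pointing "-lookin" at the top-level directory likely still makes
each run scan every .stat file underneath it.  Narrowing "-lookin" to the
matching dated subdirectories as well should cut the read time, provided the
directory names line up with initialization times, which this thread does
not confirm.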
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > > >> > > > > On Mon, Apr 23, 2018 at 1:01 PM, Rosalyn MacCracken - NOAA Affiliate
> > > > > >> > > > > via RT <met_help at ucar.edu> wrote:
> > > > > >> > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > > >> > > > > >
> > > > > >> > > > > > Hi John,
> > > > > >> > > > > >
> > > > > >> > > > > > That's actually only partially correct.  It's not that I want to use
> > > > > >> > > > > > part of the MPR lines and discard the rest, and I do need to
> > > > > >> > > > > > regenerate statistics.  Let me try to re-explain.
> > > > > >> > > > > >
> > > > > >> > > > > > Back in early March we switched from getting our ASCAT obs from the
> > > > > >> > > > > > prepbufr data, to getting it from the MGDRLITE data.  So, processing
> > > > > >> > > > > > didn't change.  I was producing statistics at certain threshold
> > > > > >> > > > > > levels for both GFS and ASCAT.  I had this set with the cat_thresh
> > > > > >> > > > > > list, at levels of 0,6,17, etc.  We found out after processing for a
> > > > > >> > > > > > couple of weeks that the ASCAT data included these really small
> > > > > >> > > > > > values, <1.0 m/s, and that these small wind speeds were being
> > > > > >> > > > > > included into the statistics processing.
> > > > > >> > > > > >
> > > > > >> > > > > > So, a couple of questions.
> > > > > >> > > > > > 1) Do I have to regenerate all of my statistics (*.cts, *.cnt and
> > > > > >> > > > > > *ctc files) because of this error?  Or, since I have threshold levels
> > > > > >> > > > > > set, will those small values be among the statistics in the lowest
> > > > > >> > > > > > thresholds?
> > > > > >> > > > > > 2) I have the *.stat files, but, they are spread out into separate
> > > > > >> > > > > > directories like:
> > > > > >> > > > > > /GFS/data/hourly/${YYYYMMDDHH}/*.stat
> > > > > >> > > > > > Can I tell stat-analysis to "lookin" directories with a wildcard
> > > > > >> > > > > > (like 201803*)?  If so, how?  Or, if I tell it to look in
> > > > > >> > > > > > /GFS/data/hourly, will it look in all the directories recursively
> > > > > >> > > > > > under hourly?  And, if that's the case, can I give it a date range,
> > > > > >> > > > > > so, that it only processes data from March?
> > > > >> > > > > >
> > > > >> > > > > > Roz
> > > > >> > > > > >
> > > > > >> > > > > > On Mon, Apr 23, 2018 at 2:18 PM, John Halley Gotway via RT <
> > > > > >> > > > > > met_help at ucar.edu> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi Roz,
> > > > > >> > > > > > >
> > > > > >> > > > > > > I read that you've run Point-Stat and saved off the matched pairs
> > > > > >> > > > > > > (MPR) output line type.  And you'd like to (1) filter those MPR
> > > > > >> > > > > > > lines to discard some of them and then (2) use the filtered data to
> > > > > >> > > > > > > regenerate summary statistics.  Yes, this is easily done using the
> > > > > >> > > > > > > STAT-Analysis tool in MET.
> > > > > >> > > > > > >
> > > > > >> > > > > > > You wrote that you're verifying wind speeds against ASCAT and that
> > > > > >> > > > > > > you'd like to exclude pairs where the observed wind speed is less
> > > > > >> > > > > > > than 1 m/s.  I'm just guessing here, but I'll presume that you want
> > > > > >> > > > > > > to produce both SL1L2 and CNT output line types.  Here's what the
> > > > > >> > > > > > > STAT-Analysis job would look like:
> > > > > >> > > > > > >
> > > > > >> > > > > > > # Filter MPR's and write SL1L2 output line
> > > > > >> > > > > > > stat_analysis \
> > > > > >> > > > > > >    -lookin input.stat \
> > > > > >> > > > > > >    -job aggregate_stat \
> > > > > >> > > > > > >    -line_type MPR \
> > > > > >> > > > > > >    -out_line_type SL1L2 \
> > > > > >> > > > > > >    -fcst_var WIND \
> > > > > >> > > > > > >    -column_thresh OBS gt1 \
> > > > > >> > > > > > >    -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > > > > >> > > > > > >    -out_stat MPR_to_SL1L2.stat
> > > > > >> > > > > > >
> > > > > >> > > > > > > where:
> > > > > >> > > > > > >    -lookin          List a .stat filename or directory containing them
> > > > > >> > > > > > >    -job             Job type is aggregate_stat
> > > > > >> > > > > > >    -line_type       Input line type = MPR
> > > > > >> > > > > > >    -out_line_type   Output line type = SL1L2 partial sums
> > > > > >> > > > > > >    -fcst_var        Only process lines where FCST_VAR column = WIND
> > > > > >> > > > > > >    -column_thresh   Only use MPR lines where OBS column > 1
> > > > > >> > > > > > >    -by              Run this same job for each unique combination of
> > > > > >> > > > > > >                     these columns
> > > > > >> > > > > > >
> > > > > >> > > > > > > This will produce an output .stat file containing an SL1L2 line for
> > > > > >> > > > > > > each unique combination of the header columns listed after the
> > > > > >> > > > > > > "-by" option.  To generate CNT output lines instead, you'd run a
> > > > > >> > > > > > > second job where you replace SL1L2 with CNT.  You could run these
> > > > > >> > > > > > > jobs on the command line or group them together into a
> > > > > >> > > > > > > STAT-Analysis config file, if you prefer.  Both would work.
> > > > > >> > > > > > >
> > > > > >> > > > > > > You could run this once for each input .stat file you're
> > > > > >> > > > > > > processing... or you could pass many input .stat files to the job.
> > > > > >> > > > > > > Since FCST_INIT_BEG and FCST_LEAD are included in the "-by" option,
> > > > > >> > > > > > > you'll get separate output lines for each unique time.
> > > > >> > > > > > >
> > > > >> > > > > > > Hope that helps get you going.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > > John
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > On Thu, Apr 19, 2018 at 9:23 AM, Julie
Prestopnik via
> > RT <
> > > > >> > > > > > > met_help at ucar.edu>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > <URL: https://rt.rap.ucar.edu/rt/Tic
> > > > >> ket/Display.html?id=84822
> > > > >> > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > Hi Roz.  My apologies for the delay in
responding.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Unfortunately, John is out of the office this
week,
> > and
> > > I
> > > > do
> > > > >> > not
> > > > >> > > > know
> > > > >> > > > > > the
> > > > >> > > > > > > > answers to your questions.  As you said, I
would
> also
> > > > >> imagine
> > > > >> > > that
> > > > >> > > > > > > > point-stat is using those small values as
matched
> > pairs.
> > > > >> > Also, I
> > > > >> > > > do
> > > > >> > > > > > not
> > > > >> > > > > > > > believe there is a way to regenerate the
point-stat
> > > > >> statistics
> > > > >> > > > > without
> > > > >> > > > > > > > using the original GFS data.  I cannot say
with
> > > certainty,
> > > > >> > > however.
> > > > >> > > > > > > Thank
> > > > >> > > > > > > > you for your patience in advance.  We'll get
a
> > definite
> > > > >> > response
> > > > >> > > to
> > > > >> > > > > you
> > > > >> > > > > > > as
> > > > >> > > > > > > > soon as we can.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thanks,
> > > > >> > > > > > > > Julie
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Wed, Apr 18, 2018 at 6:31 AM, Rosalyn
MacCracken
> -
> > > NOAA
> > > > >> > > > Affiliate
> > > > >> > > > > > via
> > > > >> > > > > > > RT
> > > > >> > > > > > > > <met_help at ucar.edu> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Wed Apr 18 06:31:39 2018: Request 84822 was
acted
> > > upon.
> > > > >> > > > > > > > > Transaction: Ticket created by
> > > > >> rosalyn.maccracken at noaa.gov
> > > > >> > > > > > > > >        Queue: met_help
> > > > >> > > > > > > > >      Subject: question on regenerating data
> > > > >> > > > > > > > >        Owner: Nobody
> > > > >> > > > > > > > >   Requestors: rosalyn.maccracken at noaa.gov
> > > > >> > > > > > > > >       Status: new
> > > > >> > > > > > > > >  Ticket <URL: https://rt.rap.ucar.edu/rt/
> > > > >> > > > > > Ticket/Display.html?id=84822
> > > > >> > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Hi,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I'm running point-stat using ASCAT and GFS
data to
> > > > verify
> > > > >> > > surface
> > > > >> > > > > > wind
> > > > >> > > > > > > > > speeds.  I found an error in my ASCAT input
data
> > that
> > > > goes
> > > > >> > back
> > > > >> > > > to
> > > > >> > > > > > Mar
> > > > >> > > > > > > 7.
> > > > >> > > > > > > > > I had switched the input source of the
data, and
> > > within
> > > > >> the
> > > > >> > new
> > > > >> > > > > data
> > > > >> > > > > > > > files,
> > > > >> > > > > > > > > it was allowing very small values (< 1 m/s)
to be
> > used
> > > > as
> > > > >> > data
> > > > >> > > > > points
> > > > >> > > > > > > in
> > > > >> > > > > > > > > the verification.  I imagine that this is
an
> issue,
> > > > since
> > > > >> > > > > point-stat
> > > > >> > > > > > is
> > > > >> > > > > > > > > using these very small values as matched
pairs
> with
> > > the
> > > > >> GFS,
> > > > >> > > > > correct?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Is there a way to regenerate the point-stat
> > statistics
> > > > >> > without
> > > > >> > > > > using
> > > > >> > > > > > > the
> > > > >> > > > > > > > > original GFS data?  I do have the *stat and
the
> *mpr
> > > > >> files,
> > > > >> > and
> > > > >> > > > it
> > > > >> > > > > is
> > > > >> > > > > > > > > pretty easy to identify where the bad
values are
> > > > located.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thanks,
> > > > >> > > > > > > > > Roz
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > --
> > > > >> > > > > > > > > Rosalyn MacCracken
> > > > >> > > > > > > > > Support Scientist
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Ocean Applications Branch
> > > > >> > > > > > > > > NOAA/NWS Ocean Prediction Center
> > > > >> > > > > > > > > NCWCP
> > > > >> > > > > > > > > 5830 University Research Ct
> > > > >> > > > > > > > > College Park, MD  20740-3818
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > (p) 301-683-1551
> > > > >> > > > > > > > > rosalyn.maccracken at noaa.gov
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > --
> > > > >> > > > > > Rosalyn MacCracken
> > > > >> > > > > > Support Scientist
> > > > >> > > > > >
> > > > >> > > > > > Ocean Applications Branch
> > > > >> > > > > > NOAA/NWS Ocean Prediction Center
> > > > >> > > > > > NCWCP
> > > > >> > > > > > 5830 University Research Ct
> > > > >> > > > > > College Park, MD  20740-3818
> > > > >> > > > > >
> > > > >> > > > > > (p) 301-683-1551
> > > > >> > > > > > rosalyn.maccracken at noaa.gov
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Rosalyn MacCracken
> > > > >> > > > Support Scientist
> > > > >> > > >
> > > > >> > > > Ocean Applications Branch
> > > > >> > > > NOAA/NWS Ocean Prediction Center
> > > > >> > > > NCWCP
> > > > >> > > > 5830 University Research Ct
> > > > >> > > > College Park, MD  20740-3818
> > > > >> > > >
> > > > >> > > > (p) 301-683-1551
> > > > >> > > > rosalyn.maccracken at noaa.gov
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Rosalyn MacCracken
> > > > >> > Support Scientist
> > > > >> >
> > > > >> > Ocean Applications Branch
> > > > >> > NOAA/NWS Ocean Prediction Center
> > > > >> > NCWCP
> > > > >> > 5830 University Research Ct
> > > > >> > College Park, MD  20740-3818
> > > > >> >
> > > > >> > (p) 301-683-1551
> > > > >> > rosalyn.maccracken at noaa.gov
> > > > >> >
> > > > >> >
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Rosalyn MacCracken
> > > > > Support Scientist
> > > > >
> > > > > Ocean Applications Branch
> > > > > NOAA/NWS Ocean Prediction Center
> > > > > NCWCP
> > > > > 5830 University Research Ct
> > > > > College Park, MD  20740-3818
> > > > >
> > > > > (p) 301-683-1551
> > > > > rosalyn.maccracken at noaa.gov
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Rosalyn MacCracken
> > > > Support Scientist
> > > >
> > > > Ocean Applications Branch
> > > > NOAA/NWS Ocean Prediction Center
> > > > NCWCP
> > > > 5830 University Research Ct
> > > > College Park, MD  20740-3818
> > > >
> > > > (p) 301-683-1551
> > > > rosalyn.maccracken at noaa.gov
> > > >
> > > >
> > >
> > >
> >
> >
> > --
> > Rosalyn MacCracken
> > Support Scientist
> >
> > Ocean Applications Branch
> > NOAA/NWS Ocean Prediction Center
> > NCWCP
> > 5830 University Research Ct
> > College Park, MD  20740-3818
> >
> > (p) 301-683-1551
> > rosalyn.maccracken at noaa.gov
> >
> >
>
>


--
Rosalyn MacCracken
Support Scientist

Ocean Applications Branch
NOAA/NWS Ocean Prediction Center
NCWCP
5830 University Research Ct
College Park, MD  20740-3818

(p) 301-683-1551
rosalyn.maccracken at noaa.gov

------------------------------------------------
Subject: question on regenerating data
From: John Halley Gotway
Time: Thu Apr 26 14:14:36 2018

Roz,

The CSI statistic is computed from a 2x2 contingency table.  A 2x2
contingency table is defined by a single threshold.  Looking in the .stat
files you sent, I see that you've applied many thresholds to generate many
2x2 contingency tables and corresponding statistics.  Yes, it is true that
for most of those thresholds, the "bad" observation values will fall into
the "non-event" category.  But those non-event counts are included in the
computation of some stats, including CSI.  So even though the bad
observations aren't very interesting, they really are impacting the
statistics.
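
To make this concrete, here is a minimal shell sketch (not MET code) that
uses the textbook definition CSI = hits / (hits + misses + false alarms).
The helper names classify and csi, and the counts 50/10/20, are made up for
illustration.  With a >=5.5689 threshold, the pair raised in the question
quoted below (an observation of 0.01 m/s matched to a 10 m/s forecast) is an
observed non-event but a forecast event, so it is tallied as a false alarm
and lowers CSI:

```shell
#!/bin/sh
# Classify one forecast/observation pair against a single ">= threshold"
# and print the cell of the 2x2 contingency table it falls into.
classify() {
  # $1 = forecast value, $2 = observed value, $3 = threshold
  awk -v f="$1" -v o="$2" -v t="$3" 'BEGIN {
    fe = (f >= t)   # forecast event?
    oe = (o >= t)   # observed event?
    if (fe && oe)        { print "hit" }
    else if (fe && !oe)  { print "false_alarm" }
    else if (!fe && oe)  { print "miss" }
    else                 { print "correct_negative" }
  }'
}

# CSI = hits / (hits + misses + false alarms)
csi() {
  # $1 = hits, $2 = misses, $3 = false alarms (hypothetical counts)
  awk -v h="$1" -v m="$2" -v fa="$3" 'BEGIN { printf "%.4f\n", h / (h + m + fa) }'
}

classify 10 0.01 5.5689   # prints false_alarm
csi 50 10 20              # prints 0.6250
csi 50 10 21              # one extra false alarm: prints 0.6173
```

For a threshold above the forecast value (say >=17), the same pair becomes a
correct negative, which does not enter CSI but does enter statistics that use
the full table.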

John

On Wed, Apr 25, 2018 at 10:08 AM, Rosalyn MacCracken - NOAA Affiliate via
RT <met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
>
> Figures.  I just calculated how long it will take me to regenerate data
> for 03072018 - 04122018.  It will take me 912 hours.  ;-(
>
> Ok, I know I asked this, but, if I had an OBS value of 0.01 and a matched
> GFS point of 10 m/s, and I had a low threshold of 0-5 m/s, 6-10 m/s and
> 10-15 m/s, and say, CSI was calculated.  Which threshold would be used
> for the output, the 0-5 or 6-10?  And, would the 10-15 threshold even be
> affected?
>
> Roz
>
> On Wed, Apr 25, 2018 at 11:40 AM, John Halley Gotway via RT <
> met_help at ucar.edu> wrote:
>
> > Roz,
> >
> > I think it'd take just as long.  The slow part is reading the data...
> > not applying a threshold.
> >
> > John
> >
> > On Wed, Apr 25, 2018 at 9:18 AM, Rosalyn MacCracken - NOAA Affiliate via
> > RT <met_help at ucar.edu> wrote:
> >
> > >
> > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > >
> > > Hi John,
> > >
> > > Thanks for doing that for me.  I'll take a look at the info you sent
> > > me this afternoon.  I'm in the middle of doing something right
> > > now...trying to make a different program work.  ;-/
> > >
> > > I wonder if it will be quicker than 18 minutes for some of the
> > > thresholds that have higher wind speeds, and not as many instances (or
> > > 0 instances).  Or, will it take just as long, since it still needs to
> > > read through the entire *.stat file anyway?
> > >
> > > Roz
> > >
> > > On Tue, Apr 24, 2018 at 7:06 PM, John Halley Gotway via RT <
> > > met_help at ucar.edu> wrote:
> > >
> > > > Hi Roz,
> > > >
> > > > Thanks for sending the sample data.  I grabbed it and used it to run
> > > > some sample jobs:
> > > >
> > > > time /d1/johnhg/MET/MET_releases/met-6.0/bin/stat_analysis \
> > > > -lookin /d1/johnhg/MET/MET_Help/maccracken_data_20180424/opc_test/home/opc_test/data/met_verif/GFS/data/hourly \
> > > > -config STATAnalysisConfig \
> > > > -log run_sa.log -v 3
> > > >
> > > > I used the "-lookin" option to point to all the data you sent.
> > > >
> > > > I've attached the...
> > > > (1) config file I used
> > > > (2) log file that was generated
> > > > (3) output .stat files
> > > >
> > > > Looking at the jobs, you'll see that I've included 5 of them...
> > > > - Generate CNT output
> > > > - Generate CTC >= 0.0 output
> > > > - Generate CTS >= 0.0 output
> > > > - Generate CTC >= 5.5689 output
> > > > - Generate CTS >= 5.5689 output
> > > >
> > > > Unfortunately, you'll need to define separate jobs for each threshold
> > > > you'd like to use.  Although, you shouldn't use >=0.0 since that's
> > > > always true.
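
Since every threshold and every output line type needs its own job, the job
strings can be generated with a small loop rather than typed by hand.  This
is only a sketch: it prints the job strings and does not run stat_analysis.
The thresholds ge6 and ge17, the out/ directory, and the use of the
-out_fcst_thresh and -out_obs_thresh job options are assumptions to check
against the MET User's Guide for the installed version.

```shell
#!/bin/sh
# Print one STAT-Analysis job string per (output line type, threshold) pair.
# Assumed, not taken from this thread: the threshold list, the out/
# directory, and the -out_fcst_thresh / -out_obs_thresh options.
THRESHOLDS="ge6 ge17"
LINE_TYPES="CTC CTS"

make_jobs() {
  for type in $LINE_TYPES; do
    for thresh in $THRESHOLDS; do
      # A distinct -out_stat name per job keeps jobs from overwriting
      # each other's output.
      echo "-job aggregate_stat -line_type MPR -out_line_type ${type}" \
           "-fcst_var WIND -column_thresh OBS gt1" \
           "-out_fcst_thresh ${thresh} -out_obs_thresh ${thresh}" \
           "-out_stat out/MPR_to_${type}_${thresh}.stat"
    done
  done
}

make_jobs
```

The printed lines could then be pasted into the jobs list of a STAT-Analysis
config file, or appended one at a time to a stat_analysis command line.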
> > > >
> > > > Also unfortunately, this is pretty slow.  On my machine, it took like
> > > > 18 minutes for these 5 jobs!
> > > >
> > > > Thanks,
> > > > John
> > > >
> > > >
> > > > On Tue, Apr 24, 2018 at 2:09 PM, Rosalyn MacCracken - NOAA Affiliate
> > > > via RT <met_help at ucar.edu> wrote:
> > > >
> > > > >
> > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > >
> > > > > Hi John,
> > > > >
> > > > > I put my file on the ftp site.  Let me know what you find.  You'll
> > > > > see those really low OBS values (0.01, 0.02, and so on).
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Roz
> > > > >
> > > > > On Tue, Apr 24, 2018 at 2:53 PM, Rosalyn MacCracken - NOAA
> > > > > Affiliate <rosalyn.maccracken at noaa.gov> wrote:
> > > > >
> > > > > > Ok, I'll get that over to the ftp site.  I have to make sure that
> > > > > > I find a day that has all the data in it.  Sometimes the data
> > > > > > isn't available when the script runs.  A little annoying, but,
> > > > > > that's operations...
> > > > > >
> > > > > > I'll let you know when I get the file to the ftp site.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Roz
> > > > > >
> > > > > > On Tue, Apr 24, 2018 at 2:49 PM, John Halley Gotway via RT <
> > > > > > met_help at ucar.edu> wrote:
> > > > > >
> > > > > >> Roz,
> > > > > >>
> > > > > >> Yes, we do.  Follow the instructions here:
> > > > > >>    https://dtcenter.org/met/users/support/met_help.php#ftp
> > > > > >>
> > > > > >> I'd suggest making a tar file for one day and posting them to the
> > > > > >> ftp site:
> > > > > >>    tar -cvzf sample.tar.gz /GFS/data/hourly/20180305*
> > > > > >>
> > > > > >> Thanks,
> > > > > >> John
> > > > > >>
> > > > > >> On Tue, Apr 24, 2018 at 11:57 AM, Rosalyn MacCracken - NOAA
> > > > > >> Affiliate via RT <met_help at ucar.edu> wrote:
> > > > > >>
> > > > > >> >
> > > > > >> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > > >> >
> > > > > >> > Hi John,
> > > > > >> >
> > > > > >> > Yes, it does seem that the -config option is the way to go to
> > > > > >> > recreate those 3 files.  I'll be sure to have a unique file
> > > > > >> > name, or, mv the output file to a different name before running
> > > > > >> > the command again.  Thanks for pointing that out.
> > > > > >> >
> > > > > >> > I'm teleworking for the next couple of weeks, so, I can't
> > > > > >> > download and send you *.stat files like I can when I'm at my
> > > > > >> > computer at work.  I don't have access to theia or wcoss
> > > > > >> > anymore.  You have an ftp server that I can upload data to,
> > > > > >> > right?  If not, I can try and fiddle around with this tomorrow
> > > > > >> > and see if I can't get this to work the way I want to.
> > > > > >> >
> > > > > >> > Roz
> > > > > >> >
> > > > > >> > On Tue, Apr 24, 2018 at 11:42 AM, John Halley Gotway via RT <
> > > > > >> > met_help at ucar.edu> wrote:
> > > > > >> >
> > > > > >> > > Roz,
> > > > > >> > >
> > > > > >> > > Each "-job aggregate_stat" only generates a single output line
> > > > > >> > > type.  So using "-out_line_type CTC,CTS,CNT" will not work.
> > > > > >> > >
> > > > > >> > > You'll need to run separate jobs for each output line type you
> > > > > >> > > want to generate.  That's why I'd recommend grouping those
> > > > > >> > > multiple jobs together into a single STAT-Analysis config
> > > > > >> > > file.  Then you'd call STAT-Analysis once using the "-config"
> > > > > >> > > command line option.
> > > > > >> > >
> > > > > >> > > Another issue is that if you set "-out_stat" to the same
> > > > > >> > > filename, it'll get overwritten by each job.  STAT-Analysis
> > > > > >> > > will overwrite that output file rather than appending to it.
> > > > > >> > >
> > > > > >> > > You could send me a day's worth of .stat output files
> > > > > >> > > (/GFS/data/hourly/20180305*) and I could send you some
> > > > > >> > > suggestions.  Or if you have access to theia you could copy
> > > > > >> > > them up there and point me to it.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > John
> > > > > >> > >
> > > > > >> > > On Tue, Apr 24, 2018 at 7:48 AM, Rosalyn MacCracken - NOAA
> > > > > >> > > Affiliate via RT <met_help at ucar.edu> wrote:
> > > > > >> > >
> > > > > >> > > >
> > > > > >> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > > >> > > >
> > > > > >> > > > Hi John,
> > > > > >> > > >
> > > > > >> > > > Yes, that makes sense.  Those very small values (<1.0 m/s),
> > > > > >> > > > are bad values.  That's why they shouldn't be included in
> > > > > >> > > > the processing.
> > > > > >> > > >
> > > > > >> > > > So, I need to just regenerate hourly data, one hour at a
> > > > > >> > > > time.  Would it make sense to use a shell script and loop
> > > > > >> > > > stat-analysis?  Something like:
> > > > > >> > > >
> > > > > >> > > > for day in 11 12
> > > > > >> > > > do
> > > > > >> > > >   for cycle in 00 06 12 18
> > > > > >> > > >   do
> > > > > >> > > >     stat_analysis \
> > > > > >> > > >       -lookin /GFS/data/hourly/201803${day}${hour}/*.stat \
> > > > > >> > > >       -job aggregate_stat \
> > > > > >> > > >       -line_type MPR \
> > > > > >> > > >       -out_line_type CTC,CTS,CNT \
> > > > > >> > > >       -fcst_var WIND \
> > > > > >> > > >       -column_thresh OBS gt1 \
> > > > > >> > > >       -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > > > > >> > > >       -out_stat /new_rerun_stat_files/MPR_to_CTC_CTS_CNT.stat
> > > > > >> > > >   done
> > > > > >> > > > done
> > > > > >> > > >
> > > > > >> > > > or, something like that?  And, will this regenerate hour
> > > > > >> > > > forecasts, at each forecast and lead hour?  I guess it will
> > > > > >> > > > see the forecast and lead hour from the *.stat file, and
> > > > > >> > > > whatever *stat file is in the directory, it will regenerate
> > > > > >> > > > those hours, right?
> > > > > >> > > >
> > > > > >> > > > So, I need to regenerate the CTC, CNT and CTS files.  That's
> > > > > >> > > > why I did:
> > > > > >> > > >  -out_line_type CTC,CTS,CNT
> > > > > >> > > > but, will that make 3 separate files, or just another *.stat
> > > > > >> > > > file?
> > > > > >> > > >
> > > > > >> > > > Roz
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Mon, Apr 23, 2018 at 4:01 PM, John Halley Gotway via RT <
> > > > > >> > > > met_help at ucar.edu> wrote:
> > > > > >> > > >
> > > > > >> > > > > Roz,
> > > > > >> > > > >
> > > > > >> > > > > It is ultimately up to you to decide which matched pairs
> > > > > >> > > > > you want to include in your processing.  Do you consider
> > > > > >> > > > > those small (<1.0 m/s) observation values to be corrupt
> > > > > >> > > > > and incorrect in some way or just not very interesting?
> > > > > >> > > > > If they really are BAD data values, I agree that you
> > > > > >> > > > > should exclude them from your analysis.  But if they're
> > > > > >> > > > > just uninteresting values of low wind speed, then there's
> > > > > >> > > > > no reason why you should exclude them.  For example,
> > > > > >> > > > > *most* of the time it isn't raining, but we often include
> > > > > >> > > > > observations of 0 precip.
> > > > > >> > > > >
> > > > > >> > > > > There are three configurable options in Point-Stat that
> > > > > >> > > > > may be useful here:
> > > > > >> > > > > (1) You already know and use the "cat_thresh" option.
> > > > > >> > > > > This threshold defines the events and non-events for a
> > > > > >> > > > > 2x2 contingency table.  This threshold affects the
> > > > > >> > > > > contents of FHO, CTC, CTS, MCTC, and MCTS line types that
> > > > > >> > > > > Point-Stat writes.
> > > > > >> > > > > (2) The "cnt_thresh" option is a more recent addition.
> > > > > >> > > > > Perhaps this was a poor name choice, but instead of
> > > > > >> > > > > defining categories, it's really a *filtering* threshold.
> > > > > >> > > > > This threshold affects the contents of the SL1L2, SAL1L2,
> > > > > >> > > > > and CNT line types that Point-Stat writes.  For example,
> > > > > >> > > > > setting "cnt_thresh = [ ge6, ge17 ];" will produce 2 CNT
> > > > > >> > > > > and 2 SL1L2 output lines containing only those points
> > > > > >> > > > > where the wind speed was >=6 and >=17, respectively.
> > > > > >> > > > > (3) The "wind_thresh" option is very similar to the
> > > > > >> > > > > "cnt_thresh" option but affects the contents of the
> > > > > >> > > > > VL1L2, VAL1L2, and VCNT (new in met-7.0) line types.
> > > > > >> > > > > Only those U/V pairs that meet the specified wind speed
> > > > > >> > > > > threshold are included in the output.
> > > > > >> > > > >
> > > > > >> > > > > For both "cnt_thresh" and "wind_thresh", the default
> > > > > >> > > > > value in the config file is "NA", meaning, do not apply
> > > > > >> > > > > any filtering threshold criteria.
> > > > > >> > > > >
> > > > > >> > > > > You have the flexibility to run STAT-Analysis on the MPR
> > > > > >> > > > > output lines to recompute any of these output line types
> > > > > >> > > > > applying whatever filtering criteria you'd like.
> > > > > >> > > > > Here's the MET user's guide:
> > > > > >> > > > > https://dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v7.0.pdf
> > > > > >> > > > > Look on page 98 for the job command options for the
> > > > > >> > > > > "aggregate_stat" job type when the input line type is
> > > > > >> > > > > "MPR".
> > > > > >> > > > >
> > > > > >> > > > > For your second question, the "-lookin PATH" option is
> > > > > >> > > > > *VERY* flexible.  You can set PATH to either a single
> > > > > >> > > > > value or multiple values.  If you use wildcards, then the
> > > > > >> > > > > shell expands those wildcards to multiple values.  Each
> > > > > >> > > > > value you pass in can either be a filename or a directory
> > > > > >> > > > > name.  If you pass in a filename, STAT-Analysis will read
> > > > > >> > > > > it *REGARDLESS* of the file extension.  If you pass in a
> > > > > >> > > > > directory name, STAT-Analysis will search that directory
> > > > > >> > > > > *RECURSIVELY* for files ending in ".stat".  For example,
> > > > > >> > > > > either of the following settings would tell STAT-Analysis
> > > > > >> > > > > to read the same list of files:
> > > > > >> > > > >    -lookin /GFS/data/hourly/*/*.stat
> > > > > >> > > > >    ... or ...
> > > > > >> > > > >    -lookin /GFS/data/hourly
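
One way to check which files each form would select is to list them from the
shell before running anything.  The sketch below builds a throwaway
directory tree (the temporary directory and the file names are made up for
illustration) and compares the shell wildcard with a recursive search for
names ending in ".stat".  It only mimics the file selection described above
and does not call stat_analysis.

```shell
#!/bin/sh
# Build a small throwaway tree that mirrors the hourly/YYYYMMDDHH layout.
# All names below are hypothetical and exist only for this demonstration.
top=$(mktemp -d)
mkdir -p "$top/hourly/2018030500" "$top/hourly/2018030506"
touch "$top/hourly/2018030500/point_stat_a.stat" \
      "$top/hourly/2018030506/point_stat_b.stat" \
      "$top/hourly/2018030506/notes.txt"

# Form 1: the shell expands the wildcard into an explicit list of files.
glob_list=$(ls "$top"/hourly/*/*.stat | sort)

# Form 2: a recursive search of the directory for names ending in ".stat".
find_list=$(find "$top/hourly" -type f -name '*.stat' | sort)

echo "$glob_list"
if [ "$glob_list" = "$find_list" ]; then
  echo "same file list"
fi
n_files=$(echo "$find_list" | wc -l | tr -d ' ')
rm -rf "$top"
```

The two lists agree here because every .stat file sits exactly one directory
below hourly; files nested deeper would be found only by the recursive form.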
> > > > > >> > > > >
> > > > > >> > > > > Be aware though that the more data you pass to
> > > > > >> > > > > STAT-Analysis, the longer it'll take for it to process
> > > > > >> > > > > it.  You can decide how much data you pass it for each
> > > > > >> > > > > job.  I'd suggest starting with what is most convenient
> > > > > >> > > > > for you.  If it's too slow, change the logic to pass it
> > > > > >> > > > > less data (e.g. only 1 day of data rather than 1 month of
> > > > > >> > > > > data).
> > > > > >> > > > >
> > > > > >> > > > > Yes, you can give it a date range.  Use -fcst_init_beg
> > > > > >> > > > > and -fcst_init_end to specify beginning/ending model
> > > > > >> > > > > initialization times or -fcst_valid_beg and
> > > > > >> > > > > -fcst_valid_end to specify beginning/ending valid times.
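
As a rough illustration of the date-range options, the following dry run
prints a command that limits processing to a window of initialization times
instead of executing it.  The YYYYMMDD_HH time strings, the window itself,
the SL1L2 output line type and the output file name are assumptions chosen
for the example; the accepted time format should be confirmed in the MET
User's Guide.

```shell
#!/bin/sh
# Dry run: print (rather than execute) a STAT-Analysis command restricted
# to a range of model initialization times.
# Assumed for illustration: the time strings and the output file name.
INIT_BEG="20180307_00"
INIT_END="20180412_18"

build_cmd() {
  echo "stat_analysis -lookin /GFS/data/hourly" \
       "-job aggregate_stat -line_type MPR -out_line_type SL1L2" \
       "-fcst_var WIND -column_thresh OBS gt1" \
       "-fcst_init_beg ${INIT_BEG} -fcst_init_end ${INIT_END}" \
       "-by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS" \
       "-out_stat MPR_to_SL1L2_20180307_20180412.stat"
}

build_cmd
```

Swapping -fcst_init_beg and -fcst_init_end for -fcst_valid_beg and
-fcst_valid_end would select by valid time instead.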
> > > > > >> > > > >
> > > > > >> > > > > If you find that you're running multiple jobs on the same
> > > > > >> > > > > subset of data (e.g. process MPR to CNT, MPR to SL1L2,
> > > > > >> > > > > MPR to CTC, MPR to CTS), it'd be more efficient to group
> > > > > >> > > > > those jobs into a config file.  That'll do the filtering
> > > > > >> > > > > ONCE and write the filtered data to a temp file.  Then
> > > > > >> > > > > all the jobs read data from the temp file instead of
> > > > > >> > > > > starting over from scratch.
> > > > > >> > > > >
> > > > > >> > > > > Make sense?
> > > > > >> > > > >
> > > > > >> > > > > John
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Apr 23, 2018 at 1:01 PM, Rosalyn MacCracken - NOAA
> > > > > >> > > > > Affiliate via RT <met_help at ucar.edu> wrote:
> > > > > >> > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=84822 >
> > > > > >> > > > > >
> > > > > >> > > > > > Hi John,
> > > > > >> > > > > >
> > > > > >> > > > > > That's actually only partially correct.  It's not that
> > > > > >> > > > > > I want to use part of the MPR lines and discard the
> > > > > >> > > > > > rest, and I do need to regenerate statistics.  Let me
> > > > > >> > > > > > try to re-explain.
> > > > > >> > > > > >
> > > > > >> > > > > > Back in early March we switched from getting our ASCAT
> > > > > >> > > > > > obs from the prepbufr data, to getting it from the
> > > > > >> > > > > > MGDRLITE data.  So, processing didn't change.  I was
> > > > > >> > > > > > producing statistics at certain threshold levels for
> > > > > >> > > > > > both GFS and ASCAT.  I had this set with the cat_thresh
> > > > > >> > > > > > list, at levels of 0,6,17, etc.  We found out after
> > > > > >> > > > > > processing for a couple of weeks that the ASCAT data
> > > > > >> > > > > > included these really small values, <1.0 m/s, and that
> > > > > >> > > > > > these small wind speeds were being included into the
> > > > > >> > > > > > statistics processing.
> > > > > >> > > > > >
> > > > > >> > > > > > So, a couple of questions.
> > > > > >> > > > > > 1) Do I have to regenerate all of my statistics (*.cts,
> > > > > >> > > > > > *.cnt and *ctc files) because of this error?  Or, since
> > > > > >> > > > > > I have threshold levels set, will those small values be
> > > > > >> > > > > > among the statistics in the lowest thresholds?
> > > > > >> > > > > > 2) I have the *.stat files, but, they are spread out
> > > > > >> > > > > > into separate directories like:
> > > > > >> > > > > > /GFS/data/hourly/${YYYYMMDDHH}/*.stat
> > > > > >> > > > > > Can I tell stat-analysis to "lookin" directories with a
> > > > > >> > > > > > wildcard (like 201803*)?  If so, how?  Or, if I tell it
> > > > > >> > > > > > to look in /GFS/data/hourly, will it look in all the
> > > > > >> > > > > > directories recursively under hourly?  And, if that's
> > > > > >> > > > > > the case, can I give it a date range, so, that it only
> > > > > >> > > > > > processes data from March?
> > > > > >> > > > > >
> > > > > >> > > > > > Roz
> > > > > >> > > > > >
> > > > > >> > > > > > On Mon, Apr 23, 2018 at 2:18 PM, John Halley Gotway via RT <
> > > > > >> > > > > > met_help at ucar.edu> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi Roz,
> > > > > >> > > > > > >
> > > > > >> > > > > > > I read that you've run Point-Stat and saved off the
> > > > > >> > > > > > > matched pairs (MPR) output line type.  And you'd like
> > > > > >> > > > > > > to (1) filter those MPR lines to discard some of them
> > > > > >> > > > > > > and then (2) use the filtered data to regenerate
> > > > > >> > > > > > > summary statistics.  Yes, this is easily done using
> > > > > >> > > > > > > the STAT-Analysis tool in MET.
> > > > > >> > > > > > >
> > > > > >> > > > > > > You wrote that you're verifying wind speeds against
> > > > > >> > > > > > > ASCAT and that you'd like to exclude pairs where the
> > > > > >> > > > > > > observed wind speed is less than 1 m/s.  I'm just
> > > > > >> > > > > > > guessing here, but I'll presume that you want to
> > > > > >> > > > > > > produce both SL1L2 and CNT output line types.  Here's
> > > > > >> > > > > > > what the STAT-Analysis job would look like:
> > > > > >> > > > > > >
> > > > > >> > > > > > > # Filter MPR lines and write SL1L2 output lines.
> > > > > >> > > > > > > #   -lookin         a .stat filename or a directory containing them
> > > > > >> > > > > > > #   -job            job type is aggregate_stat
> > > > > >> > > > > > > #   -line_type      input line type = MPR
> > > > > >> > > > > > > #   -out_line_type  output line type = SL1L2 partial sums
> > > > > >> > > > > > > #   -fcst_var       only process lines where FCST_VAR column = WIND
> > > > > >> > > > > > > #   -column_thresh  only use MPR lines where OBS column > 1
> > > > > >> > > > > > > #   -by             run this same job for each unique combination
> > > > > >> > > > > > > #                   of these columns
> > > > > >> > > > > > > stat_analysis \
> > > > > >> > > > > > >    -lookin input.stat \
> > > > > >> > > > > > >    -job aggregate_stat \
> > > > > >> > > > > > >    -line_type MPR \
> > > > > >> > > > > > >    -out_line_type SL1L2 \
> > > > > >> > > > > > >    -fcst_var WIND \
> > > > > >> > > > > > >    -column_thresh OBS gt1 \
> > > > > >> > > > > > >    -by MODEL,FCST_LEV,FCST_INIT_BEG,FCST_LEAD,VX_MASK,INTERP_MTHD,INTERP_PNTS \
> > > > > >> > > > > > >    -out_stat MPR_to_SL1L2.stat
> > > > > >> > > > > > >
> > > > > >> > > > > > > This will produce an output .stat file containing an
> > > > > >> > > > > > > SL1L2 line for each unique combination of the header
> > > > > >> > > > > > > columns listed after the "-by" option.  To generate
> > > > > >> > > > > > > CNT output lines instead, you'd run a second job
> > > > > >> > > > > > > where you replace SL1L2 with CNT.  You could run
> > > > > >> > > > > > > these jobs on the command line or group them together
> > > > > >> > > > > > > into a STAT-Analysis config file, if you prefer.
> > > > > >> > > > > > > Both would work.
> > > > > >> > > > > > >
> > > > > >> > > > > > > You could run this once for each input .stat file
> > > > > >> > > > > > > you're processing... or you could pass many input
> > > > > >> > > > > > > .stat files to the job.  Since FCST_INIT_BEG and
> > > > > >> > > > > > > FCST_LEAD are included in the "-by" option, you'll
> > > > > >> > > > > > > get separate output lines for each unique time.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Hope that helps get you going.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks,
> > > > > >> > > > > > > John
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Thu, Apr 19, 2018 at 9:23 AM, Julie
Prestopnik
> via
> > > RT <
> > > > > >> > > > > > > met_help at ucar.edu>
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > <URL: https://rt.rap.ucar.edu/rt/Tic
> > > > > >> ket/Display.html?id=84822
> > > > > >> > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Hi Roz.  My apologies for the delay in
responding.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Unfortunately, John is out of the office
this
> week,
> > > and
> > > > I
> > > > > do
> > > > > >> > not
> > > > > >> > > > know
> > > > > >> > > > > > the
> > > > > >> > > > > > > > answers to your questions.  As you said, I
would
> > also
> > > > > >> imagine
> > > > > >> > > that
> > > > > >> > > > > > > > point-stat is using those small values as
matched
> > > pairs.
> > > > > >> > Also, I
> > > > > >> > > > do
> > > > > >> > > > > > not
> > > > > >> > > > > > > > believe there is a way to regenerate the
> point-stat
> > > > > >> statistics
> > > > > >> > > > > without
> > > > > >> > > > > > > > using the original GFS data.  I cannot say
with
> > > > certainty,
> > > > > >> > > however.
> > > > > >> > > > > > > Thank
> > > > > >> > > > > > > > you for your patience in advance.  We'll
get a
> > > definite
> > > > > >> > response
> > > > > >> > > to
> > > > > >> > > > > you
> > > > > >> > > > > > > as
> > > > > >> > > > > > > > soon as we can.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > > Julie
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Wed, Apr 18, 2018 at 6:31 AM, Rosalyn
> MacCracken
> > -
> > > > NOAA
> > > > > >> > > > Affiliate
> > > > > >> > > > > > via
> > > > > >> > > > > > > RT
> > > > > >> > > > > > > > <met_help at ucar.edu> wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Wed Apr 18 06:31:39 2018: Request 84822
was
> acted
> > > > upon.
> > > > > >> > > > > > > > > Transaction: Ticket created by
> > > > > >> rosalyn.maccracken at noaa.gov
> > > > > >> > > > > > > > >        Queue: met_help
> > > > > >> > > > > > > > >      Subject: question on regenerating
data
> > > > > >> > > > > > > > >        Owner: Nobody
> > > > > >> > > > > > > > >   Requestors: rosalyn.maccracken at noaa.gov
> > > > > >> > > > > > > > >       Status: new
> > > > > >> > > > > > > > >  Ticket <URL: https://rt.rap.ucar.edu/rt/
> > > > > >> > > > > > Ticket/Display.html?id=84822
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Hi,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > I'm running point-stat using ASCAT and
GFS data
> to
> > > > > verify
> > > > > >> > > surface
> > > > > >> > > > > > wind
> > > > > >> > > > > > > > > speeds.  I found an error in my ASCAT
input data
> > > that
> > > > > goes
> > > > > >> > back
> > > > > >> > > > to
> > > > > >> > > > > > Mar
> > > > > >> > > > > > > 7.
> > > > > >> > > > > > > > > I had switched the input source of the
data, and
> > > > within
> > > > > >> the
> > > > > >> > new
> > > > > >> > > > > data
> > > > > >> > > > > > > > files,
> > > > > >> > > > > > > > > it was allowing very small values (< 1
m/s) to
> be
> > > used
> > > > > as
> > > > > >> > data
> > > > > >> > > > > points
> > > > > >> > > > > > > in
> > > > > >> > > > > > > > > the verification.  I imagine that this is
an
> > issue,
> > > > > since
> > > > > >> > > > > point-stat
> > > > > >> > > > > > is
> > > > > >> > > > > > > > > using these very small values as matched
pairs
> > with
> > > > the
> > > > > >> GFS,
> > > > > >> > > > > correct?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Is there a way to regenerate the point-
stat
> > > statistics
> > > > > >> > without
> > > > > >> > > > > using
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > original GFS data?  I do have the *stat
and the
> > *mpr
> > > > > >> files,
> > > > > >> > and
> > > > > >> > > > it
> > > > > >> > > > > is
> > > > > >> > > > > > > > > pretty easy to identify where the bad
values are
> > > > > located.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > > > > > Roz
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > --
> > > > > >> > > > > > > > > Rosalyn MacCracken
> > > > > >> > > > > > > > > Support Scientist
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Ocean Applications Branch
> > > > > >> > > > > > > > > NOAA/NWS Ocean Prediction Center
> > > > > >> > > > > > > > > NCWCP
> > > > > >> > > > > > > > > 5830 University Research Ct
> > > > > >> > > > > > > > > College Park, MD  20740-3818
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > (p) 301-683-1551
> > > > > >> > > > > > > > > rosalyn.maccracken at noaa.gov
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > --
> > > > > >> > > > > > Rosalyn MacCracken
> > > > > >> > > > > > Support Scientist
> > > > > >> > > > > >
> > > > > >> > > > > > Ocean Applications Branch
> > > > > >> > > > > > NOAA/NWS Ocean Prediction Center
> > > > > >> > > > > > NCWCP
> > > > > >> > > > > > 5830 University Research Ct
> > > > > >> > > > > > College Park, MD  20740-3818
> > > > > >> > > > > >
> > > > > >> > > > > > (p) 301-683-1551
> > > > > >> > > > > > rosalyn.maccracken at noaa.gov
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > > Rosalyn MacCracken
> > > > > >> > > > Support Scientist
> > > > > >> > > >
> > > > > >> > > > Ocean Applications Branch
> > > > > >> > > > NOAA/NWS Ocean Prediction Center
> > > > > >> > > > NCWCP
> > > > > >> > > > 5830 University Research Ct
> > > > > >> > > > College Park, MD  20740-3818
> > > > > >> > > >
> > > > > >> > > > (p) 301-683-1551
> > > > > >> > > > rosalyn.maccracken at noaa.gov
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >> >
> > > > > >> > --
> > > > > >> > Rosalyn MacCracken
> > > > > >> > Support Scientist
> > > > > >> >
> > > > > >> > Ocean Applications Branch
> > > > > >> > NOAA/NWS Ocean Prediction Center
> > > > > >> > NCWCP
> > > > > >> > 5830 University Research Ct
> > > > > >> > College Park, MD  20740-3818
> > > > > >> >
> > > > > >> > (p) 301-683-1551
> > > > > >> > rosalyn.maccracken at noaa.gov
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Rosalyn MacCracken
> > > > > > Support Scientist
> > > > > >
> > > > > > Ocean Applications Branch
> > > > > > NOAA/NWS Ocean Prediction Center
> > > > > > NCWCP
> > > > > > 5830 University Research Ct
> > > > > > College Park, MD  20740-3818
> > > > > >
> > > > > > (p) 301-683-1551
> > > > > > rosalyn.maccracken at noaa.gov
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Rosalyn MacCracken
> > > > > Support Scientist
> > > > >
> > > > > Ocean Applications Branch
> > > > > NOAA/NWS Ocean Prediction Center
> > > > > NCWCP
> > > > > 5830 University Research Ct
> > > > > College Park, MD  20740-3818
> > > > >
> > > > > (p) 301-683-1551
> > > > > rosalyn.maccracken at noaa.gov
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Rosalyn MacCracken
> > > Support Scientist
> > >
> > > Ocean Applications Branch
> > > NOAA/NWS Ocean Prediction Center
> > > NCWCP
> > > 5830 University Research Ct
> > > College Park, MD  20740-3818
> > >
> > > (p) 301-683-1551
> > > rosalyn.maccracken at noaa.gov
> > >
> > >
> >
> >
>
>
> --
> Rosalyn MacCracken
> Support Scientist
>
> Ocean Applications Branch
> NOAA/NWS Ocean Prediction Center
> NCWCP
> 5830 University Research Ct
> College Park, MD  20740-3818
>
> (p) 301-683-1551
> rosalyn.maccracken at noaa.gov
>
>

------------------------------------------------