[Met_help] [rt.rap.ucar.edu #90298] History for MET V8.1 stat_analysis with .stat from previous versions

Thu May 23 13:36:38 MDT 2019

----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi to whoever is working the help desk today :)

I was running stat_analysis from version 8.1 on .stat files I had created
previously with grid_stat version 8.0. I think the addition of the
FCST_UNITS and OBS_UNITS is causing some shift in the columns (see attached
image).

I don't think this is the intended behaviour, or maybe I am missing
some functionality I need here, or it isn't meant to be run this way
(new stat_analysis version on .stat files created with an older version).
As EMC starts building archives of .stat files, in the future I could see
this being a problem because if more columns are added in the future we
don't want to have to remake archives in the latest MET version. Is it
possible that these unit columns can be NA? I don't know how easy or
difficult something like that may be.

Thanks!

Mallory

----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: MET V8.1 stat_analysis with .stat from previous versions
From: David Fillmore
Time: Tue May 21 14:02:12 2019

Hi Mallory -

In version 8.1, grid_stat and some other tools introduced the FCST_UNITS
and OBS_UNITS columns to the .stat output files. Currently stat_analysis
8.1 needs to run with grid_stat 8.1 output as these columns are always
written. Let me discuss with the MET group about adding a command line
option to disable reading/writing units columns, as it sounds like EMC
regenerating archived .stat files will not be a good solution.

thanks,
David

------------------------------------------------
Subject: MET V8.1 stat_analysis with .stat from previous versions
From: John Halley Gotway
Time: Tue May 21 14:29:06 2019

Mallory,

Yes, this is a good point!  Thanks for finding this.  It's been a long
time since we've made any changes to the header columns, so we haven't
thought about this in a while.

Is this the output you get when using the "-dump_row" option?  I
actually don't think this is a real big problem in STAT-Analysis...
meaning, when it's doing real work, reading input data from columns, it
is parsing them from the *correct* columns based on the version number
listed in that line.

To make sure, I went to a directory that contains output for both the
met-8.0 and met-8.1 builds and ran these 2 jobs:

   met-8.1/bin/stat_analysis -lookin met-8.1/out/grid_stat \
     -job aggregate -line_type CTC

   met-8.1/bin/stat_analysis -lookin met-8.1/out/grid_stat \
     met-8.0/out/grid_stat -job aggregate -line_type CTC

The first one yields:

   COL_NAME:  TOTAL FY_OY FY_ON FN_OY  FN_ON
        CTC: 190782 40872 12504 12853 124553

And the second combines met-8.0 and met-8.1 output and yields exactly
double the counts:

   COL_NAME:  TOTAL FY_OY FY_ON FN_OY  FN_ON
        CTC: 381564 81744 25008 25706 249106

So the FCST_UNITS and OBS_UNITS columns aren't messing up the parsing
logic.
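
As an illustration of the version-based parsing described above, here is
a minimal Python sketch. It is not the STAT-Analysis implementation. The
header layouts are abbreviated, the position of the units columns within
them is an assumption, and the field values (GFS, TMP, Z2, K) are
hypothetical. Each toy line carries the CTC counts from the first job
above, so combining one old and one new line doubles them.

```python
# Abbreviated, illustrative header layouts keyed by the version string
# found in the first column of each line. Real .stat headers have more
# columns; the placement of the units columns is an assumption.
HEADERS = {
    "V8.0": ["VERSION", "MODEL", "FCST_VAR", "FCST_LEV",
             "OBS_VAR", "OBS_LEV", "LINE_TYPE"],
    "V8.1": ["VERSION", "MODEL", "FCST_VAR", "FCST_UNITS", "FCST_LEV",
             "OBS_VAR", "OBS_UNITS", "OBS_LEV", "LINE_TYPE"],
}
CTC_COLUMNS = ["TOTAL", "FY_OY", "FY_ON", "FN_OY", "FN_ON"]


def parse_ctc(line):
    """Return the CTC counts of one line, using that line's own version."""
    tokens = line.split()
    header = HEADERS[tokens[0]]  # pick the layout from the VERSION column
    counts = tokens[len(header):len(header) + len(CTC_COLUMNS)]
    return dict(zip(CTC_COLUMNS, map(int, counts)))


def aggregate_ctc(lines):
    """Sum the CTC counts over all lines, whatever their version."""
    totals = dict.fromkeys(CTC_COLUMNS, 0)
    for line in lines:
        for name, value in parse_ctc(line).items():
            totals[name] += value
    return totals


# Hypothetical lines: same counts, different header layouts.
old_line = "V8.0 GFS TMP Z2 TMP Z2 CTC 190782 40872 12504 12853 124553"
new_line = "V8.1 GFS TMP K Z2 TMP K Z2 CTC 190782 40872 12504 12853 124553"

print(aggregate_ctc([new_line]))
print(aggregate_ctc([new_line, old_line]))
```

Because the layout is chosen per line, the extra units columns in the
newer line do not shift the counts read from the older one.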

The real problem is that when creating a "-dump_row" output file, it's
writing the headers of the current version to the first line of output.
And the met-8.1 header doesn't match the met-8.0 (and earlier) data.
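
The mismatch can be reproduced with a small Python sketch that labels an
older data line with the newer header. The header is abbreviated, the
position of the units columns is an assumption, and the data values are
hypothetical.

```python
# Abbreviated met-8.1 style header (assumed layout, not the full header).
new_header = ["VERSION", "MODEL", "FCST_VAR", "FCST_UNITS", "FCST_LEV",
              "OBS_VAR", "OBS_UNITS", "OBS_LEV", "LINE_TYPE"]

# Hypothetical met-8.0 style data line, which has no units fields.
old_data = "V8.0 GFS TMP Z2 TMP Z2 CTC".split()


def label(header, tokens):
    """Pair header names with tokens by position, as a reader would."""
    return list(zip(header, tokens))


mislabeled = label(new_header, old_data)
for name, value in mislabeled:
    print(f"{name:>10}: {value}")
```

Every field after FCST_VAR ends up under the wrong name: the level is
read as FCST_UNITS and the line type as OBS_UNITS. This is the column
shift reported in the original request.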

Here are two possible changes to the logic to address this...

(1) When creating the "-dump_row" output file, check the version number
listed in the first line of data.  If it matches the version number of
the code, write the header to the first line of output.  If they don't
match, skip writing the header row.  This is probably the easiest logic.

(2) When creating the "-dump_row" output file, if the first line read
from the input is a header line, instead of skipping it, write it to the
output.  This would enable met-8.1 stat_analysis to pass the met-8.0
header columns through to the output file.  But if the "-line_type"
option is set to anything other than a single value, truncate the header
line after the "LINE_TYPE" column.  That way, we avoid writing erroneous
header columns for specific line types to the output files.
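
The two options above could be sketched in Python roughly as follows.
This is only an illustration of the proposed logic, not MET code. The
function names, the CODE_VERSION constant, and the abbreviated headers
are all hypothetical.

```python
CODE_VERSION = "V8.1"  # hypothetical: version of the running stat_analysis


def header_option_1(current_header, first_data_line):
    """Option (1): return the header to write, or None to skip it."""
    data_version = first_data_line.split()[0]
    return current_header if data_version == CODE_VERSION else None


def header_option_2(input_header, line_types):
    """Option (2): pass the input header through to the output.

    Unless exactly one line type was requested, cut the header after
    LINE_TYPE so no line-type-specific column names are written.
    """
    if len(line_types) == 1:
        return input_header
    cut = input_header.index("LINE_TYPE") + 1
    return input_header[:cut]


# Abbreviated, hypothetical headers for the example.
header_81 = ["VERSION", "MODEL", "FCST_UNITS", "LINE_TYPE"]
header_80 = ["VERSION", "MODEL", "LINE_TYPE",
             "TOTAL", "FY_OY", "FY_ON", "FN_OY", "FN_ON"]

print(header_option_1(header_81, "V8.0 GFS CTC 10 4 3 2 1"))  # header skipped
print(header_option_2(header_80, ["CTC"]))                    # full header
print(header_option_2(header_80, ["CTC", "CNT"]))             # truncated
```

Option (1) never writes a misleading header but drops the column names
for older data. Option (2) keeps them, at the cost of reading and
forwarding the header line from the input.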

Mallory, what type of logic would be most useful to you?

Thanks,
John

------------------------------------------------
Subject: MET V8.1 stat_analysis with .stat from previous versions
From: Mallory Row - NOAA Affiliate
Time: Wed May 22 06:20:23 2019

Hi John,

Yup, it was from using -dump_row. I think the second option is most
helpful. I think seeing the header columns is most useful. I guess on
that note/thinking, what would happen if the input data was a mix of
files from 8.0 and 8.1? Would the header be from the first file read by
stat_analysis? Perhaps the first file read could be from 8.1, and then
the 8.1 headers would be printed out, but then there are lines from 8.0
that are printed out, so that would put us back to square one, haha.
Maybe without the header columns is best, but I feel that could get
confusing if people are diving in and looking at the data.

Mallory

------------------------------------------------
Subject: MET V8.1 stat_analysis with .stat from previous versions
From: John Halley Gotway
Time: Wed May 22 16:41:14 2019

Mallory,

When used for the aggregate or aggregate_stat job types, the intended
purpose of the -dump_row option is for users to be able to see the
actual input lines that were used when processing each job.  So it's
meant as a sanity check to double-check the filtering logic the user
defined.

But for the filter job (which writes its output to the -dump_row file),
the intention is a little different.  For filter, stat_analysis is a
fancy form of "grep", enabling the user to slice/dice their data however
they'd like.

Just earlier today, we talked about enhancing the filter job type to
support the "-set_hdr" option.  For example, we have some data with a
very long FCST_UNITS string and want to reset that to a shorter string.
So we'd like to run a job like this:

   stat_analysis -lookin stat_data -job filter -set_hdr FCST_UNITS TEC \
     -dump_row short_units.txt

But this is not currently supported.  Here's the GitHub issue for this:
   https://github.com/NCAR/MET/issues/1129

If -set_hdr is used, this would require STAT-Analysis to actually parse
the input lines, update strings, and write them back out.  As long as
we're parsing the data anyway, we could also consider updating the
version number before writing it to the output.  And in that step, we
would, for example, add FCST_UNITS and OBS_UNITS to the output of the
filter job.
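
A minimal Python sketch of that parse, update, and rewrite step is shown
here. It is an illustration under stated assumptions, not the planned
implementation: the headers are abbreviated, the position of the units
columns is assumed, the function name is hypothetical, and filling the
missing units with NA follows the suggestion in the original request
rather than any confirmed behavior.

```python
# Abbreviated, illustrative header layouts (assumed, not the full headers).
OLD_HEADER = ["VERSION", "MODEL", "FCST_VAR", "FCST_LEV",
              "OBS_VAR", "OBS_LEV", "LINE_TYPE"]
NEW_HEADER = ["VERSION", "MODEL", "FCST_VAR", "FCST_UNITS", "FCST_LEV",
              "OBS_VAR", "OBS_UNITS", "OBS_LEV", "LINE_TYPE"]


def upgrade_line(line, set_hdr=None):
    """Rewrite an old-layout line in the new layout.

    Columns missing from the old layout are filled with NA, the version
    is updated, and any set_hdr overrides are applied last.
    """
    tokens = line.split()
    n_hdr = len(OLD_HEADER)
    fields = dict(zip(OLD_HEADER, tokens[:n_hdr]))
    fields["VERSION"] = "V8.1"
    fields.update(set_hdr or {})
    header_part = [fields.get(name, "NA") for name in NEW_HEADER]
    # The line-type-specific columns after the header are left untouched.
    return " ".join(header_part + tokens[n_hdr:])


old_line = "V8.0 GFS TMP Z2 TMP Z2 CTC 10 4 3 2 1"  # hypothetical data
print(upgrade_line(old_line))
print(upgrade_line(old_line, set_hdr={"FCST_UNITS": "TEC"}))
```

The second call mirrors the "-set_hdr FCST_UNITS TEC" example from the
job above, applied to a line that had no units column to begin with.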

It seems to me like using "-dump_row" in both contexts is confusing.
Instead, perhaps we should require that the "filter" job use the
"-out_stat" job command option to specify its output file?

Would that be a useful solution?  Of course, that would only fix .stat
output files.  There is no "filter" job for MODE or MTD output data.

John

------------------------------------------------
Subject: MET V8.1 stat_analysis with .stat from previous versions
From: Mallory Row - NOAA Affiliate
Time: Thu May 23 06:01:25 2019

I think that sounds like a useful solution! I'm glad that we could
piggyback it off another issue. Thanks!

Mallory

------------------------------------------------
Subject: MET V8.1 stat_analysis with .stat from previous versions
From: John Halley Gotway
Time: Thu May 23 13:36:37 2019

Mallory,

OK great, thanks for confirming.  I added all of these details as
comments to this GitHub issue:
   https://github.com/NCAR/MET/issues/1129

When we are able to work on this development, we might decide to split
it out into separate issues, but for now it's in one.

I'll go ahead and resolve this ticket for now.

Thanks,
John

------------------------------------------------