[Met_help] [rt.rap.ucar.edu #67554] History for Filtering Stat_Analysis by time

John Halley Gotway via RT met_help at ucar.edu
Mon Jun 23 15:01:30 MDT 2014


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Dear MET team
 
Hope you are doing well at your end.

 
I would like to ask for your help/hints/advice with the following:
 
I want to evaluate the performance of some variables over an interval of hours for several days. The "several days" can be a week, a month, or even longer. That is, I want to aggregate statistics from, say, 18 UTC to 23 UTC for all days in the given simulation period, although it could be any reasonable hourly interval:
 
  day1_P_180000
  day1_P_190000
  .
  day1_P_230000
   .
   .
  day2_P_180000
  day2_P_190000
  .
  dayN_P_230000

Here "P" denotes the period of interest, and "N" the number of days within that period.
 
I have included the "fcst_valid_hour H1 .... fcst_valid_hour H2 .... fcst_valid_hour Hn" flags (where "n" is the final hour in the interval of interest) in the stat_analysis configuration file, and so far I have obtained some results. This allowed me to extract the required hours for all days in that period. Part of the objective is also to obtain the performance by station, so these flags are used in combination with the "-by" flag.
 
One thing to note is that this analysis is in conjunction with the one for the entire period. That is, I will be doing it as a separate analysis.
 
 
My questions are:
 
1) Although I have some results, I want to know whether this procedure is correct, so that I am not overlooking something important regarding the use of the configuration flags. I read some posts, but it is still not entirely clear to me.
 
 
2) I want to know if the following approach is valid: instead of using the "fcst_valid_hour" flag, I am thinking of using a combination of wildcards/globbing to work directly on the files required for processing. That is, after the required combination of commands, I would end up with something like the following (the notation is similar to the above: P, period; D, day; N, number of days; n, number of hours):
 
    point_stat..L_YYYY_P_D1_H1....stat
    point_stat..L_YYYY_P_D1_H2....stat
    point_stat..L_YYYY_P_D1_Hn....stat
     .
    point_stat..L_YYYY_P_DN_H1....stat
    point_stat..L_YYYY_P_DN_Hn....stat
 
and then make some links to tell stat_analysis where to find these files, and aggregate the stats just for these files.

This would probably reduce CPU time, since stat_analysis would not have to read all the files for the original simulation period.

Is it equivalent/valid?
 
 
 
THANK you very much for your time and help
 
Best
 
Victor

----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: Filtering Stat_Analysis by time
From: John Halley Gotway
Time: Mon Jun 09 14:23:01 2014

Victor,

Yes, that all sounds reasonable to me.

Using "fcst_valid_hour" to select hours of the day is the way I would
do it as well.  Then I'd use "fcst_valid_beg" and "fcst_valid_end" (or
fcst_init_beg and fcst_init_end) to define the time window you'd like
to use.  Also, using "-line_type MPR -by OBS_SID" will run the same
job for each unique station ID value in the column labeled OBS_SID in
the matched pair (MPR) output line type.
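
As a rough illustration of how the options named above might be
combined, here is a minimal Python sketch that assembles a
stat_analysis command line.  Several things in it are assumptions and
not part of this thread: the "aggregate_stat" job with
"-out_line_type CNT", the input directory, the output file name, and
the June 2014 dates are all placeholders.  Check the option names and
time formats against the User's Guide for your MET version before
relying on it.

```python
import shlex


def build_stat_analysis_command(lookin_dir, valid_beg, valid_end,
                                valid_hours, out_file="by_station_cnt.txt"):
    """Assemble a stat_analysis command line as a list of arguments.

    The job type and output line type below are illustrative
    assumptions; only the filtering options come from the thread.
    """
    cmd = [
        "stat_analysis",
        "-lookin", lookin_dir,
        "-out", out_file,
        "-job", "aggregate_stat",      # assumed job type
        "-line_type", "MPR",
        "-out_line_type", "CNT",       # assumed output line type
        "-by", "OBS_SID",              # one result per station ID
        "-fcst_valid_beg", valid_beg,
        "-fcst_valid_end", valid_end,
    ]
    # Repeat -fcst_valid_hour once per hour of interest (HHMMSS).
    for hour in valid_hours:
        cmd += ["-fcst_valid_hour", f"{hour:02d}0000"]
    return cmd


# Placeholder path and dates, chosen only for the example.
command = build_stat_analysis_command(
    "/path/to/point_stat/output",
    "20140601_000000",
    "20140630_235959",
    range(18, 24),
)
print(shlex.join(command))
```

Building the command as a list keeps each option and its value
together, so it can be handed to subprocess.run() without worrying
about shell quoting.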

I would try running a couple of jobs to see how long it takes to get
the output.  If it takes "too long" (it's up to you to decide what too
long is), then yes, your strategy of stratifying the data yourself
would speed it up.  The more data you hand stat_analysis, the longer
it takes to parse, and simple unix commands like "grep" will filter
the data much faster than stat_analysis can.  On the other hand,
writing scripts to subset the data takes more time and is error-prone.
So I'd only suggest doing that if stat_analysis is taking too long.
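
If pre-filtering does turn out to be necessary, it might look
something like the following Python sketch, which keeps only the lines
whose valid hour falls in the desired set.  It assumes
whitespace-delimited .stat text whose first line is a header naming
the columns, and timestamps formatted as YYYYMMDD_HHMMSS.  The sample
header and rows are abbreviated and made up for illustration; real
.stat files have many more columns.

```python
def filter_stat_lines(lines, hours, column="FCST_VALID_BEG"):
    """Return the header plus the data lines whose valid hour is in `hours`.

    Assumes the first line is a header naming the columns and that the
    chosen column holds timestamps formatted as YYYYMMDD_HHMMSS.
    """
    lines = iter(lines)
    header = next(lines)
    idx = header.split().index(column)
    wanted = {f"{h:02d}" for h in hours}
    kept = [header]
    for line in lines:
        fields = line.split()
        if len(fields) <= idx or len(fields[idx]) < 11:
            continue  # skip blank, truncated, or malformed lines
        if fields[idx][9:11] in wanted:  # characters 9-10 are the hour
            kept.append(line)
    return kept


# Abbreviated, made-up sample used only to exercise the function.
sample = [
    "VERSION MODEL FCST_LEAD FCST_VALID_BEG FCST_VALID_END LINE_TYPE",
    "V4.1 WRF 180000 20140601_180000 20140601_180000 MPR",
    "V4.1 WRF 060000 20140601_060000 20140601_060000 MPR",
    "V4.1 WRF 230000 20140601_230000 20140601_230000 MPR",
]
kept = filter_stat_lines(sample, range(18, 24))
print(len(kept) - 1, "data lines kept")
```

Looking the column up by name in the header, rather than by a fixed
position, is meant to guard against the kind of scripting error
mentioned above.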

Lastly, it sounds like you're running stat_analysis with a config
file.  With a config file, stat_analysis actually performs two steps.
First, it reads all the input data and applies all of the filtering
criteria defined above the "jobs" setting.  It writes the filtered
data to a temp file.  Then, for each job you've defined, it reads the
data from the temp file, applies any additional filtering criteria
you've defined for that job, and then performs the analysis.

Suppose you have 10 different jobs you want to run over 100,000 lines
of data.  If some of the filtering criteria are common to all of them,
define them above the "jobs" setting.  For example, valid hours of 18,
19, 20, 21, 22, 23.  That may reduce those 100,000 lines down to
25,000 lines, and each job would only read 25,000 lines rather than
the full 100,000.

If you instead defined the valid hour setting for each individual job,
each job would read all 100,000 lines and run much slower.
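
The arithmetic behind this example can be sketched in a few lines of
Python.  This is only a simple cost model that counts line reads under
the two-step behavior described above (one pass over the input, then
one pass over the temp file per job); it ignores the cost of writing
the temp file and of the analysis itself, and the function name is
made up for illustration.

```python
def lines_read(total_lines, filtered_lines, n_jobs, common_filter):
    """Count line reads under a simple model of the two-step processing."""
    if common_filter:
        # One pass over all input, then each job reads the filtered temp file.
        return total_lines + n_jobs * filtered_lines
    # Nothing is filtered up front, so the temp file holds every line
    # and each job reads all of it.
    return total_lines + n_jobs * total_lines


with_common = lines_read(100_000, 25_000, 10, common_filter=True)
per_job = lines_read(100_000, 25_000, 10, common_filter=False)
print(f"common filter:  {with_common:,} line reads")
print(f"per-job filter: {per_job:,} line reads")
```

With the numbers from the example, the common filter gives 350,000
line reads against 1,100,000 when the hours are set in each job.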

Hope that helps.

Thanks,
John


On Mon, Jun 9, 2014 at 1:50 PM, Victor Almanza via RT
<met_help at ucar.edu>
wrote:

>
> Mon Jun 09 13:50:03 2014: Request 67554 was acted upon.
> Transaction: Ticket created by halmanset at yahoo.com
>        Queue: met_help
>      Subject: Filtering Stat_Analysis by time
>        Owner: Nobody
>   Requestors: halmanset at yahoo.com
>       Status: new
>  Ticket <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=67554 >
>

------------------------------------------------
Subject: Filtering Stat_Analysis by time
From: Victor Almanza
Time: Mon Jun 09 16:41:44 2014

Dear John

I'll start applying your hints. As a note, in the preliminary tests I
mentioned, the job did not take too long. However, I have to try a
longer period to learn more about the computing time.


You are right, I am using config files. Your explanation of where to
define the filtering criteria is a great help in writing the scripts,
since I will be dealing with a lot of files.

Once again, thank you for your prompt guidance and time.

Best

Victor




------------------------------------------------
Subject: Filtering Stat_Analysis by time
From: Victor Almanza
Time: Wed Jun 18 12:24:10 2014

Dear John

Sorry for getting back to you so late.

I followed your suggestions and they did help. I found it better to
define the criteria (hours and time window) above the jobs setting.
All the tests I made ran faster this way. Thus, I included them in
the scripts and am doing additional tests to double-check that the
scripts are working as expected.
I also noticed that splitting the observation files into hourly files
speeds up the computing time in pointStat.

Finally, I compared the values from pointStat using the nearest
neighbor method, and they are slightly different from those obtained
with WRF's read_wrf_nc utility. This WRF utility also extracts values
at the point closest to the lat/lon point of interest without any
interpolation.

I would like to know if this difference is related to the GRIB
conversion (UPP). Here is a sample of temperature at 2 m for the first
7 hours (UTC) of a wrfout file.

OBS              PointStat       read_wrf
296.75000     297.07944     293.9324341
294.64999     292.07125     292.1432190
292.95001     289.91562     289.9297485
292.04999     288.37850     288.3859558
291.35001     287.63544     287.6493225
290.75000     286.83188     286.8536682
289.95001     286.23125     286.2442322
289.14999     285.60194     285.6152039
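
One way to quantify the comparison is to difference the PointStat and
read_wrf columns row by row.  The short Python sketch below does that
with the values copied from the table above; nothing else is assumed.
In this sample the first row differs by about 3.15 K, while the
remaining rows agree to within about 0.08 K.

```python
# (OBS, PointStat, read_wrf) values copied from the table above.
rows = [
    (296.75000, 297.07944, 293.9324341),
    (294.64999, 292.07125, 292.1432190),
    (292.95001, 289.91562, 289.9297485),
    (292.04999, 288.37850, 288.3859558),
    (291.35001, 287.63544, 287.6493225),
    (290.75000, 286.83188, 286.8536682),
    (289.95001, 286.23125, 286.2442322),
    (289.14999, 285.60194, 285.6152039),
]

# Difference between the two model extractions for each row.
diffs = [point_stat - read_wrf for _, point_stat, read_wrf in rows]

for (obs, point_stat, read_wrf), diff in zip(rows, diffs):
    print(f"{obs:10.5f} {point_stat:10.5f} {read_wrf:12.7f} {diff:+9.5f}")
```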

Thank you in advance for your time

Best

Victor



------------------------------------------------
Subject: Filtering Stat_Analysis by time
From: Randy Bullock
Time: Thu Jun 19 11:33:15 2014

Hi Victor -

Thanks for your email and your questions.  Here are my thoughts &
questions for you:

Regarding your question #1, I talked this over with one of the
statisticians in our group, and she thinks your approach is valid.

Regarding your question #2, I'm not sure I understand.  If you're
talking about introducing wildcards into the point_stat config file,
then the answer is no ... that won't work.  If you're talking about
making some symbolic links in your filesystem to redirect the input of
point stat to other files, that should be OK.  If that doesn't answer
your question, could you perhaps explain more fully to me what you're
doing?

Take care.

Randy



------------------------------------------------
Subject: Filtering Stat_Analysis by time
From: Victor Almanza
Time: Thu Jun 19 12:06:57 2014

Dear Randy

Hope you are doing well

Yesterday I replied to an earlier message from John Halley Gotway
regarding these questions. He made some suggestions on June 9th. I
followed his hints/advice and ran some tests with my files. I
preferred to get back to him once I had some results, to let him know
that the hints did help. So, I apologize if my delay in replying
caused these questions to be double-posted.

Nevertheless, regarding your comment on question #2, I was attempting
to use wildcards in order to make some symbolic links, just as you
said. However, John also mentioned that it could be better to work
with the flags of the configuration file instead, and that is how I
wrote the scripts.

Finally, I would like to know if the reply I sent yesterday to MET
help was received. If not, I can send it again.

THANKS so much for your time

Best !

Victor



------------------------------------------------
Subject: Filtering Stat_Analysis by time
From: Randy Bullock
Time: Thu Jun 19 12:43:16 2014

Victor -

Yes, we got your reply yesterday.

So are your questions all answered, then?

Randy


On Thu, Jun 19, 2014 at 12:06 PM, Victor Almanza via RT
<met_help at ucar.edu>
wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=67554 >
>
> Dear Randy
>
> Hope you are doing well
>
> Yesterday I got back to a former reply by John Halley regarding
these
> questions. He made some suggestions on June 9th. I followed his
> hints/advice and made some tests with my files. I rather preferred
to get
> back to him once I had some results and inform that the hints did
help. So,
>  I apologize in case my delay in replying make these questions to be
> double-posted.
>
> Nevertheless, let me tell you that regarding your question #2, I was
> attempting to use the wildcards in order to make some symbolic links
just
> as you said. However, John also mentioned that it could be better to
work
> with the flags of the configuration file instead and that's how I
wrote the
> scripts.
>
> Finally, I would like to know if the reply I sent yesterday to MET
help
> was received. If not, I can send it again.
>
> THANKS so much for your time
>
> Best !
>
> Victor
>
>
> ________________________________
>  De: Randy Bullock via RT <met_help at ucar.edu>
> Para: halmanset at yahoo.com
> Enviado: Jueves, 19 de junio, 2014 10:33 A.M.
> Asunto: Re: [rt.rap.ucar.edu #67554] Filtering Stat_Analysis by time
>
>
> Hi Victor -
>
> Thanks for your email and your questions.  Here are my thoughts &
questions
> for you:
>
> Regarding your question #1, I talked this over with one of the
> statisticians in our group, and she thinks your approach is valid.
>
> Regarding your question #2, I'm not sure I understand.  If you're
talking
> about introducing wildcards into the point_stat config file, then
the
> answer is no ... that won't work.  If you're talking about making
some
> symbolic links in your filesystem to redirect the input of point
stat to
> other files, that should be OK.  If that doesn't answer your
question,
> could you perhaps explain more fully to me what you're doing?
>
> Take care.
>
> Randy
>
>
> On Mon, Jun 9, 2014 at 1:50 PM, Victor Almanza via RT
<met_help at ucar.edu>
> wrote:
>
> >
> > Mon Jun 09 13:50:03 2014: Request 67554 was acted upon.
> > Transaction: Ticket created by halmanset at yahoo.com
> >        Queue: met_help
> >      Subject: Filtering Stat_Analysis by time
> >        Owner: Nobody
> >   Requestors: halmanset at yahoo.com
> >       Status: new
> >  Ticket <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=67554 >
> >
> >
> > Dear MET team
> >
> > Hope you are doing well at your end.
> >
> >
> > I would like to ask for your help/hints/advice with the following:
> >
> > I want to obtain the performance of some variables in an interval
> > of hours for several days. The "several days" can be a week and
> > probably a month or even longer. That is, to aggregate statistics,
> > say from 18 UTC to 23 UTC for all days in the given simulation
> > period, but can be any reasonable hourly interval:
> >
> >   day1_P_180000
> >   day1_P_190000
> >   .
> >   day1_P_230000
> >    .
> >    .
> >   day2_P_180000
> >   day2_P_190000
> >   .
> >   dayN_P_230000..
> >
> > Here "P" denotes the period of interest, and "N" the corresponding
> > days within that period.
> >
> > I have included the "fcst_valid_hour H1 .... fcst_valid_hour H2 ....
> > fcst_valid_hour Hn" flag (where "n" is the final hour in the
> > interval of interest) in the configuration file of stat_analysis,
> > and so far I have obtained some results. This allowed me to extract
> > the required hours for all days in that period. Part of the
> > objective is to also obtain the performance by station, so these
> > flags are used in combination with the "-by" flag.
> >
> > One thing to note is that this analysis is in conjunction with the
> > one for the entire period. That is, I will be doing it as a separate
> > one.
> >
> >
> > My questions are:
> >
> > 1) Although I have some results, I want to know if this procedure
> > is correct, so that I am not overlooking something important
> > regarding the use of the configuration flags. I read some posts but
> > it is still not entirely clear to me.
> >
> >
> > 2) I want to know if the following approach is valid: Instead of
> > using the "fcst_valid_hour" flag, I am thinking of using a
> > combination of wildcards/globbing to work directly on the required
> > files for processing. That is, after the required combination of
> > commands, to end up with something like the following (the notation
> > is similar to the above: P, period; D, day; N, # days; n, # hrs):
> >
> >     point_stat..L_YYYY_P_D1_H1....stat
> >     point_stat..L_YYYY_P_D1_H2....stat
> >     point_stat..L_YYYY_P_D1_Hn....stat
> >      .
> >     point_stat..L_YYYY_P_DN_H1....stat
> >     point_stat..L_YYYY_P_DN_Hn....stat
> >
> > and then make some links to tell stat_analysis where to find these
> > files and further aggregate the stats just for these files.
> >
> > It may help with CPU time, since it does not have to read all the
> > files for the original simulation period.
> >
> > Is it equivalent/valid ?
> >
> >
> >
> > THANK you very much for your time and help
> >
> > Best
> >
> > Victor
> >
>
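
Editor's sketch of the file-selection idea from question 2 above: pick
the point_stat output files whose valid hour falls inside the interval
of interest and symlink them into a separate directory. This is only an
illustration under stated assumptions, not the method used in this
ticket. In particular, it assumes the file names end in
"_YYYYMMDD_HHMMSSV.stat" (valid date and time before the trailing "V");
the function name and the directory layout are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming pattern: the valid time is encoded at the end of the
# file name as _YYYYMMDD_HHMMSSV.stat. Adjust if your names differ.
VALID_TIME = re.compile(r"_(\d{8})_(\d{2})\d{4}V\.stat$")


def link_by_valid_hour(src_dir, dst_dir, first_hour, last_hour):
    """Symlink .stat files whose valid hour is in [first_hour, last_hour].

    Returns the sorted list of file names that were selected.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    selected = []
    for path in sorted(src.glob("*.stat")):
        match = VALID_TIME.search(path.name)
        if match is None:
            continue  # name does not follow the assumed pattern
        hour = int(match.group(2))
        if first_hour <= hour <= last_hour:
            link = dst / path.name
            # Skip names that are already linked from a previous run.
            if not (link.is_symlink() or link.exists()):
                link.symlink_to(path.resolve())
            selected.append(path.name)
    return selected


if __name__ == "__main__":
    # Small self-contained demo with empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        all_dir = Path(tmp) / "all"
        all_dir.mkdir()
        for name in (
            "point_stat_120000L_20140601_120000V.stat",
            "point_stat_120000L_20140601_180000V.stat",
            "point_stat_120000L_20140602_230000V.stat",
        ):
            (all_dir / name).touch()
        for name in link_by_valid_hour(all_dir, Path(tmp) / "h18_23", 18, 23):
            print(name)
```

The directory of links could then be given to stat_analysis through
its -lookin option. As noted in the replies, filtering with the
configuration flags remains the approach that was actually adopted
here; the links only reduce the number of files that have to be read.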

------------------------------------------------
Subject: Filtering Stat_Analysis by time
From: Victor Almanza
Time: Thu Jun 19 13:19:54 2014

Dear Randy

Yes, the questions regarding the filtering by time are all answered.
Regarding the difference in values of the nearest neighbor method that
I mentioned in yesterday's reply, I think it would be better to ask
about it separately in a future email.

Thank you very much

Best !
Victor


________________________________
 From: Randy Bullock via RT <met_help at ucar.edu>
To: halmanset at yahoo.com
Sent: Thursday, June 19, 2014 11:43 A.M.
Subject: Re: [rt.rap.ucar.edu #67554] Filtering Stat_Analysis by time


Victor -

Yes, we got your reply yesterday.

So are your questions all answered, then?

Randy


On Thu, Jun 19, 2014 at 12:06 PM, Victor Almanza via RT
<met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=67554 >
>
> Dear Randy
>
> Hope you are doing well
>
> Yesterday I got back to a former reply by John Halley regarding
> these questions. He made some suggestions on June 9th. I followed his
> hints/advice and made some tests with my files. I preferred to get
> back to him once I had some results and to let him know that the
> hints did help. So I apologize if my delay in replying caused these
> questions to be double-posted.
>
> Nevertheless, let me tell you that regarding your question #2, I was
> attempting to use the wildcards in order to make some symbolic links
> just as you said. However, John also mentioned that it could be
> better to work with the flags of the configuration file instead and
> that's how I wrote the scripts.
>
> Finally, I would like to know if the reply I sent yesterday to MET
> help was received. If not, I can send it again.
>
> THANKS so much for your time
>
> Best !
>
> Victor
>

------------------------------------------------

