[Met_help] [rt.rap.ucar.edu #38708] History for Applying thresholds in MET

Fri Mar 18 11:15:47 MDT 2011

----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

John,

There's one other option I wanted to mention to you - can't remember if I've mentioned this before or not.  In the config file, you can specify a verification masking region as a data mask - i.e. a
field of data and a threshold.  The typical example is specifying a topographical field and only looking at points greater than some elevation. But you could also specify the wind speed field that
you're verifying and a threshold value.  Only those points meeting that threshold criteria will be included in the verification "masking" area. That'd allow you to condition your evaluation on the
forecast values.

This actually is the first time this issue of filtering matched pairs has been raised.  Tressa Fowler (tressa at ucar.edu), the project lead for MET, and I were talking this afternoon about some options.
We do realize that the tools we have for verification of winds are less than ideal, but we've had a hard time coming up with a satisfactory approach.  I know Tressa would be happy to talk to you
about it and get a better sense of what evaluation methods would be most useful for the wind energy community.

At this point, this step of filtering matched pairs before computing continuous statistics is not on our list for the next release.  But we'll give it some more thought.  It certainly seems like a
reasonable thing to do.  But we'd need to hammer out the scientific and implementation details before adding it.

Thanks,
John

John Henderson wrote:
Thanks John.

I'm playing with this right now. It seems to be working okay.

BTW, are there plans to implement proper thresholding in the next
release? That probably would be useful (especially for me!).

Thanks, and let me know if you have any questions regarding the madis
code we supplied.

Cheers,

John

John Halley Gotway wrote:
John,

Sorry for the delay in getting back to you.

Mathematically, it is possible to use the VL1L2 lines to compute a
wind speed.  However, for each VL1L2 line, you'd only get one wind
speed value.  Suppose, for example, that each time you run
Point-Stat, you verify over the continental U.S.  For each run, you
get 1 VL1L2 line that could be used to compute an "average" wind speed
and direction.  There's a lot of cancellation going on there.
If the winds are blowing very strongly in one direction at half the
grid points, and just as strongly in the opposite direction at the
other grid points, the wind speed derived from the VL1L2 line
would be zero.
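
The cancellation described above can be illustrated with a minimal sketch. The four wind vectors are made-up values, not MET output; the point is only that the speed of the averaged vector differs from the average of the individual speeds.

```python
import math

# Hypothetical winds (u, v) in m/s: half blow east at 10 m/s,
# the other half blow west at 10 m/s.
winds = [(10.0, 0.0), (10.0, 0.0), (-10.0, 0.0), (-10.0, 0.0)]

# Average of the individual wind speeds (scalar average).
mean_speed = sum(math.hypot(u, v) for u, v in winds) / len(winds)

# Speed of the averaged vector, which is what mean U and mean V yield.
ubar = sum(u for u, _ in winds) / len(winds)
vbar = sum(v for _, v in winds) / len(winds)
vector_mean_speed = math.hypot(ubar, vbar)

print(mean_speed)         # 10.0
print(vector_mean_speed)  # 0.0
```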

It sounds like you want to be able to filter the fcst-obs matched
pairs and see how the performance changes as the wind speed gets larger.

Here's one way you could do that:
- Each time you run Point-Stat, dump out the matched pair (MPR) lines
for wind speed.
- Use the STAT-Analysis tool to aggregate those MPR lines (across 1
case or many cases) and use the "-column_min" option to set your
filtering criteria.

For example, try something like the following:
stat_analysis -lookin sample_output.stat -job aggregate_stat \
  -fcst_var WIND -line_type MPR -out_line_type CNT -column_min OBS 5.0

This job will read a bunch of matched pairs for wind speed, keep only
those where the "OBS" column is at least 5, and then compute
continuous statistics over the matched pairs that remain.  If you
also add "-column_min FCST 5.0" that would require both the forecast
and observation values to be at least 5 m/s to be used.
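
As a rough illustration of what such a filtered job computes, here is a minimal Python sketch. It is not the STAT-Analysis implementation; the function name, the pair values, and the reduction to just ME and RMSE are assumptions made for the example.

```python
import math

def continuous_stats(pairs, obs_min=None, fcst_min=None):
    """Return (n, ME, RMSE) over the (fcst, obs) pairs that pass the minimum filters."""
    kept = [
        (f, o) for f, o in pairs
        if (obs_min is None or o >= obs_min)
        and (fcst_min is None or f >= fcst_min)
    ]
    if not kept:
        return 0, None, None
    errors = [f - o for f, o in kept]
    me = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return len(kept), me, rmse

# Hypothetical wind speed matched pairs (forecast, observation) in m/s.
pairs = [(4.0, 6.0), (7.0, 5.0), (3.0, 2.0), (9.0, 8.0)]

# Analogue of "-column_min OBS 5.0": keep pairs whose observation is at least 5.
n, me, rmse = continuous_stats(pairs, obs_min=5.0)

# Analogue of also adding "-column_min FCST 5.0".
n_both, me_both, rmse_both = continuous_stats(pairs, obs_min=5.0, fcst_min=5.0)
```

With these made-up pairs, three survive the observation filter and two survive both filters, so the statistics change with the filtering criteria.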

Hope that helps.

John

John Henderson wrote:

John,

My mistake - I read your option 2 as using the VL1L2 output and not
SL1L2. However, if thresholds are not applied to SL1L2, I still do not
see how I am to aggregate thresholded winds in order to compute ME/RMSE
for thresholded wind speeds using SL1L2.  That is the main goal of this
effort of mine.

Is it possible to use the truly thresholded VL1L2 values?

John

John Halley Gotway wrote:

John,

The SL1L2 partial sums for wind speed should have no thresholds
applied to them.  I'd expect the FCST_THRESH and OBS_THRESH columns to
contain NA in the SL1L2 lines for wind speed.

I ran some sample data using the following configuration...
model = "WRF";
beg_ds = -5400;
end_ds =  5400;
fcst_field[] = [ "UGRD/Z10", "VGRD/Z10", "WIND/Z10" ];
obs_field[]  = [];
fcst_thresh[] = [ "ge5", "ge5", "ge5" ];
obs_thresh[]  = [];
fcst_wind_thresh[] = [ "gt3.0", "gt4.0", "ge5.0", "ge6.0", "ge7.0" ];
obs_wind_thresh[]  = [];
message_type[] = [ "ADPSFC" ];
mask_grid[] = [ "FULL" ];
mask_poly[] = [];
mask_sid = "";
ci_alpha[] = [ 0.05 ];
boot_interval = 1;
boot_rep_prop = 1.0;
n_boot_rep = 1000;
boot_rng = "mt19937";
boot_seed = "";
interp_method[] = [ "DW_MEAN" ];
interp_width[] = [ 2 ];
interp_thresh = 1.0;
output_flag[] = [ 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 2 ];
rank_corr_flag = 0;
grib_ptv = 2;
tmp_dir = "/tmp";
output_prefix = "";
version = "V3.0";

And here's a description of the STAT output I see...
- 1 CNT line for UGRD.
- 1 CNT line for VGRD.
- 5 VL1L2 lines, one for each of the 5 entries in fcst_wind_thresh.
fcst_wind_thresh only applies in the computation of the VL1L2 vector
partial sums.
- 1 CNT line for WIND speed.
- 1 SL1L2 line for WIND speed - there's only one output line here
because no thresholding is applied to the raw wind speeds.
- A bunch of MPR lines.

One other suggestion.  Once you have a bunch of SL1L2 partial sums for
wind speed, and you want to verify them using STAT-Analysis, try
running the following type of job:

./stat_analysis -lookin /directory/with/stat/output \
  -job aggregate_stat -line_type SL1L2 -out_line_type CNT -dump_row test.stat

I strongly recommend using the "-dump_row" option when you're setting
up STAT-Analysis jobs.  STAT-Analysis will dump all the SL1L2 lines
used in the computation to that output file.  Then you can
review that output file and make sure that it's operating on exactly
the set of data you intended.  Once you're confident that the job is
doing what you'd like, you can omit the "-dump_row" option.

John

John Henderson wrote:

Hi again John,

Thanks again for the notes. I like the flexibility of WRF-MET but I
think the multiple options to essentially carry out the same thing
are what's causing my consternation.

I think I'll tackle your option 2. One clarification: I believe that
Stat Analysis will be able to discriminate - via the fcst_thresh
filter - among the truly thresholded rows in the VL1L2 files?

John

John Halley Gotway wrote:

John,

Answers are inline.

John Henderson wrote:

Hi again John,

Thanks for the clarification. Yes, the code runs to completion - also
after I change Z2 to Z010!

As expected in my CNT file, I get UGRD, VGRD and WIND verified
separately without thresholding. You mentioned that I still am able to
verify wind speed WITH thresholding by applying the fcst_wind_thresh
entries. This output goes to the VL1L2 partial sums file, of course.
However, in that file - for each threshold - I have various UGRD_VGRD
related quantities, such as VFBAR, UOBAR, VOBAR, UVFOBAR, UVFFBAR and
UVOOBAR.

Two questions:

1) To compute thresholded wind speed stats, did you intend for me to
work with the partial sums?

Wind speed can be treated simply as a scalar field.  You have
several options of how to work with it.  Here are some options:
(1) Each time you run Point-Stat, you can compute statistics for
wind speed over your verification regions.  For example, you can
look at RMSE (CNT line type) for each time, or you could threshold it
using fcst_thresh and obs_thresh and look at contingency table
statistics (CTS line type).
(2) You could verify wind speed for each time and dump out SL1L2
partial sums.  Then you could use the STAT-Analysis tool to
aggregate those partial sums through time and recompute continuous
statistics like RMSE.
(3) You could verify wind speed for each time and dump out matched
pair (MPR) lines.  Then you could use the STAT-Analysis tool to
aggregate those matched pairs and recompute whatever type of
statistics you'd like, continuous or categorical.  This gives you the
most flexibility, but the drawback is storing the large amount of MPR
output.
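
Option (2) can be sketched in a few lines of Python. This is a simplified illustration, not the STAT-Analysis code: it assumes each SL1L2 line carries TOTAL, FBAR, OBAR, FOBAR, FFBAR and OOBAR, combines the lines weighted by TOTAL, and recomputes only ME and RMSE. The two input lines are made-up values.

```python
import math

def aggregate_sl1l2(lines):
    """Combine SL1L2-style partial sums, weighting each line by its TOTAL."""
    total = sum(line["TOTAL"] for line in lines)
    keys = ("FBAR", "OBAR", "FOBAR", "FFBAR", "OOBAR")
    sums = {k: sum(line["TOTAL"] * line[k] for line in lines) / total for k in keys}
    return total, sums

def me_rmse(sums):
    """Derive mean error and RMSE from aggregated partial sums."""
    me = sums["FBAR"] - sums["OBAR"]
    mse = sums["FFBAR"] - 2.0 * sums["FOBAR"] + sums["OOBAR"]
    return me, math.sqrt(max(mse, 0.0))  # clamp tiny negative round-off

# Hypothetical partial sums from two Point-Stat runs.
run_a = {"TOTAL": 2, "FBAR": 5.5, "OBAR": 5.5, "FOBAR": 29.5, "FFBAR": 32.5, "OOBAR": 30.5}
run_b = {"TOTAL": 1, "FBAR": 9.0, "OBAR": 8.0, "FOBAR": 72.0, "FFBAR": 81.0, "OOBAR": 64.0}

total, sums = aggregate_sl1l2([run_a, run_b])
me, rmse = me_rmse(sums)
```

Because the partial sums are averages over all pairs in each run, no per-pair filtering is possible at this stage, which is why the matched pair route in option (3) is the more flexible one.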

2) Since I have wind speed (grib code=32) already available in both my
forecast (my modification to WPP) and obs files (via PB2NC), would I
therefore need WIND as the only fcst_field or, equivalently, 32 - or
does the string "WIND" force computation of wind speed using the
therefore required preceding UGRD, VGRD?

"WIND" is the abbreviation used for GRIB code 32.  Selecting
"WIND/Z10" or "32/Z10" in the config file has the same effect.
Point-Stat will search your input GRIB file for a record of wind
speed.  If it finds one, it will use it (as in your case).  If it
doesn't find a record for wind speed, it will look for U and V, and
derive wind speed.  But since your input files already have wind
speed, it will just use those records.
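
The lookup order described above amounts to the following sketch. The dictionary layout and function name are hypothetical stand-ins for a GRIB record search, not Point-Stat internals.

```python
import math

def wind_speed(record):
    """Use a stored wind speed if present; otherwise derive it from U and V."""
    if record.get("WIND") is not None:
        return record["WIND"]
    return math.hypot(record["UGRD"], record["VGRD"])

# Hypothetical records: one without and one with a wind speed field.
derived = wind_speed({"UGRD": 3.0, "VGRD": 4.0})
stored = wind_speed({"WIND": 7.5, "UGRD": 3.0, "VGRD": 4.0})

print(derived)  # 5.0
print(stored)   # 7.5
```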

Hope that helps.

John

Thanks again.

John

One more piece of information.  If you want to apply multiple
thresholds for the wind speed, you can do so as follows:
  fcst_wind_thresh[] = [ "gt2.0", "ge3.0" ];
Each threshold needs to be in quotes, and they should be separated by
commas.

Also, I told you that in MET we use thresholds to define contingency
tables, and that is true of the fcst_thresh and obs_thresh
parameters.  However, the fcst_wind_thresh does behave differently -
it's really like a filtering method to filter out only those matched
pairs to use in the VL1L2 computations.  Perhaps we should have named
it fcst_wind_filter instead to avoid the confusion.
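
The difference between the two uses of a threshold can be sketched as follows. This is an illustration under stated assumptions, not MET code: the helper names are hypothetical, the pairs are made-up scalar wind speeds, and the filter is applied to the forecast value only. The thread does not show how MET applies the filter internally to U/V pairs.

```python
import operator

# Comparison prefixes as they appear in threshold strings such as "ge5.0".
OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
       "le": operator.le, "eq": operator.eq, "ne": operator.ne}

def parse_thresh(text):
    """Split a string such as "ge5.0" into a comparison function and a value."""
    return OPS[text[:2]], float(text[2:])

def contingency_table(pairs, fcst_thresh, obs_thresh):
    """fcst_thresh/obs_thresh style: every pair is kept and sorted into a 2x2 table."""
    f_op, f_val = parse_thresh(fcst_thresh)
    o_op, o_val = parse_thresh(obs_thresh)
    table = {"FY_OY": 0, "FY_ON": 0, "FN_OY": 0, "FN_ON": 0}
    for f, o in pairs:
        key = ("FY" if f_op(f, f_val) else "FN") + "_" + ("OY" if o_op(o, o_val) else "ON")
        table[key] += 1
    return table

def filter_pairs(pairs, fcst_thresh):
    """fcst_wind_thresh style: pairs failing the threshold are dropped entirely."""
    op, val = parse_thresh(fcst_thresh)
    return [(f, o) for f, o in pairs if op(f, val)]

pairs = [(4.0, 6.0), (7.0, 5.0), (3.0, 2.0), (9.0, 8.0)]
table = contingency_table(pairs, "gt5.0", "gt5.0")
kept = filter_pairs(pairs, "gt5.0")
```

The table still accounts for all four pairs, while the filter leaves only the two pairs whose forecast exceeds 5.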
Ah yes, my mistake.  That's what I get for writing an email too late
at night.  All I'm trying to illustrate with these settings is that
the number of fcst_wind_thresh entries does NOT need to match
the number of fcst_field entries.  But you're right that the number of
fcst_thresh entries needs to match the number of fcst_field
entries. Clear as mud?

fcst_field[]       = [ "UGRD/Z2", "VGRD/Z2", "WIND/Z2" ];
obs_field[]        = [];
fcst_thresh[]      = [ "gt5.0", "gt5.0", "gt5.0" ];
obs_thresh[]       = [ "gt5.0", "gt5.0", "gt5.0" ];
fcst_wind_thresh[] = [ "gt3.0", "gt4.0", "ge5.0", "ge6.0", "ge7.0" ];
obs_wind_thresh[]  = [];
message_type[]     = [ "ADPSFC" ];

John

John Henderson wrote:

John,

As expected, your settings give the following error:

ERROR: PointStatConfInfo::process_config() -> The number fcst_thresh
levels provided must match the number of fields provided in fcst_field.

I always need the same number of thresholds in fcst_thresh as there
are fields in fcst_field.

John

John Halley Gotway wrote:

John,

Actually that's not quite correct.  It is true that the number of
fields to be verified in fcst_field must match the number of sets of
thresholds in fcst_thresh.  However, neither needs to match the number
of thresholds in fcst_wind_thresh.  Whatever thresholds you specify in
fcst_wind_thresh will be applied in the computation of all vector
partial sums.
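
The counting rule can be written as a small check. This is a hypothetical helper for sanity-checking settings before a run, not part of MET; the error text is only modeled on the rule stated in this thread.

```python
def check_thresholds(fcst_field, fcst_thresh, fcst_wind_thresh):
    """Enforce one fcst_thresh entry per field; fcst_wind_thresh may be any length."""
    if len(fcst_thresh) != len(fcst_field):
        raise ValueError(
            "fcst_thresh has %d entries but fcst_field has %d"
            % (len(fcst_thresh), len(fcst_field))
        )
    # Each wind threshold is applied independently of the field count.
    return len(fcst_wind_thresh)

fields = ["UGRD/Z10", "VGRD/Z10", "WIND/Z10"]
wind_thresh = ["gt3.0", "gt4.0", "ge5.0", "ge6.0", "ge7.0"]

# Three fields with three thresholds pass, regardless of five wind thresholds.
n_wind = check_thresholds(fields, ["ge5", "ge5", "ge5"], wind_thresh)
```

Passing only two fcst_thresh entries for the same three fields would raise the ValueError.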

Also, regarding wind speed, you can verify it directly in Point-Stat
just as you would any other field of scalars.  Just select "WIND/Z2"
in the config file.  If wind speed is not present in the input forecast
file, Point-Stat will derive it from the U and V components.  However,
observations of wind speed will need to be present in the input point
observation file - and Point-Stat will not derive them.  If you're
using PREPBUFR observation files, the PB2NC tool will derive
observations of wind speed for you from the U and V components.

Please try running Point-Stat with the following settings:
fcst_field[]       = [ "UGRD/Z2", "VGRD/Z2", "WIND/Z2" ];
obs_field[]        = [];
fcst_thresh[]      = [ "gt5.0", "gt5.0" ];
obs_thresh[]       = [ "gt5.0", "gt5.0" ];
fcst_wind_thresh[] = [ "gt3.0", "gt4.0", "ge5.0" ];
obs_wind_thresh[]  = [];
message_type[]     = [ "ADPSFC" ];

Also, you're right that when users don't request contingency table
output there really is no need to specify thresholds in fcst_thresh.
We're working on the next release right now, and I'll take a look next
week to see if I can remove that requirement.

Just let me know if more questions come up.

Thanks,
John

Hi John,

Thanks for this nugget of an email. I was otherwise partway through
my first email response, but this answers some of my questions... For
clarification (please see embedded):

John Halley Gotway wrote:

John,

One more piece of information.  If you want to apply multiple
thresholds for the wind speed, you can do so as follows:
  fcst_wind_thresh[] = [ "gt2.0", "ge3.0" ];
Each threshold needs to be in quotes, and they should be separated by
commas.

For this case, I would have to duplicate the pair of 33,34 grib codes
in fcst_field. Is that right? I believe what led me to misunderstand
the overall thresholding concept was that - months ago - while learning
MET initially, I discovered that every fcst_field grib code has to be
paired up with a threshold value, even for computing the simplest of
statistics, CNT. That was somewhat surprising, but made me suspect that
thresholding was always applied. I typically use an unphysical number
to prevent actual thresholding from happening.  I believe that the
comments in this section of the config file do not mention that the
values that are required for each field will only be applied for
specific line types.

Also, I told you that in MET we use thresholds to define contingency
tables, and that is true of the fcst_thresh and obs_thresh parameters.
However, the fcst_wind_thresh does behave differently -
it's really like a filtering method to filter out only those matched
pairs to use in the VL1L2 computations.  Perhaps we should have named
it fcst_wind_filter instead to avoid the confusion.

This is good, since I do need to threshold based on wind speed.
However, it seems to me that the VL1L2 partial sums are only used to
compute wind direction. It's not obvious to me how to retrieve wind
speed from these sums. WDIR is an allowed out_line_type in Stat
Analysis...(As I mentioned, I already have wind speed values computed
in both obs and WRF files). Please advise!

I do realize that configuring Point-Stat (and the other MET tools)
to do exactly what you want can be tricky and frustrating.  There
are a lot of details and options packed in there.

Hope that helps and have a good weekend.

John

John Halley Gotway wrote:

John

----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------