[Met_help] grid_stat question on neighborhood verification

John Halley Gotway johnhg at rap.ucar.edu
Mon Mar 16 08:44:25 MDT 2009


Jonathan,

If you'd like, you can retrieve the latest beta version of METv2.0 from our ftp site:
ftp.rap.ucar.edu/incoming/irap/johnhg/METv2.0beta8.20090302.tar.gz
That contains the fix.  I'd be interested to hear how much it improves your runtime.

I want to mention, though, that the format of some of the config files is going to change slightly for the actual released version of METv2.0.  We're finishing up adding support for verifying
probabilistic forecasts, and there are some config file additions to support that.  So you should always use the version of the config files that are distributed with the code.

Thanks,
John

Case, Jonathan (MSFC-VP61)[Other] wrote:
> John,
> 
> Thanks for all the info below!  I was wondering whether there is a way to get a hold of the more optimal run-time code changes, or is it too extensive at this point to distribute?  
> 
> Much appreciated,
> Jonathan
> 
> 
>> -----Original Message-----
>> From: John Halley Gotway [mailto:johnhg at rap.ucar.edu]
>> Sent: Monday, March 16, 2009 9:20 AM
>> To: Case, Jonathan (MSFC-VP61)[Other]
>> Cc: met_help at ucar.edu
>> Subject: Re: grid_stat question on neighborhood verification
>>
>> Jonathan,
>>
>> First, a note about the speed.  Since you're running a beta version of
>> METv2.0, there was a performance issue that we have since discovered
>> and fixed.  When you begin using the released version of
>> METv2.0, you'll find that it runs much faster.  We're using some new
>> classes for generating the output ASCII files.  They're supposed to do
>> a bit of book-keeping to figure out column widths and
>> formatting.  However, we realized that instead of doing a "little bit"
>> of book-keeping, they were doing way too much of it!  With that fix, it
>> runs much quicker.
>>
>> Also, as you noted, turning off the correlation coefficients speeds it
>> up, as does setting "n_boot_rep" to 0 to turn off bootstrapping.
>>
>> As for neighborhood methods, here's how it works:
>> (1) You define the raw thresholds values in which you're interested
>> using the "fcst_thresh" and "obs_thresh" parameters.
>> (2) You define the neighborhood sizes of interest using the "nbr_width"
>> parameter.
>> (3) For each combination of raw threshold and neighborhood size, a
>> fractional coverage field is computed.  For example, take the threshold
>> ">=5.0" and a neighborhood size of 5.  For each grid point in the
>> forecast field, the raw value at that grid point is replaced by a
>> fractional coverage value as follows.  A 5-by-5 box is drawn around the
>> current grid point, and we count the number of those 25 points
>> that have a value >=5.0.  Suppose 10 of them do; the fractional
>> coverage value for that point is then defined to be 10/25, or 0.4.  The
>> same process is done in the observation field to compute a
>> fractional coverage field (see the sketch after this list).
>> (4) The "nbr_threshold" is used in the computation of the fractional
>> coverage fields to decide what to do with bad data values.  This
>> determines the percentage of points that need to be valid in order
>> for a fractional coverage value to be computed.  Since it's set to 1,
>> or 100%, all 25 of the neighborhood points have to contain valid data
>> for a fractional coverage value to be computed at that point.
>> (5) Now we have a fractional coverage field for the forecast and
>> observation.  Those two fields can be compared directly to compute
>> scores like the Fractions Brier Score and Fractions Skill Score in
>> the NBRCNT output line.
>> (6) Alternatively, you could threshold the fractional coverage fields
>> to compute the NBRCTC and NBRCTS output lines.  We use the
>> "nbr_frac_threshold" parameter to determine which thresholds between 0
>> and 1 you'd like to apply to those fields.  In your case, you've
>> chosen >=0.5.
>>
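>> Below is a minimal sketch of the fractional coverage calculation from
>> steps (3) and (4), just to make the mechanics concrete.  This is not the
>> MET code itself; the random stand-in fields and the helper name
>> fractional_coverage are made up for the illustration, and edge points
>> are simply skipped rather than handled the way MET handles them.
>>
>>   import numpy as np
>>
>>   def fractional_coverage(field, raw_thresh, nbr_width, valid_frac=1.0):
>>       # Replace each grid point with the fraction of points in the
>>       # surrounding nbr_width-by-nbr_width box meeting raw_thresh.
>>       # Neighborhoods with too much missing data (fraction of valid
>>       # points < valid_frac, i.e. the nbr_threshold check) stay NaN.
>>       half = nbr_width // 2
>>       ny, nx = field.shape
>>       coverage = np.full((ny, nx), np.nan)
>>       for j in range(half, ny - half):
>>           for i in range(half, nx - half):
>>               box = field[j - half:j + half + 1, i - half:i + half + 1]
>>               valid = ~np.isnan(box)
>>               if valid.mean() < valid_frac:
>>                   continue
>>               coverage[j, i] = (box[valid] >= raw_thresh).mean()
>>       return coverage
>>
>>   # Raw threshold >=5.0 mm and a 5-by-5 neighborhood, as in step (3):
>>   fcst = np.random.gamma(1.0, 4.0, size=(50, 50))  # stand-in precip fields
>>   obs  = np.random.gamma(1.0, 4.0, size=(50, 50))
>>   fcst_cov = fractional_coverage(fcst, 5.0, 5)     # e.g. 10/25 -> 0.4
>>   obs_cov  = fractional_coverage(obs,  5.0, 5)
>>
>>   # Step (6): threshold the coverage fields at the nbr_frac_threshold
>>   # (>=0.5 here) to build the 2x2 counts behind the NBRCTC/NBRCTS lines.
>>   ok = ~np.isnan(fcst_cov) & ~np.isnan(obs_cov)
>>   f_event = fcst_cov[ok] >= 0.5
>>   o_event = obs_cov[ok] >= 0.5
>>   hits         = np.sum( f_event &  o_event)
>>   misses       = np.sum(~f_event &  o_event)
>>   false_alarms = np.sum( f_event & ~o_event)
>>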
>> I'd suggest rerunning this case using multiple values for
>> nbr_frac_threshold, like "gt0.0 ge0.25 ge0.50 ge0.75", and then see
>> how the results change.  When doing this much processing on the
>> fields, the interpretation can get a bit confusing.  But for example,
>> for a raw threshold of >=5.0mm, a neighborhood size of 5-by-5, and a
>> fractional coverage threshold of >0.0... you're really asking a
>> question like "When I forecast precip of >=5.0mm somewhere nearby
>> (within the surrounding 5-by-5 box of 25 grid points), does precip
>> >=5.0mm actually occur anywhere nearby (within that same box)?".
>> Also, you may want to read up about the Fractions Skill Score and
>> interpretations of that.
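>>
>> For the FSS itself, here is the standard definition written against the
>> coverage fields from the sketch above (fcst_cov and obs_cov are assumed
>> from that example).  This follows the usual Roberts and Lean (2008)
>> formulation and is meant only as an interpretation aid, not as the
>> exact MET source:
>>
>>   import numpy as np
>>
>>   def fbs_fss(fcst_cov, obs_cov):
>>       # Fractions Brier Score: mean squared difference of the two
>>       # fractional coverage fields (reported in the NBRCNT line).
>>       ok = ~np.isnan(fcst_cov) & ~np.isnan(obs_cov)
>>       pf, po = fcst_cov[ok], obs_cov[ok]
>>       fbs = np.mean((pf - po) ** 2)
>>       # Reference FBS: a forecast with the right amount of coverage
>>       # but no spatial overlap with the observed coverage.
>>       worst = np.mean(pf ** 2) + np.mean(po ** 2)
>>       fss = 1.0 - fbs / worst if worst > 0 else np.nan
>>       return fbs, fss   # FSS: 1 = perfect, 0 = no skill
>>
>> As the neighborhood size grows, FSS generally increases, which is one
>> way to see the spatial scale at which the forecast starts to show
>> useful skill.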
>>
>> For more information on methods and statistics, and for interpretation
>> of results, let me refer you to Tressa Fowler, tressa at ucar.edu, the
>> statistician who's leading the development of MET.
>>
>> Good luck,
>> John
>>
>> Case, Jonathan (MSFC-VP61)[Other] wrote:
>>> Hello John,
>>>
>>>
>>>
>>> I must not be applying the neighborhood verification method properly,
>> judging from the preliminary numbers I'm seeing.
>>> I've looked over a few different sets of output, and so far the
>> neighborhood verification numbers are nearly the same as the standard
>> CTS numbers for various thresholds of precipitation.  I was under the
>> impression that by applying a neighborhood box, the stringency of
>> the verification would be relaxed so that the categorical and skill
>> scores would improve.  However, that doesn't seem to be the case.
>>>
>>>
>>> Here is what I applied for my 4-km grid results: (for 3-hour
>> accumulated precipitation)
>>>
>>>
>>> ·         fcst_thresh[] = [ "ge5 ge10 ge25" ];
>>>
>>> ·         obs_thresh[]  = [ "ge5 ge10 ge25" ];
>>>
>>> ·         nbr_width[] = [ 5, 13 ];  → corresponding to ~20km and 50km
>> neighborhood "boxes"
>>> ·         nbr_threshold = 1.0;
>>>
>>> ·         nbr_frac_threshold[] = [ "ge0.5" ];
>>>
>>>
>>>
>>> I'm wondering whether the nbr_frac_threshold needs to be
>> reduced/relaxed?  I don't fully understand what this parameter does and
>> how to set it effectively based on the neighborhood width values.
>>>
>>>
>>> Let's first see if my interpretation is correct.  If I set
>> nbr_frac_threshold to "ge0.5", does this mean that at least 50% of all
>> the neighborhood grid points have to meet or exceed the list of
>> fcst/obs_thresh[] in order for a "hit" to occur?  I was thinking
>> initially that if ANY grid point in the OBS/FCST paired neighborhood
>> meets or exceeds the thresholds, then it should be considered a hit.
>> But that doesn't appear to be the case the way I have configured this
>> run.
>>>
>>>
>>> Finally, on a side note, the grid_stat program is running *extremely*
>> slow, even with the correlation coefficients turned off.  Granted, I am
>> running on the large STIV grid, but I am masking it based on a WRF.poly
>> file I created only over the SE U.S., amounting to about a 350x350
>> grid.  I ran through only half a month's worth of control+experimental
>> forecasts over this past weekend.   It takes several minutes to get
>> through just one set of forecast/observation calculations at a single
>> output time.  Is there a way to optimize the grid_stat program, or is
>> this program known to run very slowly based on the number of
>> computations being made?
>>>
>>>
>>> Thanks again for the help,
>>>
>>> Jonathan
>>>
>>>
>>>
>>>
>>>
>>> ***********************************************************
>>> Jonathan Case, ENSCO, Inc.
>>> Aerospace Sciences & Engineering Division
>>> Short-term Prediction Research and Transition Center
>>> 320 Sparkman Drive, Room 3062
>>> Huntsville, AL 35805-1912
>>> Voice: (256) 961-7504   Fax: (256) 961-7788
>>> Emails: Jonathan.Case-1 at nasa.gov
>>>
>>>              case.jonathan at ensco.com
>>>
>>> ***********************************************************
>>>
>>>
>>>
>>>

