[Met_help] grid_stat question on neighborhood verification

John Halley Gotway johnhg at rap.ucar.edu
Mon Mar 16 10:45:19 MDT 2009


Jonathan,

I finally found the bug.  It's a one-line fix to the routine that computes the fractional coverage field:

In the file, "METv2.0beta?/lib/vx_wrfdata/src/vx_wrfdata.cc", look for the routine named "fractional_coverage()".  Searching down from there, look for the following line:
   v = wd.get_xy_double(x, y);

In that line, replace the x with xx and the y with yy.  So the line should now read:
   v = wd.get_xy_double(xx, yy);
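
For context, here's a minimal standalone sketch of what a fractional coverage routine does.  This is simplified C++ on a flat array, not the actual MET source (which also handles bad data via nbr_threshold):

   #include <vector>

   // For each grid point, compute the fraction of points in the
   // surrounding width-by-width box whose value meets the threshold.
   std::vector<double> fractional_coverage(const std::vector<double> &data,
                                           int nx, int ny, int width,
                                           double thresh) {
      std::vector<double> frac(nx * ny, 0.0);
      int half = width / 2;

      for(int x = 0; x < nx; x++) {
         for(int y = 0; y < ny; y++) {
            int count = 0, n = 0;
            for(int xx = x - half; xx <= x + half; xx++) {
               for(int yy = y - half; yy <= y + half; yy++) {
                  if(xx < 0 || xx >= nx || yy < 0 || yy >= ny) continue;
                  n++;
                  // The bug: indexing with (x, y) here reads the center
                  // point every time; (xx, yy) reads each neighbor.
                  if(data[xx * ny + yy] >= thresh) count++;
               }
            }
            frac[x * ny + y] = (n > 0) ? (double) count / n : 0.0;
         }
      }
      return frac;
   }

With (x, y), every read returns the center value, so the coverage at each point comes out as exactly 0 or 1, and every COV_THRESH greater than 0 produces identical counts.  That matches the behavior you observed.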

Please let me know how things look after this fix.  Thanks for finding this bug.

John

John Halley Gotway wrote:
> Jonathan,
> 
> You've found a bug.  Thanks for letting me know.
> 
> I'm still looking into it.  But at this point, it appears that there's a problem in the computation of the fractional coverage field.  Rather than containing values between 0 and 1, they're all
> set to exactly 0 or 1.  So all thresholds greater than 0 would produce the same results.  I'll work on the fractional coverage field computations and let you know when I have a fix.
> 
> Thanks again!
> 
> John
> 
> Case, Jonathan (MSFC-VP61)[Other] wrote:
>> John,
>>
>> I now have grid_stat running with the following:
>>
>> fcst_thresh[] = [ "ge5 ge10 ge25" ];
>> obs_thresh[]  = [ "ge5 ge10 ge25" ];
>> nbr_width[] = [ 5, 13, 21 ];
>> nbr_threshold = 1.0;
>> nbr_frac_threshold[] = [ "gt0.0", "ge0.25", "ge0.50" ];
>>
>> So far, I'm not seeing ANY differences in the various nbrcts stats when looking at the different values of nbr_frac_threshold/COV_THRESH (i.e., >0.000, >=0.250, and >=0.500 yield the same results).  I have examined several different forecast times so far.  
>> The only differences occur at the different thresholds and neighborhood boxes.  
>> Am I still doing something wrong?  
>>
>> I did look at the nbrcnt stats, but those only depend on the raw thresholds and neighborhood boxes.  The output from those stats does make sense to me.  I'm seeing higher skill at lower thresholds with larger neighborhood boxes.  
>>
>> Regards,
>> Jonathan
>>
>>
>>> -----Original Message-----
>>> From: John Halley Gotway [mailto:johnhg at rap.ucar.edu]
>>> Sent: Monday, March 16, 2009 9:20 AM
>>> To: Case, Jonathan (MSFC-VP61)[Other]
>>> Cc: met_help at ucar.edu
>>> Subject: Re: grid_stat question on neighborhood verification
>>>
>>> Jonathan,
>>>
>>> First, a note about the speed.  Since you're running a beta version of
>>> METv2.0, there was a performance issue that we have since discovered
>>> and fixed.  When you begin using the released version of
>>> METv2.0, you'll find that it runs much faster.  We're using some new
>>> classes for generating the output ASCII files.  They're supposed to do
>>> a bit of book-keeping to figure out column widths and
>>> formatting.  However, we realized that instead of doing a "little bit"
>>> of book-keeping, they were doing way too much of it!  With that fix, it
>>> runs much quicker.
>>>
>>> Also, as you noted, turning off the correlation coefficients speeds it
>>> up, and setting the "n_boot_rep" to 0 to turn off bootstrapping speeds
>>> it up.
>>>
>>> As for neighborhood methods, here's how it works:
>>> (1) You define the raw threshold values in which you're interested
>>> using the "fcst_thresh" and "obs_thresh" parameters.
>>> (2) You define the neighborhood sizes of interest using the "nbr_width"
>>> parameter.
>>> (3) For each combination of raw threshold and neighborhood size, a
>>> fractional coverage field is computed.  For example, take the
>>> threshold ">=5.0" and a neighborhood size of 5.  For each grid point
>>> in the forecast field, the raw value at that grid point is replaced
>>> by a fractional coverage value as follows.  A 5-by-5 box is drawn
>>> around the current grid point, and we count how many of those 25
>>> points have a value >=5.0.  Suppose 10 of them do; then the
>>> fractional coverage value for that point is 10/25, or 0.4.  The same
>>> process is done in the observation field to compute a
>>> fractional coverage field.
>>> (4) The "nbr_threshold" is used in the computation of the fractional
>>> coverage fields to decide what to do with bad data values.  This
>>> determines the percentage of points that need to be valid in order
>>> for a fractional coverage value to be computed.  Since it's set to 1,
>>> or 100%, all 25 of the neighborhood points have to be valid for a valid
>>> fractional coverage value to be computed.
>>> (5) Now we have a fractional coverage field for the forecast and
>>> observation.  Those two fields can be compared directly to compute
>>> scores like the Fractions Brier Score and Fractions Skill Score in
>>> the NBRCNT output line.
>>> (6) Alternatively, you could threshold the fractional coverage fields
>>> to compute the NBRCTC and NBRCTS output lines.  We use the
>>> "nbr_frac_threshold" parameter to determine which thresholds between 0
>>> and 1 you'd like to apply to those fields.  In you're case, you've
>>> chosen >=0.5.
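>>>
>>> To make step (6) concrete, here is a minimal sketch of turning the two
>>> fractional coverage fields into an NBRCTC-style 2x2 contingency table
>>> for one coverage threshold.  This is simplified, standalone C++, not
>>> the actual MET source; the names are hypothetical:
>>>
>>>    #include <cstddef>
>>>    #include <vector>
>>>
>>>    struct Ctc { int hits = 0, misses = 0, f_alarms = 0, c_negs = 0; };
>>>
>>>    // Classify each grid point by whether the forecast and observed
>>>    // coverage values meet the coverage (frac) threshold.
>>>    Ctc nbr_ctc(const std::vector<double> &fcst_cov,
>>>                const std::vector<double> &obs_cov,
>>>                double cov_thresh) {
>>>       Ctc t;
>>>       for(std::size_t i = 0; i < fcst_cov.size(); i++) {
>>>          bool f = fcst_cov[i] >= cov_thresh;
>>>          bool o = obs_cov[i]  >= cov_thresh;
>>>          if(f && o)       t.hits++;
>>>          else if(f && !o) t.f_alarms++;
>>>          else if(!f && o) t.misses++;
>>>          else             t.c_negs++;
>>>       }
>>>       return t;
>>>    }
>>>
>>> The NBRCTS line then contains the usual categorical statistics (CSI,
>>> GSS, and so on) computed from that table.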
>>>
>>> I'd suggest rerunning this case, but try using multiple values for
>>> nbr_frac_threshold, like "gt0.0 ge0.25 ge0.50 ge0.75".  And then see
>>> how the results change.  When doing this much processing on the
>>> fields, the interpretation can get a bit confusing.  But for example,
>>> for a raw threshold of >=5.0mm, a neighborhood size of 5-by-5, and a
>>> neighborhood threshold of >0.0, you're really asking a question like
>>> "When I forecast precip of >=5.0mm somewhere nearby (within the
>>> surrounding 5-by-5 box of 25 grid points), does precip >=5.0mm
>>> actually occur anywhere nearby?".  Also, you may want to read up on
>>> the Fractions Skill Score and its interpretation.
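>>>
>>> In case a concrete formula helps: following Roberts and Lean (2008),
>>> the Fractions Brier Score (FBS) is the mean squared difference between
>>> the forecast and observed coverage fields, and the Fractions Skill
>>> Score normalizes it by its worst possible value.  A minimal standalone
>>> sketch in C++, not the MET source:
>>>
>>>    #include <cstddef>
>>>    #include <vector>
>>>
>>>    // FBS = mean((pf - po)^2).
>>>    // FSS = 1 - FBS / (mean(pf^2) + mean(po^2)).
>>>    // The 1/N factors cancel in the ratio, so plain sums suffice.
>>>    double fss(const std::vector<double> &pf,
>>>               const std::vector<double> &po) {
>>>       double fbs = 0.0, ref = 0.0;
>>>       for(std::size_t i = 0; i < pf.size(); i++) {
>>>          fbs += (pf[i] - po[i]) * (pf[i] - po[i]);
>>>          ref += pf[i] * pf[i] + po[i] * po[i];
>>>       }
>>>       return (ref > 0.0) ? 1.0 - fbs / ref : 0.0;
>>>    }
>>>
>>> FSS runs from 0 (no overlap in coverage) to 1 (perfect agreement) and
>>> typically increases with neighborhood size.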
>>>
>>> For more information on methods and statistics, and for interpretation
>>> of results, let me refer you to Tressa Fowler, tressa at ucar.edu, the
>>> statistician who's leading the development of MET.
>>>
>>> Good luck,
>>> John
>>>
>>> Case, Jonathan (MSFC-VP61)[Other] wrote:
>>>> Hello John,
>>>>
>>>>
>>>>
>>>> I must not be applying the neighborhood verification method properly,
>>>> because of the preliminary numbers I'm seeing.
>>>> I've looked over a few different sets of output, and so far the
>>>> neighborhood verification numbers are nearly the same as the standard
>>>> CTS numbers for various thresholds of precipitation.  I was under the
>>>> impression that by applying a neighborhood box, the stringency of the
>>>> verification would be relaxed so that the categorical and skill scores
>>>> would be improved.  However, that doesn't seem to be the case.
>>>>
>>>> Here is what I applied for my 4-km grid results (for 3-hour
>>>> accumulated precipitation):
>>>>
>>>> fcst_thresh[] = [ "ge5 ge10 ge25" ];
>>>> obs_thresh[]  = [ "ge5 ge10 ge25" ];
>>>> nbr_width[] = [ 5, 13 ];  -> corresponding to ~20km and ~50km neighborhood "boxes"
>>>> nbr_threshold = 1.0;
>>>> nbr_frac_threshold[] = [ "ge0.5" ];
>>>>
>>>>
>>>>
>>>> I'm wondering whether the nbr_frac_threshold needs to be
>>>> reduced/relaxed.  I don't fully understand what this parameter does
>>>> and how to set it effectively based on the neighborhood width values.
>>>>
>>>> Let's first see if my interpretation is correct.  If I set
>>>> nbr_frac_threshold to "ge0.5", does this mean that at least 50% of all
>>>> the neighborhood grid points have to meet or exceed the list of
>>>> fcst/obs_thresh[] in order for a "hit" to occur?  I was thinking
>>>> initially that if ANY grid point in the OBS/FCST paired neighborhood
>>>> meets or exceeds the thresholds, then it should be considered a hit.
>>>> But that doesn't appear to be the case the way I have configured this
>>>> run.
>>>>
>>>> Finally, on a side note, the grid_stat program is running *extremely*
>>>> slowly, even with the correlation coefficients turned off.  Granted, I
>>>> am running on the large STIV grid, but I am masking it based on a
>>>> WRF.poly file I created only over the SE U.S., amounting to about a
>>>> 350x350 grid.  I ran through only half a month's worth of
>>>> control+experimental forecasts over this past weekend.  It takes
>>>> several minutes to get through just one set of forecast/observation
>>>> calculations at a single output time.  Is there a way to optimize the
>>>> grid_stat program, or is it known to run slowly based on the number of
>>>> computations being made?
>>>>
>>>> Thanks again for the help,
>>>>
>>>> Jonathan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ***********************************************************
>>>> Jonathan Case, ENSCO, Inc.
>>>> Aerospace Sciences & Engineering Division
>>>> Short-term Prediction Research and Transition Center
>>>> 320 Sparkman Drive, Room 3062
>>>> Huntsville, AL 35805-1912
>>>> Voice: (256) 961-7504   Fax: (256) 961-7788
>>>> Emails: Jonathan.Case-1 at nasa.gov
>>>>         case.jonathan at ensco.com
>>>>
>>>> ***********************************************************
>>>>
>>>>
>>>>
>>>>
> _______________________________________________
> Met_help mailing list
> Met_help at mailman.ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/met_help

