[Met_help] grid_stat question on neighborhood verification

Mon Mar 16 12:51:07 MDT 2009

Jonathan,

Great.  Thanks for checking.

John

Case, Jonathan (MSFC-VP61)[Other] wrote:
> John,
> 
> I'm seeing scores now that make perfect sense.  The skill scores increase with neighborhood size, and are highest at COV_THRESH > 0.0.
> 
> Thanks for correcting that bug!
> Jonathan
> 
>> -----Original Message-----
>> From: John Halley Gotway [mailto:johnhg at rap.ucar.edu]
>> Sent: Monday, March 16, 2009 11:45 AM
>> To: Case, Jonathan (MSFC-VP61)[Other]
>> Cc: met_help at ucar.edu
>> Subject: Re: [Met_help] grid_stat question on neighborhood verification
>>
>> Jonathan,
>>
>> I finally found the bug.  It's a one-line fix to the routine that
>> computes the fractional coverage field:
>>
>> In the file, "METv2.0beta?/lib/vx_wrfdata/src/vx_wrfdata.cc", look for
>> the routine named "fractional_coverage()".  Searching down from there,
>> look for the following line:
>>    v = wd.get_xy_double(x, y);
>>
>> In that line, replace the x with xx and the y with yy.  So the line
>> should now read:
>>    v = wd.get_xy_double(xx, yy);
>>
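To illustrate why that one-line change matters, here is a minimal Python sketch. It is not the MET source; the function name, the list-of-rows field layout, and the use_neighbor switch are assumptions made for illustration. When the inner loop reads the box center (x, y) instead of the neighbor (xx, yy), every point in the box gives the same answer, so the coverage can only come out as exactly 0 or 1.

```python
def fractional_coverage(field, width, thresh, use_neighbor=True):
    """Fraction of points in a width-by-width box whose value is >= thresh.

    field is a list of rows; boxes that run off the grid are left as None.
    use_neighbor=False reproduces the indexing mistake described above.
    """
    ny, nx = len(field), len(field[0])
    half = width // 2
    out = [[None] * nx for _ in range(ny)]
    for y in range(half, ny - half):
        for x in range(half, nx - half):
            count = 0
            for yy in range(y - half, y + half + 1):
                for xx in range(x - half, x + half + 1):
                    # Fixed: read the neighbor (xx, yy).
                    # Buggy: read the box center (x, y) every time.
                    v = field[yy][xx] if use_neighbor else field[y][x]
                    if v >= thresh:
                        count += 1
            out[y][x] = count / (width * width)
    return out


# Hypothetical 5x5 precipitation field (mm) with a single wet point.
field = [[0.0] * 5 for _ in range(5)]
field[1][1] = 12.0

fixed = fractional_coverage(field, 3, 5.0)
buggy = fractional_coverage(field, 3, 5.0, use_neighbor=False)
```

In this toy case the fixed version gives 1/9 at points whose 3-by-3 box contains the wet point, while the buggy version gives 1.0 at the wet point itself and 0.0 everywhere else. That is consistent with every coverage threshold greater than 0 producing the same contingency table.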
>> Please let me know how things look after this fix.  Thanks for finding
>> this bug.
>>
>> John
>>
>> John Halley Gotway wrote:
>>> Jonathan,
>>>
>>> You've found a bug.  Thanks for letting me know.
>>>
>>> I'm still looking into it.  But at this point, it appears that
>>> there's a problem in the computation of the fractional coverage
>>> field.  Rather than containing data values between 0 and 1, the
>>> values are all set to exactly 0 or 1.  So all thresholds greater
>>> than 0 would produce the same results.  I'll work on the fractional
>>> coverage field computations and let you know when I have a fix.
>>>
>>> Thanks again!
>>>
>>> John
>>>
>>> Case, Jonathan (MSFC-VP61)[Other] wrote:
>>>> John,
>>>>
>>>> I now have grid_stat running with the following:
>>>>
>>>> fcst_thresh[] = [ "ge5 ge10 ge25" ];
>>>> obs_thresh[]  = [ "ge5 ge10 ge25" ];
>>>> nbr_width[] = [ 5, 13, 21 ];
>>>> nbr_threshold = 1.0;
>>>> nbr_frac_threshold[] = [ "gt0.0", "ge0.25", "ge0.50" ];
>>>>
>>>> So far, I'm not seeing ANY differences in the various nbrcts stats
>>>> when looking at the different values of nbr_frac_threshold/COV_THRESH
>>>> (i.e. >0.000, >=0.250, and >=0.500 yield the same results).  I have
>>>> examined several different forecast times so far.
>>>>
>>>> The only differences occur at the different thresholds and
>>>> neighborhood boxes.
>>>>
>>>> Am I still doing something wrong?
>>>>
>>>> I did look at the nbrcnt stats, but those only depend on the raw
>>>> thresholds and neighborhood boxes.  The output from those stats does
>>>> make sense to me.  I'm seeing higher skill at lower thresholds with
>>>> larger neighborhood boxes.
>>>>
>>>> Regards,
>>>> Jonathan
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: John Halley Gotway [mailto:johnhg at rap.ucar.edu]
>>>>> Sent: Monday, March 16, 2009 9:20 AM
>>>>> To: Case, Jonathan (MSFC-VP61)[Other]
>>>>> Cc: met_help at ucar.edu
>>>>> Subject: Re: grid_stat question on neighborhood verification
>>>>>
>>>>> Jonathan,
>>>>>
>>>>> First, a note about the speed.  The beta version of METv2.0 that
>>>>> you're running has a performance issue that we have since discovered
>>>>> and fixed.  When you begin using the released version of METv2.0,
>>>>> you'll find that it runs much faster.  We're using some new classes
>>>>> for generating the output ASCII files.  They're supposed to do a bit
>>>>> of book-keeping to figure out column widths and formatting.  However,
>>>>> we realized that instead of doing a "little bit" of book-keeping,
>>>>> they were doing way too much of it!  With that fix, it runs much
>>>>> quicker.
>>>>>
>>>>> Also, as you noted, turning off the correlation coefficients speeds
>>>>> it up, as does setting "n_boot_rep" to 0 to turn off bootstrapping.
>>>>>
>>>>> As for neighborhood methods, here's how it works:
>>>>> (1) You define the raw threshold values in which you're interested
>>>>> using the "fcst_thresh" and "obs_thresh" parameters.
>>>>> (2) You define the neighborhood sizes of interest using the
>>>>> "nbr_width" parameter.
>>>>> (3) For each combination of raw threshold and neighborhood size, a
>>>>> fractional coverage field is computed.  For example, take the
>>>>> threshold ">=5.0" and a neighborhood size of 5.  For each grid point
>>>>> in the forecast field, the raw value at that grid point is replaced
>>>>> by a fractional coverage value as follows.  A 5-by-5 box is drawn
>>>>> around the current grid point, and we count how many of those 25
>>>>> points have a value >=5.0.  Suppose 10 of them do; then the
>>>>> fractional coverage value for that point is defined to be 10/25, or
>>>>> 0.4.  The same process is applied to the observation field to
>>>>> compute its fractional coverage field.
>>>>> (4) The "nbr_threshold" is used in the computation of the fractional
>>>>> coverage fields to decide what to do with bad data values.  It sets
>>>>> the fraction of neighborhood points that need to be valid in order
>>>>> for a fractional coverage value to be computed.  Since it's set to 1,
>>>>> or 100%, all 25 of the neighborhood points have to be valid for a
>>>>> valid fractional coverage value to be computed.
>>>>> (5) Now we have a fractional coverage field for the forecast and
>>>>> for the observation.  Those two fields can be compared directly to
>>>>> compute scores like the Fractions Brier Score and Fractions Skill
>>>>> Score in the NBRCNT output line.
>>>>> (6) Alternatively, you could threshold the fractional coverage
>>>>> fields to compute the NBRCTC and NBRCTS output lines.  We use the
>>>>> "nbr_frac_threshold" parameter to determine which thresholds between
>>>>> 0 and 1 you'd like to apply to those fields.  In your case, you've
>>>>> chosen >=0.5.
>>>>>
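Steps (3), (4) and (6) above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the grid_stat implementation: the function names, the use of None for bad data, the treatment of off-grid points as invalid, and dividing by the number of valid points are all choices made here for the example.

```python
BAD = None  # stand-in for a bad-data value (assumption)


def coverage_field(field, width, raw_thresh, valid_frac=1.0):
    """Steps (3)-(4): fractional coverage of values >= raw_thresh."""
    ny, nx = len(field), len(field[0])
    half = width // 2
    out = [[BAD] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            n_valid = n_hit = 0
            for yy in range(y - half, y + half + 1):
                for xx in range(x - half, x + half + 1):
                    # Off-grid and bad-data points count as invalid.
                    if 0 <= yy < ny and 0 <= xx < nx and field[yy][xx] is not BAD:
                        n_valid += 1
                        if field[yy][xx] >= raw_thresh:
                            n_hit += 1
            # valid_frac plays the role of nbr_threshold.
            if n_valid > 0 and n_valid >= valid_frac * width * width:
                out[y][x] = n_hit / n_valid
    return out


def contingency(fcst_cov, obs_cov, cov_test):
    """Step (6): threshold both coverage fields and count the 2x2 table."""
    table = {"hit": 0, "false_alarm": 0, "miss": 0, "correct_neg": 0}
    for frow, orow in zip(fcst_cov, obs_cov):
        for f, o in zip(frow, orow):
            if f is BAD or o is BAD:
                continue
            f_event, o_event = cov_test(f), cov_test(o)
            if f_event and o_event:
                table["hit"] += 1
            elif f_event:
                table["false_alarm"] += 1
            elif o_event:
                table["miss"] += 1
            else:
                table["correct_neg"] += 1
    return table


# Hypothetical 7x7 fields (mm): one wet forecast point, and one wet
# observed point displaced two grid points to the east.
fcst = [[0.0] * 7 for _ in range(7)]
obs = [[0.0] * 7 for _ in range(7)]
fcst[3][2] = 8.0
obs[3][4] = 8.0

fcst_cov = coverage_field(fcst, 5, 5.0)
obs_cov = coverage_field(obs, 5, 5.0)
any_nearby = contingency(fcst_cov, obs_cov, lambda c: c > 0.0)   # like "gt0.0"
half_box = contingency(fcst_cov, obs_cov, lambda c: c >= 0.5)    # like "ge0.5"
```

In this toy case a point-by-point comparison of the raw fields would score one false alarm and one miss. With a 5-by-5 box and a coverage threshold of >0.0, all nine points with a fully valid box count as hits, whereas with >=0.5 they all become correct negatives because the coverage is only 1/25.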
>>>>> I'd suggest rerunning this case, but try using multiple values for
>>>>> nbr_frac_threshold, like "gt0.0 ge0.25 ge0.50 ge0.75", and then see
>>>>> how the results change.  When doing this much processing on the
>>>>> fields, the interpretation can get a bit confusing.  But for example,
>>>>> for a raw threshold >=5.0mm, neighborhood size of 5-by-5, and
>>>>> fractional coverage threshold of >0.0, you're really asking a
>>>>> question like "When I forecast precip of >=5.0mm somewhere nearby
>>>>> (within the 25-point box), does precip >=5.0mm actually occur
>>>>> anywhere nearby (within the 25-point box)?".  Also, you may want to
>>>>> read up on the Fractions Skill Score and how to interpret it.
>>>>>
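For readers who want a concrete starting point on the Fractions Brier Score (FBS) and Fractions Skill Score (FSS) mentioned above, here is a small Python sketch using the commonly cited definitions: FBS is the mean squared difference between the forecast and observed coverage fields, and FSS is 1 minus FBS divided by the sum of the mean squared forecast and observed coverages. The function name and the None-for-bad-data convention are assumptions for this example, and the thread does not confirm that the NBRCNT line is computed exactly this way.

```python
def fbs_fss(fcst_cov, obs_cov):
    """Return (FBS, FSS) for two fractional coverage fields.

    Fields are lists of rows; points that are None in either field
    are skipped.  FSS is None when the reference term is zero.
    """
    pairs = [
        (f, o)
        for frow, orow in zip(fcst_cov, obs_cov)
        for f, o in zip(frow, orow)
        if f is not None and o is not None
    ]
    n = len(pairs)
    fbs = sum((f - o) ** 2 for f, o in pairs) / n
    # Worst-case reference: FBS if forecast and observed coverage never overlap.
    ref = sum(f * f for f, _ in pairs) / n + sum(o * o for _, o in pairs) / n
    fss = 1.0 - fbs / ref if ref > 0 else None
    return fbs, fss


# Hypothetical 2x2 coverage fields; the last forecast point is bad data.
fcst_cov = [[0.4, 0.0], [0.2, None]]
obs_cov = [[0.4, 0.2], [0.0, 0.5]]
fbs, fss = fbs_fss(fcst_cov, obs_cov)
```

An FSS of 1 means the two coverage fields match exactly, and 0 means they have no overlap at all, so larger neighborhoods tend to raise the score for forecasts that are only slightly displaced.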
>>>>> For more information on methods and statistics, and for
>>>>> interpretation of results, let me refer you to Tressa Fowler,
>>>>> tressa at ucar.edu, the statistician who's leading the development
>>>>> of MET.
>>>>>
>>>>> Good luck,
>>>>> John
>>>>>
>>>>> Case, Jonathan (MSFC-VP61)[Other] wrote:
>>>>>> Hello John,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I must not be applying the neighborhood verification method
>>>>>> properly, judging by the preliminary numbers I'm seeing.
>>>>>>
>>>>>> I've looked over a few different sets of output, and so far the
>>>>>> neighborhood verification numbers are nearly the same as the
>>>>>> standard CTS numbers for various thresholds of precipitation.  I was
>>>>>> under the impression that applying a neighborhood box would relax
>>>>>> the stringency of the verification, so that the categorical and
>>>>>> skill scores would improve.  However, that doesn't seem to be the
>>>>>> case.
>>>>>>
>>>>>> Here is what I applied for my 4-km grid results (for 3-hour
>>>>>> accumulated precipitation):
>>>>>>
>>>>>> · fcst_thresh[] = [ "ge5 ge10 ge25" ];
>>>>>> · obs_thresh[]  = [ "ge5 ge10 ge25" ];
>>>>>> · nbr_width[] = [ 5, 13 ];  -> corresponding to ~20km and 50km neighborhood "boxes"
>>>>>> · nbr_threshold = 1.0;
>>>>>> · nbr_frac_threshold[] = [ "ge0.5" ];
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm wondering whether the nbr_frac_threshold needs to be
>>>>>> reduced/relaxed?  I don't fully understand what this parameter does
>>>>>> and how to set it effectively based on the neighborhood width values.
>>>>>>
>>>>>> Let's first see if my interpretation is correct.  If I set
>>>>>> nbr_frac_threshold to "ge0.5", does this mean that at least 50% of
>>>>>> all the neighborhood grid points have to meet or exceed the
>>>>>> fcst/obs_thresh[] values in order for a "hit" to occur?  I was
>>>>>> thinking initially that if ANY grid point in the OBS/FCST paired
>>>>>> neighborhood meets or exceeds the thresholds, then it should be
>>>>>> considered a hit.  But that doesn't appear to be the case the way I
>>>>>> have configured this run.
>>>>>>
>>>>>> Finally, on a side note, the grid_stat program is running
>>>>>> *extremely* slowly, even with the correlation coefficients turned
>>>>>> off.  Granted, I am running on the large STIV grid, but I am masking
>>>>>> it based on a WRF.poly file I created only over the SE U.S.,
>>>>>> amounting to about a 350x350 grid.  I ran through only half a
>>>>>> month's worth of control+experimental forecasts over this past
>>>>>> weekend.  It takes several minutes to get through just one set of
>>>>>> forecast/observation calculations at a single output time.  Is there
>>>>>> a way to optimize the grid_stat program, or is this program known to
>>>>>> run very slowly because of the number of computations being made?
>>>>>>
>>>>>> Thanks again for the help,
>>>>>>
>>>>>> Jonathan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ***********************************************************
>>>>>> Jonathan Case, ENSCO, Inc.
>>>>>> Aerospace Sciences & Engineering Division
>>>>>> Short-term Prediction Research and Transition Center
>>>>>> 320 Sparkman Drive, Room 3062
>>>>>> Huntsville, AL 35805-1912
>>>>>> Voice: (256) 961-7504   Fax: (256) 961-7788
>>>>>> Emails: Jonathan.Case-1 at nasa.gov
>>>>>>
>>>>>>              case.jonathan at ensco.com
>>>>>>
>>>>>> ***********************************************************
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>> _______________________________________________
>>> Met_help mailing list
>>> Met_help at mailman.ucar.edu
>>> http://mailman.ucar.edu/mailman/listinfo/met_help