[Met_help] [rt.rap.ucar.edu #63061] History for further information on MODE output

John Halley Gotway via RT met_help at ucar.edu
Thu Sep 19 09:50:39 MDT 2013


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Dear met_help,

I'm trying to fully understand the outputs MODE creates but I couldn't
find what I'm looking for in the publications (Davis 2006 - part 1 and
part 2).

My question is regarding the ASCII file that contains the contingency
table counts and statistics. In Section 6.3.3 of the manual for v4.0
(http://www.dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v4.0.1.pdf)
at page 119, it says:
This file consists of 4 lines. The first is a header line containing
column names. The second line contains data comparing the two raw
fields after any masking of bad data or based on a grid or lat/lon
polygon has been applied. The third contains data comparing the two
fields after any raw thresholds have been applied. The fourth, and
last, line contains data comparing the derived object fields scored
using traditional measures.

If I understand correctly, the 2nd and 3rd line do not consider the
objects created by MODE. The 2nd line applies the definitions for (i)
mask_missing_flag and (ii) mask [ grid, grid_flag, poly, poly_flag ]
from the MODE configuration file. The 3rd line includes what has been
applied to line #2 (i, ii) plus the value of (iii) raw_thresh from the
configuration file. The results obtained in both 2nd and 3rd line are
a simple grid to grid (obs to forecast) comparison. In other words,
the gridcell (1,1) from obs file is compared to gridcell (1,1) from
forecast.
 The 4th line is when the objects are considered. But I don't
understand how the counts for the contingency table are made. Is it
simply considering the gridcell values inside the objects, ignoring
the values outside the objects, and comparing the gridcells of obs to
the equivalent gridcells of forecast (like I assumed it was done for
lines 2 and 3)? Or is it shifting the objects to match the centroids
and then comparing the gridcells relative to this new positioning?

Thanks again,

Maria Eugenia B. Frediani
-------------------------------------------------------------------------------------
PhD Candidate
University of Connecticut
School of Engineering
261 Glenbrook Rd
Storrs, CT 06269
maria.frediani at uconn.edu


----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: Re: [rt.rap.ucar.edu #63061] further information on MODE output
From: John Halley Gotway
Time: Wed Sep 18 11:29:51 2013

Maria,

I see that you're asking about the contingency table statistics output
from MODE.  The intention here is to provide some statistics with
which users may be familiar as a point of comparison.  When you
apply a threshold to a field of data in verification, you typically
look at a 2x2 contingency table and the statistics that can be
derived from such a table, such as Probability of Detection (PODY)
and Critical Success Index (CSI).

The file containing these counts and statistics ends with "_cts.txt".
It contains 3 lines of data that are distinguished by the entry in the
"FIELD" column.  That column contains the values "RAW",
"FILTER", and "OBJECT".  In all cases, these 2x2 contingency table are
computed on a grid-point by grid-point basis.  There difference
between them is the field that was processed to compute the
counts and stats.
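
If it helps, here is a quick sketch in Python of how you might read
those three lines.  This is illustrative only: the file name is made
up, and I'm assuming whitespace-delimited columns with PODY and CSI
appearing under those names.

    # Read a MODE "_cts.txt" file: one header line, then one data line
    # each for RAW, FILTER, and OBJECT (distinguished by FIELD).
    with open("mode_example_cts.txt") as f:      # hypothetical file name
        header = f.readline().split()
        rows = [dict(zip(header, line.split())) for line in f if line.strip()]

    for row in rows:
        print(row["FIELD"], "PODY =", row["PODY"], "CSI =", row["CSI"])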

Your description of the RAW and FILTER lines looks accurate to me.
Generally, these values will be identical unless a "raw_thresh" has
been applied.  The vast majority of the time, this isn't
necessary.  This came up during an evaluation of a convective weather
forecast.  The forecast field contained areas of 35 dBZ reflectivity
and higher surrounded by a bunch of 0's.  They weren't
forecasting reflectivities below 35 dBZ.  The observations, on the
other hand, contained areas of 35 dBZ surrounded by areas of
reflectivity less than that.  Due to the difference in the data, the
smoothing and thresholding operations of MODE resulted in much smaller
forecast objects than observation objects.  Values of 35 surrounded by
0's yield smaller objects than values of 35 surrounded by
34, 33, 32, and so on.  In this case, we applied the raw_thresh to the
observation field and threw out any values less than 35.  That way,
the object definition process treated the forecast and
observation fields the same.
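
To make that concrete, here is a toy Python sketch.  The box average
below is just a stand-in for MODE's actual convolution, and the
thresholds are made up, but it shows why a core of 35's surrounded by
0's survives smoothing less well than the same core surrounded by
34's, and how a raw_thresh evens that out:

    import numpy as np

    def box_smooth(field, radius=1):
        # Simple box average; a stand-in for MODE's circular convolution.
        n = 2 * radius + 1
        padded = np.pad(field, radius)          # pads with zeros
        out = np.empty_like(field, dtype=float)
        for i in range(field.shape[0]):
            for j in range(field.shape[1]):
                out[i, j] = padded[i:i + n, j:j + n].mean()
        return out

    # A 3x3 core of 35 dBZ surrounded by 0's (forecast-like) versus
    # the same core surrounded by 34's (observation-like).
    fcst = np.zeros((9, 9));      fcst[3:6, 3:6] = 35.0
    obs  = np.full((9, 9), 34.0); obs[3:6, 3:6]  = 35.0

    conv_thresh = 30.0                          # made-up object threshold
    print((box_smooth(fcst) >= conv_thresh).sum())  # tiny forecast object
    print((box_smooth(obs)  >= conv_thresh).sum())  # much larger obs object

    # Applying raw_thresh >= 35 to the observations first makes the two
    # fields behave the same through smoothing and thresholding.
    obs_filt = np.where(obs >= 35.0, obs, 0.0)
    print((box_smooth(obs_filt) >= conv_thresh).sum())  # matches forecast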

That's a long explanation for the utility of the raw_thresh option.
Most users will be comparing continuous forecast and observation
fields that are defined in the same way.  If the raw_thresh is not
used, the RAW and FILTER CTS lines will be identical.

Lastly, the OBJECT line contains counts that are computed grid-point
by grid-point over the field of resolved objects.  If an object exists
at the point, then the event is occurring.  If not, there's
no event.  The matching/merging of objects has no impact here.  It's
just checking grid-point by grid-point to see if an object exists in
the forecast and observation fields, and deriving a 2x2
contingency table in the process.
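
In code terms it's nothing fancier than this Python sketch, with
made-up object masks where True means an object covers the grid point
(the names follow MET's FY_OY / FN_OY convention):

    import numpy as np

    # Made-up boolean object masks on the same grid.
    fcst_obj = np.zeros((5, 5), dtype=bool); fcst_obj[1:3, 1:4] = True
    obs_obj  = np.zeros((5, 5), dtype=bool); obs_obj[2:4, 2:5]  = True

    fy_oy = np.sum( fcst_obj &  obs_obj)   # hit: object in both fields
    fy_on = np.sum( fcst_obj & ~obs_obj)   # false alarm: forecast only
    fn_oy = np.sum(~fcst_obj &  obs_obj)   # miss: observation only
    fn_on = np.sum(~fcst_obj & ~obs_obj)   # correct negative: neither

    pody = fy_oy / (fy_oy + fn_oy)           # Probability of Detection
    csi  = fy_oy / (fy_oy + fn_oy + fy_on)   # Critical Success Index
    print("PODY =", pody, " CSI =", csi)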

Now, to confuse you further :) there's another way some people have
analyzed the output of MODE.  Take a look at this page for the
mode_summary.R script:
    http://www.dtcenter.org/met/users/downloads/analysis_scripts.php

This script reads through many MODE output files and sums up the
counts and areas of matched and unmatched objects in the output.  One
could choose to define a 2x2 contingency table from the counts or
areas of these matched/unmatched objects.  It's straightforward to
pick out hits ((matched forecast objects + matched observation
objects)/2), misses (unmatched observation objects), and false alarms
(unmatched forecast objects).  But it's more difficult to pick the 4th
cell of the contingency table, correct negatives.
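
For example, plugging some made-up object counts into those
definitions in Python:

    # Made-up counts of matched/unmatched objects over many MODE runs.
    matched_fcst, unmatched_fcst = 8, 3
    matched_obs,  unmatched_obs  = 8, 4

    hits         = (matched_fcst + matched_obs) / 2.0
    misses       = unmatched_obs
    false_alarms = unmatched_fcst

    # CSI doesn't need correct negatives, so it works here; scores that
    # require that 4th cell (like accuracy) can't be computed this way.
    csi = hits / (hits + misses + false_alarms)
    print(hits, misses, false_alarms, csi)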

Object-based verification is still relatively new, and while it
provides you with a wealth of diagnostic information, I think there's
still work to be done in analyzing and understanding its output.

Hope that helps.

Thanks,
John Halley Gotway
met_help at ucar.edu


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #63061] further information on MODE output
From: Maria Eugenia
Time: Thu Sep 19 06:56:12 2013

Hi John,

Thanks a lot for all your explanation. It's all clear to me now.

Best,
Maria
Maria Eugenia B. Frediani
-------------------------------------------------------------------------------------
PhD Candidate
University of Connecticut
School of Engineering
261 Glenbrook Rd
Storrs, CT 06269
maria.frediani at uconn.edu



------------------------------------------------

