[Met_help] [rt.rap.ucar.edu #85266] History for Question about aggregating in MET

Wed May 23 09:19:55 MDT 2018

----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi METHelp,

How exactly does MET handle verification of an 18-hour forecast of
accumulated precipitation where observations are in 3-hour buckets? Would I
get the same results if my observations were in 18-hour buckets instead of
aggregating from 3-hour buckets to 18 hours in METViewer?

Thanks
Donnie

----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: Question about aggregating in MET
From: John Halley Gotway
Time: Tue May 22 15:01:38 2018

Hi Donnie,

I see that you're asking about verifying precip using MET/METViewer.
You're wondering whether evaluating 3-hourly precip 6 times is
essentially the same as evaluating 18-hour precip once.

If we need more clarification, I could ask a statistician on our team
to weigh in, but I'm pretty confident in saying that, no, they are not
the same.  The results will be different.

There are several details to think about here.  One of them is what
type of statistic you are considering.  "Continuous" statistics, like
Root-Mean-Squared-Error, are computed on the raw input forecast and
observation values.  We compute the difference between them, f - o,
and then statistics like RMSE summarize those differences over
multiple points.  Let's say, for the sake of argument, that the precip
from your model is precisely correct but always 3 hours late relative
to the observations.  If you verify the 3-hour buckets, then your
scores will be horrible.  But if you sum the accumulations up to 18
hours, then you'll be pretty darn good, only getting the last 3 hours'
worth wrong.
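
The timing-shift argument above can be sketched with a few numbers.
This is only an illustration in plain Python, not MET code: the
accumulation values are hypothetical, and the helper name rmse is an
assumption made for this example.

```python
import math

def rmse(fcst, obs):
    """Root-mean-squared error over paired forecast/observation values."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fcst, obs)) / len(fcst))

# Hypothetical 3-hour accumulations (mm) at one point over 18 hours.
obs_3h = [0.0, 5.0, 10.0, 2.0, 0.0, 1.0]

# Assumed forecast: identical amounts, but arriving one 3-hour bucket late.
# The last observed bucket falls outside the 18-hour window and is lost.
fcst_3h = [0.0] + obs_3h[:-1]

# Verify each 3-hour bucket as its own matched pair.
rmse_3h = rmse(fcst_3h, obs_3h)

# Verify the single 18-hour total instead.
rmse_18h = rmse([sum(fcst_3h)], [sum(obs_3h)])

print(f"RMSE over six 3-hour buckets: {rmse_3h:.2f} mm")
print(f"RMSE of the 18-hour total:    {rmse_18h:.2f} mm")
```

In this made-up case the 18-hour error comes only from the final
3-hour bucket, while the 3-hour verification penalizes every shifted
bucket.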

The more common method for verifying precip uses "Categorical"
statistics, like Equitable Threat Score.  For this, you pick one or
more thresholds of interest, classify the forecast and observation
values into event/non-event, and see how the forecast and observation
categories compare.  The big difference here is what thresholds to
pick.  Perhaps 1" of precip in 3 hours is a lot, but 1" in 18 hours
isn't nearly as significant.
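
As a rough sketch of that categorical approach, the snippet below
thresholds paired values into a 2x2 table and computes the Equitable
Threat Score from it.  It is plain Python rather than MET itself; the
function names, the ">=" event definition, and the sample values are
assumptions chosen for illustration.

```python
def contingency_counts(fcst, obs, threshold):
    """Classify each pair as event/non-event (value >= threshold) and
    return (hits, false_alarms, misses, correct_negatives)."""
    hits = false_alarms = misses = correct_negs = 0
    for f, o in zip(fcst, obs):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif f_event:
            false_alarms += 1
        elif o_event:
            misses += 1
        else:
            correct_negs += 1
    return hits, false_alarms, misses, correct_negs

def equitable_threat_score(hits, false_alarms, misses, correct_negs):
    """ETS: threat score adjusted for hits expected by random chance."""
    total = hits + false_alarms + misses + correct_negs
    hits_random = (hits + false_alarms) * (hits + misses) / total
    denom = hits + false_alarms + misses - hits_random
    return (hits - hits_random) / denom if denom else float("nan")

# Hypothetical precip accumulations (inches) at eight matched points.
fcst = [0.2, 1.5, 0.0, 2.1, 0.9, 1.2, 0.1, 0.0]
obs = [0.0, 1.1, 0.3, 0.7, 1.4, 1.6, 0.0, 0.0]

counts = contingency_counts(fcst, obs, threshold=1.0)
ets = equitable_threat_score(*counts)
print("hits, false alarms, misses, correct negatives:", counts)
print(f"ETS at the 1.0 inch threshold: {ets:.3f}")
```

Rerunning this with a different threshold, or with accumulations over
a different bucket length, changes which points count as events and
therefore changes the score.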

To put it simply, the larger the buckets, the more you're smoothing
out your model performance through time.  So you're losing resolution
in evaluating the performance of the model.  As a general rule of
thumb, I would let the frequency of the observations decide it.  Use
the finest resolution of observation accumulations that is
available... unless, of course, you have some other good reason not
to.

Let me address one other question... METViewer can "aggregate" results
across multiple cases.  For categorical data, we take the counts of
the 2x2 tables (i.e. counts of fcst/obs events/non-events), sum them
up cell-by-cell across all the cases, and then recompute stats from
the aggregated table.  However, there is an option in METViewer to
instead look at the mean of the daily stats.  Both are useful measures
of skill but answer slightly different questions.
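
The difference between those two options can be sketched as follows.
This is a plain-Python illustration, not METViewer's implementation;
the two daily tables are hypothetical and the helper name ets is an
assumption for this example.

```python
def ets(hits, false_alarms, misses, correct_negs):
    """Equitable Threat Score from the four cells of a 2x2 table."""
    total = hits + false_alarms + misses + correct_negs
    hits_random = (hits + false_alarms) * (hits + misses) / total
    denom = hits + false_alarms + misses - hits_random
    return (hits - hits_random) / denom if denom else float("nan")

# Hypothetical daily tables: (hits, false alarms, misses, correct negatives).
daily_tables = [
    (10, 5, 5, 80),  # a day with many events
    (1, 3, 4, 92),   # a day with few events
]

# Option 1: sum the tables cell-by-cell, then compute the statistic once.
aggregated_table = tuple(sum(cells) for cells in zip(*daily_tables))
ets_aggregated = ets(*aggregated_table)

# Option 2: compute the statistic for each day, then average the results.
ets_mean_of_days = sum(ets(*t) for t in daily_tables) / len(daily_tables)

print("aggregated table:", aggregated_table)
print(f"ETS from aggregated table: {ets_aggregated:.3f}")
print(f"mean of daily ETS values:  {ets_mean_of_days:.3f}")
```

Summing the tables weights each day by how many events it contains,
while the mean of daily stats weights every day equally, so the two
numbers generally differ.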

Hopefully that helps clarify.

John

On Tue, May 22, 2018 at 11:08 AM, Donald Lippi - NOAA Affiliate via RT
<met_help at ucar.edu> wrote:

>
> Tue May 22 11:08:19 2018: Request 85266 was acted upon.
> Transaction: Ticket created by donald.e.lippi at noaa.gov
>        Queue: met_help
>      Subject: Question about aggregating in MET
>        Owner: Nobody
>   Requestors: donald.e.lippi at noaa.gov
>       Status: new
>  Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=85266 >
>

------------------------------------------------
Subject: Question about aggregating in MET
From: Donald Lippi - NOAA Affiliate
Time: Wed May 23 08:47:47 2018

Hi John,

Thank you for your helpful clarification.

Donnie

------------------------------------------------