[Met_help] [rt.rap.ucar.edu #89093] History for Brier Skill Score Error Bars

Tue Jul 9 12:06:58 MDT 2019

----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

John and Tressa, MET calculates error bars for the Brier Score and the Brier Score of Climo.  What is the best way to combine these to get error bars for the brier skill score that results from the BS and BSclimo?

Thanks
Bob

----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: Brier Skill Score Error Bars
From: John Halley Gotway
Time: Wed Feb 27 16:22:00 2019

Bob,

That's a great question.  Unfortunately, I don't have a good answer for
you.  This is just a guess, but I doubt there's a statistically sound
way of combining Brier Score error bars to compute a Brier Skill Score
error bar.

Tressa (cc'ed here) is out of the office this month but will be
returning in the next week or so.  She'd have the best perspective on
this issue.

I think this really underscores the need for us to get METviewer up and
running at the Air Force to address these sorts of issues.  The error
bars for Brier Score that the MET tools compute quantify the *spatial*
sampling uncertainty at each point in time.  The error bars we get from
METviewer for Brier Skill Score quantify the *temporal* sampling
uncertainty of results across multiple days.  And really, I think the
latter is more meaningful.  I believe there is no parametric way to
define the BSS confidence intervals, so in METviewer we compute them
using the bootstrap resampling method.
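
The bootstrap idea described above can be sketched roughly as follows.
This is only an illustration, not METviewer's actual code: the function
names, the percentile-interval choice, and the assumption that each day
contributes one BS value and one BS-of-climo value with equal weight
are all hypothetical.

```python
import random


def bss(bs_values, bs_climo_values):
    """Aggregate BSS = 1 - sum(BS) / sum(BS_climo), equal weight per day."""
    return 1.0 - sum(bs_values) / sum(bs_climo_values)


def percentile(sorted_vals, q):
    """Linearly interpolated quantile q (0..1) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def bootstrap_bss_ci(bs_values, bs_climo_values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for BSS, resampling days with replacement.

    BS and BS_climo are resampled together (same day index) so that the
    dependence between them is preserved.
    """
    rng = random.Random(seed)
    n = len(bs_values)
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(
            bss([bs_values[i] for i in idx], [bs_climo_values[i] for i in idx])
        )
    replicates.sort()
    lower = percentile(replicates, alpha / 2.0)
    upper = percentile(replicates, 1.0 - alpha / 2.0)
    return bss(bs_values, bs_climo_values), lower, upper


if __name__ == "__main__":
    # Made-up daily scores, for illustration only.
    daily_bs = [0.12, 0.15, 0.10, 0.18, 0.14, 0.11, 0.16]
    daily_bs_climo = [0.20, 0.22, 0.19, 0.21, 0.23, 0.18, 0.22]
    print(bootstrap_bss_ci(daily_bs, daily_bs_climo))
```

With only one day in the sample every replicate is identical, so the
interval collapses to a single value; the width of this interval depends
on how many days are resampled.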

I think you'll find METviewer to be a really useful way of plotting and
summarizing output from individual MET runs across multiple cases.

Thanks,
John

On Tue, Feb 26, 2019 at 10:07 AM robert.craig.2 at us.af.mil via RT <
met_help at ucar.edu> wrote:

>
> Tue Feb 26 10:03:17 2019: Request 89093 was acted upon.
> Transaction: Ticket created by robert.craig.2 at us.af.mil
>        Queue: met_help
>      Subject: Brier Skill Score Error Bars
>        Owner: Nobody
>   Requestors: robert.craig.2 at us.af.mil
>       Status: new
>  Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=89093 >
>
>
> John and Tressa, MET calculates error bars for the Brier Score and the
> Brier Score of Climo.  What is the best way to combine these to get
> error bars for the brier skill score that results from the BS and BSclimo?
>
> Thanks
> Bob
>
>

------------------------------------------------
Subject: RE: [Non-DoD Source] Re: [rt.rap.ucar.edu #89093] Brier Skill Score Error Bars
From: robert.craig.2 at us.af.mil
Time: Thu Feb 28 06:44:15 2019

John, that explains why the error bars didn't seem to care whether I had
one day or many days in the sample.  One thing that my boss here has
been stressing is the need to run hypothesis tests when comparing
models; basing the determination of significance of model differences
on error bars alone is not sufficient.

Looking forward to METviewer.

Bob

------------------------------------------------
Subject: Brier Skill Score Error Bars
From: John Halley Gotway
Time: Thu Feb 28 09:49:31 2019

Bob,

Yes, we do use METviewer in the DTC to assess the statistical
significance of model differences.  The method that's been recommended
by Tressa, and other statisticians in our group, is to do the following:

- Use METviewer to plot a particular statistic of interest for 2 models.
- Configure METviewer to compute a pairwise difference curve.  So for
  each output time, subtract model 2's statistic from model 1's
  statistic.
- Use the bootstrap resampling method to compute a confidence interval
  for that pairwise difference curve.
- Anywhere the confidence interval for the pairwise difference curve
  does not include 0, the difference is statistically significant at the
  chosen confidence level (e.g. 95%, 99%, or 99.9%).
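
For a single output time, the steps above might look roughly like the
sketch below.  It is not METviewer's implementation; the function name,
its arguments, and the simple percentile interval are assumptions made
for illustration.  The inputs are one value of the statistic per case
(e.g. per day) for each model, matched case by case.

```python
import random


def paired_difference_ci(stat_model1, stat_model2, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap CI for the mean pairwise difference (model 1 - model 2).

    Returns (mean_diff, lower, upper, significant), where `significant`
    is True when the interval does not include 0.
    """
    if len(stat_model1) != len(stat_model2):
        raise ValueError("both models need one value per matched case")
    diffs = [a - b for a, b in zip(stat_model1, stat_model2)]
    n = len(diffs)
    rng = random.Random(seed)

    # Resample the per-case differences with replacement and record the mean.
    boot_means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))

    # Simple (non-interpolated) percentile interval.
    lower = boot_means[int((alpha / 2.0) * (n_boot - 1))]
    upper = boot_means[int((1.0 - alpha / 2.0) * (n_boot - 1))]

    significant = lower > 0.0 or upper < 0.0
    return sum(diffs) / n, lower, upper, significant


if __name__ == "__main__":
    # Made-up Brier Scores for two models over the same seven days.
    model1 = [0.21, 0.19, 0.25, 0.22, 0.20, 0.24, 0.23]
    model2 = [0.18, 0.20, 0.21, 0.19, 0.18, 0.22, 0.20]
    print(paired_difference_ci(model1, model2, alpha=0.01))
```

Resampling the differences, rather than each model separately, keeps the
case-by-case pairing intact.  Repeating the call for each output time
would give the confidence band around a difference curve.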

I've attached a sample image illustrating this logic to compute
statistically significant pairwise differences for Brier Score (I don't
have any good BSS examples immediately on hand).  Anywhere the CI on the
gray difference line is bold, the CI does not include 0, and the
difference is significant at the 99% confidence level.

But if you're looking at multiple variables/levels/regions/lead
times/statistics, then you would have a lot of plots to look at.  And
that motivates the use of scorecards to quickly summarize results.  Over
the last couple of years, we have worked with NOAA EMC to add the
capability of computing scorecards to METviewer (in the batch engine,
not the GUI).  It usually takes several minutes to crunch through all
the numbers, but the result is a nice summary of where you have
statistically significant differences between 2 models.  Here's an
example scorecard I got from Tatiana.  The different colors/symbols in
each cell indicate which model is better and at what significance level:

[image: image.png]

Hope that helps.

Thanks,
John

------------------------------------------------
Subject: Brier Skill Score Error Bars
From: Tressa Fowler
Time: Sun Mar 03 10:16:44 2019

Hi Bob,

The BSS sampling uncertainty approximation given by Bradley et al. in
WAF vol. 23 might be what you are after.  You could use this estimate
(based on a Taylor expansion) if bootstrapping is not what you prefer.
See the article at:

https://journals.ametsoc.org/doi/10.1175/2007WAF2007049.1
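
To give a feel for what a Taylor-expansion estimate involves, here is a
generic first-order (delta method) sketch for BSS = 1 - BS/BS_climo.  It
is not the expression from the article, which should be consulted for
the actual formula.  It assumes independent cases, a hypothetical
function name, and per-case squared errors for both the forecast and the
climatological reference as inputs.

```python
import math


def bss_delta_method_ci(fcst_sq_err, climo_sq_err, z=1.96):
    """Approximate CI for BSS from a first-order Taylor expansion.

    fcst_sq_err[i]  = (forecast probability - observation)^2 for case i
    climo_sq_err[i] = (climatological probability - observation)^2 for case i
    Cases are assumed independent; z=1.96 gives a nominal 95% interval.
    """
    n = len(fcst_sq_err)
    if n < 2 or n != len(climo_sq_err):
        raise ValueError("need at least two matched cases")

    bs = sum(fcst_sq_err) / n
    bs_climo = sum(climo_sq_err) / n

    # Sample variances and covariance of the per-case squared errors.
    var_f = sum((x - bs) ** 2 for x in fcst_sq_err) / (n - 1)
    var_c = sum((y - bs_climo) ** 2 for y in climo_sq_err) / (n - 1)
    cov_fc = sum(
        (x - bs) * (y - bs_climo) for x, y in zip(fcst_sq_err, climo_sq_err)
    ) / (n - 1)

    # Var(BS/BS_climo) to first order; note the covariance term.
    var_bss = (
        var_f / bs_climo ** 2
        + bs ** 2 * var_c / bs_climo ** 4
        - 2.0 * bs * cov_fc / bs_climo ** 3
    ) / n
    std_err = math.sqrt(max(var_bss, 0.0))

    bss = 1.0 - bs / bs_climo
    return bss, bss - z * std_err, bss + z * std_err
```

The covariance term is the reason the separate BS and BS-of-climo error
bars cannot simply be combined: both scores are computed against the
same observations, so their sampling errors are correlated.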

Hope this helps,

Tressa

------------------------------------------------