# [Met_help] [rt.rap.ucar.edu #70852] History for Box Plot Statistics in TCSTAT "summary" output

John Halley Gotway via RT met_help at ucar.edu
Tue Mar 3 09:36:40 MST 2015

----------------------------------------------------------------
Initial Request
----------------------------------------------------------------

Dear MET Help,

I am looking to calculate box plots from MET-TC's TCSTAT "summary" output.
Some of the information is already there. I have enough information to
calculate the actual box, but it would be nice to add some extra
information to the summary output to plot the correct "whiskers" onto the
box plot.

I'm sure you know about box plots already, but just so that we are on the
same page... Here is a good reference on how to plot these boxes:
https://plot.ly/box-plot/

The "box" is calculated with the 25th percentile, the median (50th
percentile), and the 75th percentile. This information is already in the
TCSTAT "summary" file. So far so good.

The whiskers require a little bit of extra calculation. The traditional way
to plot the whiskers is to first calculate the interquartile range (*IQR*),
which is equal to the range between the 25th and 75th percentiles. I have
enough information to do this easily from the TCSTAT "summary" file. Again,
so far so good.

So, here is where I need a little bit of help. After calculating the *IQR*,
one should multiply this value by 1.5 to draw a "rough" whisker on both
ends of the box. So each whisker has a length of 1.5*IQR. However, to get
the "actual" whisker length, one should choose the data value that is
closest to the end of the whisker, but not outside of the whisker (See the
animation on the web link I added above if this isn't entirely clear). Now,
I could do this myself by importing the TCSTAT "filter" output into my
plotting script and finding these whisker-defining values. However, it
would help me (and others I'm sure) immensely if MET-TC found this value
and printed it out in the TCSTAT "summary" output.
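
As a rough illustration of the whisker rule described above, here is a minimal Python sketch. The function name `tukey_whiskers`, the sample values, and the choice of linear-interpolation quartiles are assumptions made for this example; this is not how MET-TC computes anything, and a different percentile method would shift the fences slightly.

```python
import statistics

def tukey_whiskers(values, k=1.5):
    """Return (lo, hi): the most extreme data values that still lie
    within k * IQR of the box edges (hypothetical helper)."""
    data = sorted(values)
    # Quartiles by linear interpolation; other percentile methods exist.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - k * iqr   # end of the "rough" lower whisker
    hi_fence = q3 + k * iqr   # end of the "rough" upper whisker
    # The "actual" whisker ends at the data value closest to the fence
    # without lying outside of it.
    lo = min(v for v in data if v >= lo_fence)
    hi = max(v for v in data if v <= hi_fence)
    return lo, hi

# Made-up error values standing in for one column of "filter" output.
errors = [12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0, 95.0]
bw_lo, bw_hi = tukey_whiskers(errors)
print(bw_lo, bw_hi)
```

With these made-up values, 95.0 falls beyond the upper fence, so it would be drawn as an outlier rather than as the end of the whisker.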

Do you think this is doable? I really do believe this will make MET-TC a
more versatile tool! You could call these values "BW_lo" and "BW_hi" for
"Box Whisker Lo/Hi".

Best,
Gus

----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------

Subject: Box Plot Statistics in TCSTAT "summary" output
From: John Halley Gotway
Time: Mon Mar 02 12:31:40 2015

Gus,

Yes, I'm familiar with the boxplot.  We use it a lot around here to nicely
summarize a set of data.  However, it wasn't our intention with the
"summary" job to dump out all the values required to reproduce a boxplot.
Instead, we were basically replicating the functionality of the "summary"
command in R, which dumps out the min, max, 25th, 50th, 75th percentile,
and the mean value.
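
For readers without R at hand, the six values listed above can be sketched in Python as follows. The function name `summary` and the dictionary keys are hypothetical, and the "inclusive" quantile method is chosen here because it uses the same linear interpolation as R's default; the percentile method used by the tc_stat "summary" job is not stated in this thread and may differ.

```python
import statistics

def summary(values):
    """Six-number summary in the spirit of R's summary() (illustrative only)."""
    data = sorted(values)
    # Quartiles by linear interpolation between neighboring data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "p25": q1,
        "median": median,
        "mean": statistics.fmean(data),
        "p75": q3,
        "max": data[-1],
    }

print(summary([1, 2, 3, 4, 5]))
```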

I see your point that we could compute the location of the ends of the
whiskers following Tukey's definition, but someone else might ask us for
all the outliers.  And that could get messy quickly.

Are you aware of the "plot_tcmpr.R" script that's included with the MET
release?  It does two main things...
   (1) It takes the user's options from the command line and calls the
       tc_stat tool to filter down the data.
   (2) It reads the filtered output from tc_stat and creates one or more
       plots for the user-specified columns of data.  It makes line plots,
       point plots, boxplots, and a couple of other types.

Here's how you might run it:
   Rscript scripts/Rscripts/plot_tcmpr.R -lookin alal2010.tcst -filter "-amodel AHWI"

That'll filter down the track data to only AHWI data and, by default,
create a time series of boxplots of the TK_ERR column.

To see the usage statement, just run:
Rscript scripts/Rscripts/plot_tcmpr.R

Is using that script an option for you?

Thanks,
John

On Fri, Feb 27, 2015 at 1:36 PM, Ghassan Alaka - NOAA Affiliate via RT
<met_help at ucar.edu> wrote:

>
> Fri Feb 27 13:36:30 2015: Request 70852 was acted upon.
> Transaction: Ticket created by ghassan.alaka at noaa.gov
>        Queue: met_help
>      Subject: Box Plot Statistics in TCSTAT "summary" output
>        Owner: Nobody
>   Requestors: ghassan.alaka at noaa.gov
>       Status: new
>  Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=70852 >
>

------------------------------------------------
Subject: Box Plot Statistics in TCSTAT "summary" output
From: Ghassan Alaka - NOAA Affiliate
Time: Mon Mar 02 12:44:33 2015

Hi John,

Thanks for the reply. I see your points. I will look into plot_tcmpr.R, but
I may also just set the whiskers to be the max/min since that data is
available in the summary file.
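
A minimal Python sketch of that fallback, assuming the five values have already been read from the summary file by some other means: the helper name `box_stats_from_summary` and the numbers are hypothetical, and the dictionary keys follow the per-box layout that matplotlib's `Axes.bxp` accepts for drawing a box from precomputed statistics.

```python
def box_stats_from_summary(minimum, p25, median, p75, maximum, label=""):
    """Build one box description whose whiskers span the full data range.

    No outliers are drawn because the whiskers already reach min and max.
    """
    return {
        "label": label,
        "whislo": minimum,  # lower whisker ends at the minimum
        "q1": p25,          # bottom of the box
        "med": median,      # line inside the box
        "q3": p75,          # top of the box
        "whishi": maximum,  # upper whisker ends at the maximum
        "fliers": [],       # nothing lies beyond the whiskers
    }

# Made-up numbers standing in for one row of summary output.
stats = box_stats_from_summary(12.0, 17.25, 21.0, 26.25, 95.0, label="AHWI")
print(stats)
```

A list of such dictionaries could then be passed to `ax.bxp(...)` on a matplotlib axes. Note that min/max whiskers hide the distinction between typical values and outliers, which is the information the 1.5*IQR rule is meant to show.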

I'll keep you posted.

Best,
Gus


------------------------------------------------