[Met_help] [rt.rap.ucar.edu #100711] History for StatAnalysis Help
George McCabe via RT
met_help at ucar.edu
Wed Aug 18 09:00:16 MDT 2021
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
Hi wonderful MET help team!
I told you it wouldn't be long before you heard from me! I hope you're all
doing well.
I'm running into a few issues with StatAnalysis. I'm running StatAnalysis
on an HPC via Singularity starting from the DTC Docker image. I am using
version 10.0.0.
1. I run into a performance issue when running multiple *-by* options. For
example, if I run a command
"stat_analysis -lookin <stat_file_directory> \
-job aggregate_stat -line_type MPR -out_line_type CNT \
-out_stat <out_put_file> \
-fcst_var TMP -obs_var TMP -fcst_lead 06 -fcst_init_beg 2021060112 \
-fcst_init_end 2021060712 -by OBS_SID -set_hdr VX_MASK OBS_SID \
-set_hdr DESC CASE -out_bin_size 1 -v 3"
It runs in about 1 minute and 20 seconds (7000 lines). If I add *-by
FCST_VAR, OBS_SID*, the job consumes all 125 GB of available compute node
RAM before crashing. Do you know why this would occur? I am following the
example in the NRL tutorial StatAnalysis presentation (slide 14) that uses
multiple -by statements. I tried turning on debugging (-v 4) and I don't
get any related messages.
2. I run into a second performance issue when running any command with the
following options: *-job aggregate_stat -line_type MPR*. For each
matched pair, a CDF is calculated with the default of 20 bins ("cdf_bins").
However, for each matched pair after the first, the previous matched
pair's CDF thresholds are added, like so:
"DEBUG 4: ClimoCDFInfo::set_cdf_ta() -> For "cdf_bins" (20) and
"center_bins" (false), defined climatology CDF thresholds:
>=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000
DEBUG 4: ClimoCDFInfo::set_cdf_ta() -> For "cdf_bins" (20) and
"center_bins" (false), defined climatology CDF thresholds:
>=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000,>=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000"
This is cumulative, so when running over 5000 matched pairs, the last one
has roughly 100,000 thresholds attached to it. The job becomes so expensive
that it does not finish. I was able to circumvent this problem by adding
the flag *-out_bin_size 1*, but I still figured you would want to know
about it.
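To make the growth concrete, here is a minimal Python sketch of the arithmetic behind the log output above. It is not MET's code: the function names cdf_thresholds and accumulated_counts are hypothetical, and the only assumption is the one the DEBUG 4 lines suggest, namely that each matched pair appends a fresh set of 21 thresholds (20 bins) to a list that is never cleared.

```python
def cdf_thresholds(cdf_bins: int = 20) -> list[float]:
    """Evenly spaced thresholds 0.00, 0.05, ..., 1.00 (hypothetical helper)."""
    return [round(i / cdf_bins, 5) for i in range(cdf_bins + 1)]


def accumulated_counts(n_pairs: int, cdf_bins: int = 20) -> list[int]:
    """Threshold count after each pair, assuming the list is never reset."""
    thresholds: list[float] = []
    counts: list[int] = []
    for _ in range(n_pairs):
        # Assumed behavior: append a new set instead of replacing the old one.
        thresholds.extend(cdf_thresholds(cdf_bins))
        counts.append(len(thresholds))
    return counts


counts = accumulated_counts(5000)
print(counts[0], counts[1], counts[-1])
```

Under that assumption the first pair carries 21 thresholds, the second 42 (matching the two log lines), and the 5000th carries 105,000, which agrees with the rough figure quoted above.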
That's all I have for now. Thank you in advance for your help!
Best,
Lindsay
--
Lindsay Blank
Forecast Verification and Analytics Developer
*she/her*
Spire Global, Inc
San Francisco | Boulder | Washington D.C. | Singapore | Glasgow | Luxembourg
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: StatAnalysis Help
From: John Halley Gotway
Time: Thu Jul 22 08:58:12 2021
Hi Lindsay,
The support for the METplus suite of products has moved from
met_help at ucar.edu to the METplus Discussions Page on GitHub.
Please post your question to the "Incoming" category there.
***** ALERT: THIS E-MAIL ADDRESS IS NO LONGER IN USE FOR SUPPORT OF THE
METPLUS VERIFICATION SYSTEM. WE ARE NO LONGER SUPPORTING NEW QUESTIONS OR
MONITORING THIS EMAIL.
PLEASE CREATE A FREE GITHUB ACCOUNT AND POST YOUR QUESTIONS TO THE METPLUS
COMPONENTS DISCUSSION FORUM AT
https://github.com/dtcenter/METplus/discussions.
*****
On Wed, Jul 21, 2021 at 4:03 PM Lindsay Blank via RT
<met_help at ucar.edu>
wrote:
>
> Wed Jul 21 16:03:02 2021: Request 100711 was acted upon.
> Transaction: Ticket created by lindsay.blank at spire.com
> Queue: met_help
> Subject: StatAnalysis Help
> Owner: Nobody
> Requestors: lindsay.blank at spire.com
> Status: new
> Ticket <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=100711 >
------------------------------------------------
Subject: StatAnalysis Help
From: Lindsay Blank
Time: Tue Aug 10 15:48:36 2021
Thanks, John. Will do!
------------------------------------------------
Subject: StatAnalysis Help
From: George McCabe
Time: Wed Aug 18 09:00:11 2021
A corresponding topic in GitHub Discussions was created:
https://github.com/dtcenter/METplus/discussions/1076
------------------------------------------------