[Met_help] [rt.rap.ucar.edu #100711] History for StatAnalysis Help

John Halley Gotway via RT met_help at ucar.edu
Wed Jul 28 11:40:49 MDT 2021


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi wonderful MET help team!

I told you it wouldn't be long before you heard from me! I hope you're all
doing well.

I'm running into a few issues with StatAnalysis. I'm running StatAnalysis
on an HPC via Singularity, starting from the DTC Docker image. I am using
version 10.0.0.

1. I run into a performance issue when running multiple *-by* options. For
example, if I run the command
"stat_analysis -lookin <stat_file_directory> \
-job aggregate_stat -line_type MPR -out_line_type CNT \
-out_stat <output_file> \
-fcst_var TMP -obs_var TMP -fcst_lead 06 -fcst_init_beg 2021060112 \
-fcst_init_end 2021060712 -by OBS_SID -set_hdr VX_MASK OBS_SID \
-set_hdr DESC CASE -out_bin_size 1 -v 3"

It runs in about 1 minute and 20 seconds (7000 lines). If I add *-by
FCST_VAR,OBS_SID*, the job consumes all 125 GB of available compute node
RAM before crashing. Do you know why this would occur? I am following the
example in the NRL tutorial StatAnalysis presentation (slide 14), which uses
multiple -by columns. I tried turning on debugging (-v 4), but I don't see
any related messages.
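
For reference, here is roughly what the failing variant looks like. This is
only a sketch of my reading of the change: the sole difference from the
command above is the *-by* column list, and the paths are placeholders.

# Sketch, not the exact command run; <stat_file_directory> and
# <output_file> are placeholders.
stat_analysis -lookin <stat_file_directory> \
-job aggregate_stat -line_type MPR -out_line_type CNT \
-out_stat <output_file> \
-fcst_var TMP -obs_var TMP -fcst_lead 06 -fcst_init_beg 2021060112 \
-fcst_init_end 2021060712 -by FCST_VAR,OBS_SID -set_hdr VX_MASK OBS_SID \
-set_hdr DESC CASE -out_bin_size 1 -v 3

This is the run that fills all 125 GB of RAM before being killed.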

2. I run into a second performance issue when running any command with the
following options: *-job aggregate_stat -line_type MPR*. For each matched
pair, a climatology CDF is calculated with the default of 20 thresholds.
However, for each matched pair after the first, the previous matched pair's
CDF thresholds are appended as well, like so:
"DEBUG 4: ClimoCDFInfo::set_cdf_ta() -> For "cdf_bins" (20) and
"center_bins" (false), defined climatology CDF thresholds:
>=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000
DEBUG 4: ClimoCDFInfo::set_cdf_ta() -> For "cdf_bins" (20) and
"center_bins" (false), defined climatology CDF thresholds:
>=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000,>=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000"

The thresholds accumulate across pairs, so when running over 5000 matched
pairs, the last one has 5000 x 20 = 100,000 thresholds attached to it. This
makes the job so expensive that it never finishes. I was able to work around
the problem by adding the flag *-out_bin_size 1*, but I figured you would
still want to know about it.
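
For completeness, a minimal sketch of the workaround, assuming the same kind
of aggregate_stat MPR-to-CNT job as above; the only relevant addition is
*-out_bin_size 1*, and the paths are placeholders.

# Workaround sketch; <stat_file_directory> and <output_file> are placeholders.
stat_analysis -lookin <stat_file_directory> \
-job aggregate_stat -line_type MPR -out_line_type CNT \
-out_stat <output_file> \
-by OBS_SID \
-out_bin_size 1 -v 3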

That's all I have for now. Thank you in advance for your help!

Best,
Lindsay

--
Lindsay Blank

Forecast Verification and Analytics Developer

*she/her*

Spire Global, Inc

San Francisco | Boulder | Washington D.C. | Singapore | Glasgow | Luxembourg


----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: StatAnalysis Help
From: John Halley Gotway
Time: Thu Jul 22 08:58:12 2021

Hi Lindsay,

The support for the METplus suite of products has moved from
met_help at ucar.edu to the METplus Discussions Page on GitHub.

Please post your question to the "Incoming" category there.

***** ALERT: THIS E-MAIL ADDRESS IS NO LONGER IN USE FOR SUPPORT OF THE
METPLUS VERIFICATION SYSTEM. WE ARE NO LONGER SUPPORTING NEW QUESTIONS OR
MONITORING THIS EMAIL.

PLEASE CREATE A FREE GITHUB ACCOUNT AND POST YOUR QUESTIONS TO THE METPLUS
COMPONENTS DISCUSSION FORUM AT
https://github.com/dtcenter/METplus/discussions.
*****

On Wed, Jul 21, 2021 at 4:03 PM Lindsay Blank via RT <met_help at ucar.edu> wrote:

>
> Wed Jul 21 16:03:02 2021: Request 100711 was acted upon.
> Transaction: Ticket created by lindsay.blank at spire.com
>        Queue: met_help
>      Subject: StatAnalysis Help
>        Owner: Nobody
>   Requestors: lindsay.blank at spire.com
>       Status: new
>  Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=100711 >

------------------------------------------------

