[Met_help] [rt.rap.ucar.edu #100711] History for StatAnalysis Help

John Halley Gotway via RT met_help at ucar.edu
Thu Jul 22 08:57:55 MDT 2021


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi wonderful MET help team!

I told you it wouldn't be long before you heard from me! I hope you're all
doing well.

I'm running into a few issues with StatAnalysis. I'm running it on an HPC
system via Singularity, starting from the DTC Docker image, and I am using
version 10.0.0.

1. I run into a performance issue when using multiple *-by* columns. For
example, consider this command:

    stat_analysis -lookin <stat_file_directory> \
      -job aggregate_stat -line_type MPR -out_line_type CNT \
      -out_stat <output_file> \
      -fcst_var TMP -obs_var TMP -fcst_lead 06 \
      -fcst_init_beg 2021060112 -fcst_init_end 2021060712 \
      -by OBS_SID -set_hdr VX_MASK OBS_SID -set_hdr DESC CASE \
      -out_bin_size 1 -v 3

It runs in about 1 minute 20 seconds (7000 lines). If I add FCST_VAR to the
grouping (*-by FCST_VAR, OBS_SID*), the job consumes all 125 GB of RAM
available on the compute node and then crashes. Do you know why this
happens? I am following the example on slide 14 of the NRL tutorial
StatAnalysis presentation, which uses multiple -by columns. I tried
increasing the verbosity (-v 4), but I don't see any related messages.

2. I run into a second performance issue with any command that uses
*-job aggregate_stat -line_type MPR*. For each matched pair, climatology CDF
thresholds are defined using the default of 20 CDF bins (21 thresholds).
However, for every matched pair after the first, the thresholds from the
previous pairs are appended to the list, like so:

    DEBUG 4: ClimoCDFInfo::set_cdf_ta() -> For "cdf_bins" (20) and "center_bins" (false), defined climatology CDF thresholds:
    >=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000
    DEBUG 4: ClimoCDFInfo::set_cdf_ta() -> For "cdf_bins" (20) and "center_bins" (false), defined climatology CDF thresholds:
    >=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000,>=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000

This is cumulative, so when running over 5000 matched pairs, the last one
has roughly 100,000 thresholds (21 per pair) attached to it. The job becomes
so expensive that it never finishes. I was able to work around the problem
by adding the flag *-out_bin_size 1*, but I figured you would still want to
know about it.
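The growth pattern in the log can be reproduced with a minimal Python sketch, assuming the per-pair threshold list is appended to rather than reset on each call. The class ClimoCDFSketch and its methods are hypothetical illustrations, not MET's actual ClimoCDFInfo code. With 20 bins each call adds 21 thresholds, so after 5000 pairs the list holds 105,000 entries; if each pair also evaluates its full list, the total work grows quadratically with the number of pairs.

```python
def cdf_thresholds(n_bins):
    """Return the n_bins + 1 CDF thresholds 0, 1/n_bins, ..., 1,
    rounded to 5 places to match the DEBUG output formatting."""
    return [round(i / n_bins, 5) for i in range(n_bins + 1)]


class ClimoCDFSketch:
    """Hypothetical stand-in for a per-pair CDF settings object (not MET code)."""

    def __init__(self):
        self.thresholds = []

    def set_cdf_buggy(self, n_bins):
        # Appends without clearing first, so every call adds another
        # n_bins + 1 thresholds: the cumulative growth seen in the log.
        self.thresholds.extend(cdf_thresholds(n_bins))

    def set_cdf_fixed(self, n_bins):
        # Replaces the list, so its length stays at n_bins + 1.
        self.thresholds = cdf_thresholds(n_bins)


n_pairs, n_bins = 5000, 20
buggy, fixed = ClimoCDFSketch(), ClimoCDFSketch()
total_buggy = 0  # thresholds handled across all pairs, if each pair scans its list
for _ in range(n_pairs):
    buggy.set_cdf_buggy(n_bins)
    fixed.set_cdf_fixed(n_bins)
    total_buggy += len(buggy.thresholds)

print(len(buggy.thresholds))  # 105000
print(len(fixed.thresholds))  # 21
print(total_buggy)            # 262552500
```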

That's all I have for now. Thank you in advance for your help!

Best,
Lindsay

--
Lindsay Blank

Forecast Verification and Analytics Developer

*she/her*

Spire Global, Inc

San Francisco | Boulder | Washington D.C. | Singapore | Glasgow | Luxembourg


----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------


