[Met_help] [rt.rap.ucar.edu #74724] History for gsid2mpr (gsi tool) problem
John Halley Gotway via RT
met_help at ucar.edu
Wed Jan 20 13:18:27 MST 2016
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
hi, I am encountering a problem with gsid2mpr taking a long time to produce output. I am using the following command:
$ gsid2mpr diag_conv_ges.2015060100 -swap -outdir out -v 3
If I do not use -swap, I get the following error:
DEBUG 1:
DEBUG 1: Reading: diag_conv_anl.20150606_1 ... 1 of 1
ERROR :
ERROR :
ERROR : read_fortran_binary() -> buffer too small ... increase buffer size to at least 67108864 bytes!
ERROR : Try using the -swap option to switch the endianness of the input binary files.
ERROR :
regards, Jagdeep Singh
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Thu Jan 14 09:55:05 2016
Harvir,
I see that you're having a tough time running the gsid2mpr tool. If not using "-swap" leads to that error, then you should continue using the "-swap" option.
Can you tell me how long it takes for the following command to run?
time gsid2mpr diag_conv_ges.2015060100 -swap -outdir out -v 3
If you'd like, you could send me that sample file
(diag_conv_ges.2015060100),
and I could try running it here to see how long it takes.
Follow these instructions to send us data:
http://www.dtcenter.org/met/users/support/met_help.php#ftp
Thanks,
John Halley Gotway
met_help at ucar.edu
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: j singh
Time: Thu Jan 14 12:06:23 2016
The file is a GFS model conventional file, around 84 GB in size. The problem is that it's taking around 10 hours to produce output; once the virtual memory footprint becomes large, the process slows down. Though I have turned the verbosity down to 0, it still prints log messages to the screen like "SKIPPING DUPLICATE VALUE". I have gone through the gsid2mpr.cc file to track down that log message, but I didn't want to switch it off myself. The code is written by Bullock of UCAR. I would like you to take a GSI tool output, say a CONVSTAT file, and run gsid2mpr on it.
regards, Jagdeep
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Thu Jan 14 15:03:29 2016
jagdeep,
Thanks for sending that sample data file. That 88 MB file is certainly much larger than the 400 KB test data files we used during development!
After running that file on my machine for 10 minutes or so, I gave up and killed it.
I did some testing and found 2 things that are slowing it down a lot:
(1) Resizing the output object for each record it reads.
(2) Checking for duplicate records.
Below I've listed the run times after adding logic to the code to fix these issues:
- Fix (1): At least 40 minutes (wasn't patient enough to let it finish)
- Fix (2): 6 minutes, 36 seconds
- Both fixes (1) and (2): 1 minute, 45 seconds
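The impact of fix (1) comes from how per-record growth scales. A rough cost model, offered as a sketch rather than MET's actual allocation code:

```python
# Rough cost model, not MET's actual code: growing the output object
# for every record copies all prior records each time (quadratic total
# work), while doubling the capacity keeps the total copy cost linear.
def copies_per_record_growth(n):
    return sum(range(n))  # record i copies the i records before it

def copies_doubling_growth(n):
    copies, cap = 0, 1
    for i in range(n):
        if i == cap:      # out of room: reallocate and copy everything
            copies += cap
            cap *= 2
    return copies

print(copies_per_record_growth(100000))  # ~5e9 element copies
print(copies_doubling_growth(100000))    # ~1.3e5 element copies
```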
I'd still like to keep checking for duplicates as the default
behavior, but
we could add a command line option to disable it.
Do those changes sound reasonable to you?
I won't be able to work on a patch for this until next week. In the meantime, there's an easy hack you could do to skip over the checking of duplicates.
Edit the file met-5.1/src/tools/other/gsi_tools/gsid2mpr.cc. After line 464 of that file, add "return(false);", as shown below. That will disable the checking of duplicates. Then recompile MET.
462 ////////////////////////////////////////////////////////////////////////
463
464 bool is_dup(const char *key) {
       return(false);
465    bool dup;
466
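For context, the duplicate check being bypassed can be sketched like this; an illustrative Python reimplementation of the idea, not MET's actual C++ code:

```python
# Illustrative sketch of a duplicate-key check like is_dup(), not the
# actual MET implementation. A linear scan over previously seen keys
# costs O(n) per record; a hash set makes each lookup O(1) on average.
seen = set()

def is_dup(key):
    """Return True if key was seen before; remember it otherwise."""
    if key in seen:
        return True
    seen.add(key)
    return False

print(is_dup("220:u:10.5"))  # False: first occurrence
print(is_dup("220:u:10.5"))  # True: duplicate record
```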
Thanks,
John Halley Gotway
met_help at ucar.edu
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: j singh
Time: Fri Jan 15 09:49:23 2016
[rt.rap.ucar.edu #74724]
hi John,
your solution shortened the gsid2mpr run time to an incredible 15 minutes. It was a relief.
There's another thing I need to ask.
I am using the stat_analysis tool to get the CNT out_line_type, and it's taking around 1.5 hours; can you take a look at it? You can use the file which is the output of gsid2mpr, the file I sent you yesterday (diag_conv_anl.2015060100), and use the following command:
stat_analysis -lookin diag_conv_anl.2015060100.stat -job aggregate_stat -line_type MPR -out_line_type CNT -fcst_var u -out diag_conv_anl.2015060100.cnt -v 2
This is the log from the run of the above stat_analysis command:
##############################################################################
DEBUG 1: Creating STAT-Analysis output file "diag_conv_anl.2015060100.cnt"
DEBUG 2: STAT Lines read = 1434170
DEBUG 2: STAT Lines retained = 447165
DEBUG 2:
DEBUG 2: Processing Job 1: -job aggregate_stat -fcst_var u -line_type MPR -out_line_type CNT -out_alpha 0.05000 -boot_interval 1 -boot_rep_prop 1.00000mt19937 -boot_seed '' -rank_corr_flag 1
GSL_RNG_TYPE=mt19937
GSL_RNG_SEED=145791062
DEBUG 2: Computing output for 1 case(s).
DEBUG 2: For case "(nul)", found 24 unique OBTYPE values: 220,281,244,254,290,253,280,233,221,287,284,243,231,252,250,242,229,230,251,245,247,246,223,2
DEBUG 2: Job 1 used 444043 out of 447165 STAT lines.
##############################################################################
Please enlighten me as to why only 444043 lines were used instead of the whole 447165.
I appreciate your help with the gsid2mpr issue, and do update me if there is any other fix regarding gsid2mpr.
thanks, Jagdeep
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Tue Jan 19 10:56:34 2016
Jagdeep,
By default, STAT-Analysis has two options enabled which slow it down a lot. Disabling these two options on my machine makes the job you sent me run in less than 1 minute:
(1) The computation of rank correlation statistics, Spearman's Rank Correlation and Kendall's Tau. Disable them using "-rank_corr_flag FALSE".
(2) The computation of bootstrap confidence intervals. Disable them using "-n_boot_rep 0".
I have two other suggestions...
(1) Instead of using "-fcst_var u", try using "-by fcst_var". That'll compute statistics separately for each unique entry found in the FCST_VAR column.
(2) Instead of using "-out" to write the output to a text file, try using "-out_stat", which will write a full STAT output file, including all the header columns. When doing so, you'll get a long list of values in the OBTYPE column. To avoid that long OBTYPE column value, you can manually set the output using "-set_hdr OBTYPE ALL_TYPES", or set its value to whatever you'd like.
met-5.1/bin/stat_analysis \
   -lookin diag_conv_anl.2015060100.stat \
   -job aggregate_stat -line_type MPR -out_line_type CNT -by FCST_VAR \
   -out_stat diag_conv_anl.2015060100_cnt.txt -set_hdr OBTYPE ALL_TYPES \
   -n_boot_rep 0 -rank_corr_flag FALSE -v 4
Adding the "-by FCST_VAR" option to compute stats for all variables
made
the job run in about 2 1/2 minutes on my machine. I've attached the
resulting output file.
You also asked why you got fewer matched pairs for u than you
expected.
Note that my counts are slightly higher because my output includes
some
duplicates. In my data, there are 447888 MPR lines for the variable
"u"
but we end up with only 444766 matched pairs. Looking at the FCST and
OBS
columns in the MPR output lines for u, I found 3122 of them where the
observation value is actually NA. Those pairs will be skipped when
computing statistics.
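The pair-filtering step can be sketched as follows; this is illustrative only (hypothetical rows, not MET's internals), but it shows why the "used X of Y STAT lines" count can be smaller than the number of retained lines:

```python
# Illustrative sketch, not MET's code: matched pairs whose observation
# value is missing ("NA" in the OBS column) are excluded before the
# continuous (CNT) statistics are computed.
pairs = [("u", 6.4, "6.1"), ("u", 5.0, "NA"), ("u", 7.2, "7.5")]
valid = [(v, f, float(o)) for v, f, o in pairs if o != "NA"]
print(len(valid), "of", len(pairs), "pairs used")  # 2 of 3 pairs used
```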
Hope that helps.
Thanks,
John
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Tue Jan 19 10:56:34 2016
VERSION MODEL FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD
OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_LEV OBS_VAR OBS_LEV
OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH
COV_THRESH ALPHA LINE_TYPE TOTAL FBAR FBAR_NCL FBAR_NCU
FBAR_BCL FBAR_BCU FSTDEV FSTDEV_NCL FSTDEV_NCU FSTDEV_BCL
FSTDEV_BCU OBAR OBAR_NCL OBAR_NCU OBAR_BCL OBAR_BCU
OSTDEV OSTDEV_NCL OSTDEV_NCU OSTDEV_BCL OSTDEV_BCU PR_CORR
PR_CORR_NCL PR_CORR_NCU PR_CORR_BCL PR_CORR_BCU SP_CORR KT_CORR RANKS
FRANK_TIES ORANK_TIES ME ME_NCL ME_NCU
ME_BCL ME_BCU ESTDEV ESTDEV_NCL ESTDEV_NCU ESTDEV_BCL ESTDEV_BCU
MBIAS MBIAS_BCL MBIAS_BCU MAE MAE_BCL MAE_BCU MSE
MSE_BCL MSE_BCU BCMSE BCMSE_BCL BCMSE_BCU RMSE RMSE_BCL
RMSE_BCU E10 E10_BCL E10_BCU E25 E25_BCL E25_BCU
E50 E50_BCL E50_BCU E75 E75_BCL E75_BCU E90
E90_BCL E90_BCU EIQR EIQR_BCL EIQR_BCU MAD MAD_BCL
MAD_BCU ANOM_CORR ANOM_CORR_NCL ANOM_CORR_NCU ANOM_CORR_BCL
ANOM_CORR_BCU ME2 ME2_BCL ME2_BCU MSESS
MSESS_BCL MSESS_BCU
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210000 20150601_025800 gps NA gps NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 188688 -200.48231 -200.86237 -200.10225 NA
NA 84.23183 83.96395 84.50144 NA NA 0.0031088
0.0030861 0.0031315 NA NA 0.0050353 0.0050193
0.0050515 NA NA -0.33551 -0.33951 -0.3315
NA NA NA NA 0 0 0 -200.48541
-200.86548 -200.10535 NA NA 84.23352 83.96563
84.50314 NA NA -64488.54341 NA NA
200.48541 NA NA 47289.64977 NA NA
7095.24838 NA NA 217.46184 NA NA
-262.42313 NA NA -248.07416 NA NA -226.02536
NA NA -208.89397 NA NA 0 NA NA
39.18019 NA NA 19.33676 NA NA NA
NA NA NA NA 40194.40139 NA
NA -1865129260.01628 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210000 20150601_025900 ps NA ps NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 42613 984.96813 984.48308 985.45319 NA
NA 51.0875 50.74681 51.43282 NA NA 984.6152
984.12863 985.10177 NA NA 51.24695 50.9052
51.59335 NA NA 0.98682 0.98656 0.98706
NA NA NA NA 0 0 0 0.35293
0.27403 0.43183 NA NA 8.31024 8.25482
8.36641 NA NA 1.00036 NA NA
1.51615 NA NA 69.18302 NA NA 69.05846
NA NA 8.31763 NA NA -1.11171 NA
NA -0.5473 NA NA -0.05762 NA NA
0.46487 NA NA 1.10836 NA NA 1.01217
NA NA 0.50531 NA NA NA NA
NA NA NA 0.12456 NA NA
0.97366 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210730 20150601_022230 pw NA pw NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 4840 25.54743 25.29131 25.80354 NA
NA 9.09104 8.91349 9.27587 NA NA 25.57483
25.30351 25.84616 NA NA 9.63082 9.44272
9.82662 NA NA 0.92251 0.9182 0.9266
NA NA NA NA 0 0 0 -0.02741
-0.1323 0.077478 NA NA 3.72305 3.65034
3.79874 NA NA 0.99893 NA NA
2.32547 NA NA 13.85901 NA NA 13.85826
NA NA 3.72277 NA NA -3.15543 NA
NA -1.41131 NA NA 0.07611 NA NA
1.76063 NA NA 3.62706 NA NA 3.17194
NA NA 1.5948 NA NA NA NA
NA NA NA 0.00075129 NA NA
0.85058 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210000 20150601_025900 q NA q NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 77764 0.0049452 0.0049088 0.0049817 NA
NA 0.0051815 0.0051559 0.0052074 NA NA 0.0052258
0.0051866 0.0052649 NA NA 0.0055726 0.005545
0.0056004 NA NA 0.9623 0.96177 0.96281
NA NA NA NA 0 0 0
-0.00028051 -0.00029124 -0.00026978 NA NA 0.0015265
0.001519 0.0015341 NA NA 0.94632 NA
NA 0.00075422 NA NA 2.4089e-06 NA NA
2.3302e-06 NA NA 0.0015521 NA NA
-0.001616 NA NA -0.00048291 NA NA -4.1541e-06
NA NA 0.00014657 NA NA 0.00077851 NA NA
0.00062948 NA NA 0.00028845 NA NA NA
NA NA NA NA 7.8686e-08 NA
NA 0.92243 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210000 20150601_025000 sst NA sst NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 123 283.48653 282.35165 284.62141 NA
NA 6.42176 5.70712 7.34273 NA NA 289.10935
287.24369 290.97501 NA NA 10.55691 9.38209
12.07092 NA NA 0.28341 0.112 0.43845
NA NA NA NA 0 0 0 -5.62282
-7.51187 -3.73377 NA NA 10.68926 9.49971
12.22225 NA NA 0.98055 NA NA
6.15393 NA NA 144.9474 NA NA 113.33134
NA NA 12.03941 NA NA -10.92461 NA
NA -6.57281 NA NA -1.6098 NA NA
-0.09239 NA NA 1.30721 NA NA 6.48042
NA NA 2.03876 NA NA NA NA
NA NA NA 31.61606 NA NA
-0.30058 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210000 20150601_030000 t NA t NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 227599 256.16878 256.03818 256.29938 NA
NA 31.78918 31.6971 31.8818 NA NA 256.17193
256.04139 256.30246 NA NA 31.77428 31.68224
31.86685 NA NA 0.99821 0.99819 0.99822
NA NA NA NA 0 0 0
-0.0031443 -0.010958 0.0046697 NA NA 1.90201
1.8965 1.90755 NA NA 0.99999 NA
NA 0.9121 NA NA 3.61763 NA NA
3.61762 NA NA 1.90201 NA NA
-1.34499 NA NA -0.52925 NA NA 0.03319
NA NA 0.52779 NA NA 1.15161 NA NA
1.05704 NA NA 0.5248 NA NA NA
NA NA NA NA 9.8866e-06 NA
NA 0.99642 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150601_000000 20150601_000000 tcp NA tcp NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 2 0 0 0 NA
NA 0 0 0 NA NA 983.18842
682.82861 1283.54823 NA NA 33.43035 14.91493
1066.76792 NA NA NA NA NA
NA NA NA NA 0 0 0 -983.18842
-1283.54823 -682.82861 NA NA 33.43035 14.91493
1066.76792 NA NA 0 NA NA
983.18842 NA NA 967218.26351 NA NA
558.79428 NA NA 983.47255 NA NA
-1002.09948 NA NA -995.00784 NA NA -983.18842
NA NA -971.36901 NA NA -964.27736 NA NA
23.63883 NA NA 23.63883 NA NA 1
NA NA NA NA 966659.46922 NA
NA -864.45111 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210000 20150601_030000 u NA u NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 444766 6.47657 6.43919 6.51395 NA
NA 12.71867 12.69229 12.74515 NA NA 6.22615
6.18863 6.26366 NA NA 12.76489 12.73842
12.79147 NA NA 0.92699 0.92657 0.9274
NA NA NA NA 0 0 0 0.25042
0.23611 0.26473 NA NA 4.8693 4.8592
4.87944 NA NA 1.04022 NA NA
2.39841 NA NA 23.77275 NA NA 23.71003
NA NA 4.87573 NA NA -3.00654 NA
NA -1.33594 NA NA 0.067305 NA NA
1.5193 NA NA 3.35059 NA NA 2.85524
NA NA 1.42716 NA NA NA NA
NA NA NA 0.062712 NA NA
0.8541 NA NA
V5.1 GSI 000000 20150601_000000 20150601_000000 000000
20150531_210000 20150601_030000 v NA v NA
ALL_TYPES NA NA 0 NA NA NA
0.05 CNT 444766 0.51622 0.48767 0.54476 NA
NA 9.71306 9.69292 9.73329 NA NA 0.36571
0.33634 0.39508 NA NA 9.99332 9.97259
10.01413 NA NA 0.91394 0.91345 0.91442
NA NA NA NA 0 0 0 0.15051
0.13847 0.16255 NA NA 4.09709 4.0886
4.10562 NA NA 1.41155 NA NA
2.26999 NA NA 16.80878 NA NA 16.78613
NA NA 4.09985 NA NA -3.03755 NA
NA -1.37359 NA NA 0.05279 NA NA
1.5021 NA NA 3.27482 NA NA 2.87569
NA NA 1.4373 NA NA NA NA
NA NA NA 0.022652 NA NA
0.83169 NA NA
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: j singh
Time: Tue Jan 19 11:48:55 2016
John,
I have attached the .cnt file for the "u" variable. In that file, the second column includes only the total number of accepted pairs, since ANLY_USE = 1. Is there any possibility of also including the total number of "u" pairs (i.e., both ANLY_USE = +1 and -1)?
Though I am able to get the total value from the log file, if there is any possibility of getting that information into the file, it would be welcome. I have a few other questions too, but I will let you know about them later.
P.S. With every mail of yours, you are easing my work and helping me focus more on the science. I wonder what other tricks you have up your sleeve.
thanks, Jagdeep
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Tue Jan 19 13:56:49 2016
Jagdeep,
I see that you'd like to differentiate between the -1 and 1 values in the
ANLY_USE column. My only suggestion would be to add that to the "-by"
option.
For example, running with "-fcst_var u -by ANLY_USE", I get the attached
output file. Looking in there, I see that 163,817 pairs have ANLY_USE=-1
while 280,949 pairs have ANLY_USE=1.
FYI, you can combine multiple "-by" options together, like this:
-by FCST_VAR,ANLY_USE
Hope that helps.
John
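The "-by" option amounts to a keyed aggregation over a header column: each MPR line is bucketed by the column's value and statistics are computed per bucket. A minimal sketch in plain C++ (not MET internals; the data here is entirely made up):

```cpp
#include <cassert>
#include <map>
#include <vector>

// Bucket a column of ANLY_USE values (+1 accepted, -1 rejected) and
// count the lines per bucket, the way "-by ANLY_USE" groups MPR lines.
std::map<int, int> count_by_anly_use(const std::vector<int> &anly_use) {
   std::map<int, int> counts;
   for (int v : anly_use) ++counts[v];   // one bucket per distinct value
   return counts;
}
```

With the real file, John's run found 163,817 pairs in the ANLY_USE=-1 bucket and 280,949 in the ANLY_USE=1 bucket.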
On Tue, Jan 19, 2016 at 11:48 AM, j singh via RT <met_help at ucar.edu>
wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=74724 >
>
> John
> I have attached a file for "u" variable .cnt file . In that file the
> second column includes the total number of accepted variables since
> ANLY_USE = 1. is there any possibility to even include total number
of "u"
> variables (ie ANLY_USE = +1 & -1 ).
> Though I am able to get the total value from log file, still if
there is
> any possibility to get that information on the file, it would be
welcomed.
> I have few other questions too but i will let you know about them
later.
>
> P.S. with every mail of yours, you are easing my work and help me
focus
> more on the science. I wonder what else tricks do you have up your
sleeves.
>
>
> thanks
> jagdeep
>
>
>
>
> On Tuesday, 19 January 2016 11:26 PM, John Halley Gotway via RT
<
> met_help at ucar.edu> wrote:
>
>
> Jagdeep,
>
> By default, STAT-Analysis has two options enabled which slow it down a
> lot. Disabling these two options on my machine makes the job you sent me
> run in less than 1 minute:
>
> (1) The computation of rank correlation statistics, Spearman's Rank
> Correlation and Kendall's Tau. Disable them using "-rank_corr_flag FALSE".
> (2) The computation of bootstrap confidence intervals. Disable them using
> "-n_boot_rep 0".
>
> I have two other suggestions...
>
> (1) Instead of using "-fcst_var u", try using "-by FCST_VAR". That'll
> compute statistics separately for each unique entry found in the FCST_VAR
> column.
> (2) Instead of using "-out" to write the output to a text file, try using
> "-out_stat", which will write a full STAT output file, including all the
> header columns. When doing so, you'll get a long list of values in the
> OBTYPE column. To avoid that long OBTYPE column value, you can manually
> set the output using "-set_hdr OBTYPE ALL_TYPES", or set its value to
> whatever you'd like.
>
> met-5.1/bin/stat_analysis \
>   -lookin diag_conv_anl.2015060100.stat \
>   -job aggregate_stat -line_type MPR -out_line_type CNT -by FCST_VAR \
>   -out_stat diag_conv_anl.2015060100_cnt.txt -set_hdr OBTYPE ALL_TYPES \
>   -n_boot_rep 0 -rank_corr_flag FALSE -v 4
>
> Adding the "-by FCST_VAR" option to compute stats for all variables made
> the job run in about 2 1/2 minutes on my machine. I've attached the
> resulting output file.
>
> You also asked why you got fewer matched pairs for u than you expected.
> Note that my counts are slightly higher because my output includes some
> duplicates. In my data, there are 447888 MPR lines for the variable "u",
> but we end up with only 444766 matched pairs. Looking at the FCST and OBS
> columns in the MPR output lines for u, I found 3122 of them where the
> observation value is actually NA. Those pairs will be skipped when
> computing statistics.
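The arithmetic behind those counts: 447888 MPR lines minus 3122 lines with NA observations leaves 444766 usable pairs. A sketch of the skip-NA filtering (illustrative only; MET writes the literal string "NA" in the OBS column, and NaN stands in for it here):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Count the forecast/observation pairs that are usable for statistics,
// skipping any pair where either value is missing (NaN here).
size_t count_valid_pairs(const std::vector<double> &fcst,
                         const std::vector<double> &obs) {
   size_t n = 0;
   for (size_t i = 0; i < obs.size() && i < fcst.size(); ++i)
      if (!std::isnan(fcst[i]) && !std::isnan(obs[i])) ++n;
   return n;
}
```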
>
> Hope that helps.
>
> Thanks,
> John
>
>
> On Fri, Jan 15, 2016 at 9:49 AM, j singh via RT <met_help at ucar.edu>
wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=74724 >
> >
> > [rt.rap.ucar.edu #74724]
> > hi john
> >
> > your solution was good enough to shorten the output of gsid2mpr to an
> > incredible 15 min. it was a relief.
> > there's another thing i need to ask
> >
> > i am using the stat_analysis tool to get the CNT out_line_type; it's
> > taking around 1 1/2 hr. can you look at it once? you can use the file
> > which is the output of gsid2mpr, the file which i have sent you
> > yesterday (diag_conv_anl.2015060100), and use the following command:
> > stat_analysis -lookin diag_conv_anl.2015060100.stat \
> >   -job aggregate_stat -line_type MPR -out_line_type CNT -fcst_var u \
> >   -out diag_conv_anl.2015060100.cnt -v 2
> > this is the log made during the run of above stat_analysis command
> >
> >
> > ##############################################################################
> > DEBUG 1: Creating STAT-Analysis output file "diag_conv_anl.2015060100.cnt"
> > DEBUG 2: STAT Lines read = 1434170
> > DEBUG 2: STAT Lines retained = 447165
> > DEBUG 2:
> > DEBUG 2: Processing Job 1: -job aggregate_stat -fcst_var u
-line_type MPR
> > -out_line_type CNT -out_alpha 0.05000 -boot_interval 1
-boot_rep_prop
> > 1.00000mt19937 -boot_seed '' -rank_corr_flag 1
> > GSL_RNG_TYPE=mt19937
> > GSL_RNG_SEED=145791062
> > DEBUG 2: Computing output for 1 case(s).
> > DEBUG 2: For case "(nul)", found 24 unique OBTYPE values:
> >
> >
> > 220,281,244,254,290,253,280,233,221,287,284,243,231,252,250,242,229,230,251,245,247,246,223,2
> > DEBUG 2: Job 1 used 444043 out of 447165 STAT lines.
> > ##############################################################################
> >
> > please enlighten me about the line as to why only 444043 were used
> > instead of the whole 447165.
> > i appreciate your help for the gsid2mpr issue and do update me if
there
> is
> > any other fix regarding gsid2mpr.
> > thanks,jagdeep
> >
> >
> >
> > On Friday, 15 January 2016 3:33 AM, John Halley Gotway via RT <
> > met_help at ucar.edu> wrote:
> >
> >
> > jagdeep,
> >
> > Thanks for sending that sample data file. That 88 MB file is certainly
> > much larger than the 400 KB test data files we used during development!
> >
> > After running that file on my machine for 10 minutes or so, I gave up
> > and killed it.
> >
> > I did some testing and found 2 things that are slowing it down a lot...
> > (1) Resizing the output object for each record it reads.
> > (2) Checking for duplicate records.
> >
> > Below I've listed the run times when adding logic to the code to fix
> > these issues:
> > - Fix (1): At least 40 minutes (wasn't patient enough to let it finish)
> > - Fix (2): 6 minutes, 36 seconds
> > - Both fixes (1) and (2): 1 minute, 45 seconds
> >
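Fix (1) is the classic exact-fit versus geometric growth trade-off: growing a buffer by one slot per record copies everything on every append, which is O(n^2) overall. A sketch of the two patterns (illustrative C++; the real gsid2mpr output object is not a std::vector):

```cpp
#include <cstddef>
#include <vector>

// Pre-fix pattern: build an exact-fit buffer one slot larger and copy
// the whole contents on every append -> O(n) per record, O(n^2) total.
std::vector<double> append_exact(std::vector<double> buf, double v) {
   std::vector<double> grown;
   grown.reserve(buf.size() + 1);            // exact-fit allocation each call
   grown.assign(buf.begin(), buf.end());     // full copy of existing records
   grown.push_back(v);
   return grown;
}

// Post-fix pattern: geometric capacity growth gives amortized O(1)
// appends, so processing n records costs O(n) total.
void append_amortized(std::vector<double> &buf, double v) {
   buf.push_back(v);
}
```

Both produce the same contents; only the cost profile differs, which is consistent with the large run-time gap measured above.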
> > I'd still like to keep checking for duplicates as the default behavior,
> > but we could add a command line option to disable it.
> >
> > Do those changes sound reasonable to you?
> >
> > I won't be able to work on a patch for this until next week. In the
> > meantime, there's an easy hack you could do to skip over the checking
> > of duplicates.
> >
> > Edit the file met-5.1/src/tools/other/gsi_tools/gsid2mpr.cc. After line
> > 464 of that file, add "return(false);", as shown below. That will
> > disable the checking of duplicates. Then recompile MET.
> >
> > 462 ////////////////////////////////////////////////////////////////////////
> > 463
> > 464 bool is_dup(const char *key) {
> >        return(false);
> > 465 bool dup;
> > 466
> >
> > Thanks,
> > John Halley Gotway
> > met_help at ucar.edu
> >
> >
> >
>
>
>
>
>
------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Wed Jan 20 13:07:27 2016
Jagdeep,
I wanted to let you know that we talked about these enhancements to the
MET GSI diagnostic tools internally. Unfortunately, our funding for MET is
very limited until at least March 1st, so I won't be able to provide you
with an updated version until at least March.
I can tell you that we'll definitely include these enhancements in the
next release of MET.
So for now, I'll resolve this ticket, unless you have additional
questions.
Thanks,
John
------------------------------------------------
More information about the Met_help
mailing list