[Met_help] [rt.rap.ucar.edu #74724] History for gsid2mpr (gsi tool) problem

John Halley Gotway via RT met_help at ucar.edu
Wed Jan 20 13:18:27 MST 2016


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi,

I am encountering a problem where gsid2mpr takes a long time to produce
output. I am using the following command:

$ gsid2mpr diag_conv_ges.2015060100  -swap  -outdir out -v 3

If I do not use -swap, I get the following error:
DEBUG 1:
DEBUG 1: Reading: diag_conv_anl.20150606_1 ... 1 of 1
ERROR  :
ERROR  :
ERROR  :   read_fortran_binary() -> buffer too small ... increase buffer size to at least 67108864 bytes!
ERROR  :   Try using the -swap option to switch the endianness of the input binary files.
ERROR  :

Regards,
Jagdeep Singh



----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Thu Jan 14 09:55:05 2016

Harvir,

I see that you're having a tough time running the gsid2mpr tool.  If
not using "-swap" leads to that error, then you should continue using
the "-swap" option.

Can you tell me how long it takes for the following command to run?

   time gsid2mpr diag_conv_ges.2015060100  -swap  -outdir out -v 3

If you'd like, you could send me that sample file
(diag_conv_ges.2015060100), and I could try running it here to see how
long it takes.

Follow these instructions to send us data:
   http://www.dtcenter.org/met/users/support/met_help.php#ftp
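
For background on the "-swap" advice above: Fortran unformatted files
commonly wrap each record in length markers, and a marker decoded with
the wrong byte order turns into a very different number. The sketch
below is an illustration in Python only, not MET's read_fortran_binary()
code, and the 4-byte marker width is an assumption about how the file
was written. It shows that a record length of 4 read with the wrong
byte order becomes 67108864, which matches the figure in the error
message and is consistent with (though not proof of) an endianness
mismatch.

```python
import struct

def read_marker(raw: bytes, big_endian: bool) -> int:
    """Decode a 4-byte record-length marker with the given byte order."""
    fmt = ">i" if big_endian else "<i"
    return struct.unpack(fmt, raw)[0]

# A record length of 4 bytes, written in big-endian byte order.
raw = struct.pack(">i", 4)

as_written = read_marker(raw, big_endian=True)    # correct byte order
as_misread = read_marker(raw, big_endian=False)   # wrong byte order

print(as_written, as_misread)
```

Swapping the byte order on read, which is what "-swap" requests, recovers
the small, sensible record length.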

Thanks,
John Halley Gotway
met_help at ucar.edu

------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: j singh
Time: Thu Jan 14 12:06:23 2016

The file is a GFS model conventional diagnostic file, around 84 GB in
size. The problem is that it takes around 10 hours to produce output,
since the process becomes slow once the virtual memory size grows
large. Although I have set the verbosity to 0, it still prints log
messages such as SKIPPING DUPLICATE VALUE on the screen. I have gone
through the gsid2mpr.cc file to find where that log message comes from,
but I did not want to switch it off. The code was written by Bullock of
UCAR. I would like you to take a GSI tool output, say a CONVSTAT file,
and run gsid2mpr on it.

Regards,
Jagdeep

------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Thu Jan 14 15:03:29 2016

jagdeep,

Thanks for sending that sample data file.  That 88 MB file is certainly
much larger than the 400 KB test data files we used during development!

After running that file on my machine for 10 minutes or so, I gave up
and killed it.

I did some testing and found 2 things that are slowing it down a lot...
   (1) Resizing the output object for each record it reads.
   (2) Checking for duplicate records.

Below I've listed the run times when adding logic to the code to fix
these issues:
   - Fix (1): At least 40 minutes (wasn't patient enough to let it finish)
   - Fix (2): 6 minutes, 36 seconds
   - Both fixes (1) and (2): 1 minute, 45 seconds

I'd still like to keep checking for duplicates as the default behavior,
but we could add a command line option to disable it.

Do those changes sound reasonable to you?

I won't be able to work on a patch for this until next week.  In the
meantime, there's an easy hack you could do to skip over the checking
of duplicates.

Edit the file met-5.1/src/tools/other/gsi_tools/gsid2mpr.cc.  After
line 464 of that file, add "return(false);", as shown below.  That will
disable the checking of duplicates.  Then recompile MET.

    462 ////////////////////////////////////////////////////////////////////////
    463
    464 bool is_dup(const char *key) {
             return(false);
    465    bool dup;
    466
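
To illustrate why a duplicate check can dominate the run time on a
large file, here is a minimal Python sketch. It is not the gsid2mpr
code: it assumes, as a guess, that the slow check scans every key seen
so far, and compares that with a hash-set lookup. The function names
and the sample keys are hypothetical, and how the real tool builds its
keys is not shown in this thread.

```python
def dedup_linear(keys):
    """Keep first occurrences; each check scans all keys kept so far (O(n^2) total)."""
    kept = []
    for key in keys:
        if key not in kept:      # linear scan of a list
            kept.append(key)
    return kept

def dedup_hashed(keys):
    """Same result, but membership is tested against a hash set (about O(n) total)."""
    seen = set()
    kept = []
    for key in keys:
        if key not in seen:      # average constant-time lookup
            seen.add(key)
            kept.append(key)
    return kept

# Hypothetical record keys, for illustration only.
keys = ["obs_a", "obs_b", "obs_a", "obs_c", "obs_b"]
print(dedup_hashed(keys))
```

Both functions return the same records; only the cost per lookup
differs, which matters once a file holds more than a million records.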

Thanks,
John Halley Gotway
met_help at ucar.edu



------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: j singh
Time: Fri Jan 15 09:49:23 2016

Hi John,

Your solution shortened the gsid2mpr run time to an incredible 15
minutes. It was a relief.

There is another thing I need to ask. I am using the stat_analysis tool
to get the CNT output line type, and it is taking around 1 1/2 hours.
Could you take a look at it? You can use the file that gsid2mpr
produces from the file I sent you yesterday (diag_conv_anl.2015060100),
with the following command:

stat_analysis -lookin diag_conv_anl.2015060100.stat -job aggregate_stat \
   -line_type MPR -out_line_type CNT -fcst_var u \
   -out diag_conv_anl.2015060100.cnt -v 2

This is the log produced by the above stat_analysis command:

##############################################################################
DEBUG 1: Creating STAT-Analysis output file "diag_conv_anl.2015060100.cnt"
DEBUG 2: STAT Lines read     = 1434170
DEBUG 2: STAT Lines retained = 447165
DEBUG 2:
DEBUG 2: Processing Job 1: -job aggregate_stat -fcst_var u -line_type MPR -out_line_type CNT -out_alpha 0.05000 -boot_interval 1 -boot_rep_prop 1.00000mt19937 -boot_seed '' -rank_corr_flag 1
GSL_RNG_TYPE=mt19937
GSL_RNG_SEED=145791062
DEBUG 2: Computing output for 1 case(s).
DEBUG 2: For case "(nul)", found 24 unique OBTYPE values: 220,281,244,254,290,253,280,233,221,287,284,243,231,252,250,242,229,230,251,245,247,246,223,2
DEBUG 2: Job 1 used 444043 out of 447165 STAT lines.
##############################################################################

Please explain why only 444043 lines were used instead of all 447165.

I appreciate your help with the gsid2mpr issue. Please update me if
there is any other fix for gsid2mpr.

Thanks,
Jagdeep



------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Tue Jan 19 10:56:34 2016

Jagdeep,

By default, STAT-Analysis has two options enabled which slow it down a
lot.  Disabling these two options on my machine makes the job you sent
me run in less than 1 minute:

(1) The computation of rank correlation statistics, Spearman's Rank
Correlation and Kendall's Tau.  Disable them using "-rank_corr_flag FALSE".
(2) The computation of bootstrap confidence intervals.  Disable them
using "-n_boot_rep 0".

I have two other suggestions...

(1) Instead of using "-fcst_var u", try using "-by FCST_VAR".  That'll
compute statistics separately for each unique entry found in the
FCST_VAR column.
(2) Instead of using "-out" to write the output to a text file, try
using "-out_stat", which will write a full STAT output file, including
all the header columns.  When doing so, you'll get a long list of
values in the OBTYPE column.  To avoid that long OBTYPE column value,
you can manually set the output using "-set_hdr OBTYPE ALL_TYPES".  Or
set its value to whatever you'd like.

met-5.1/bin/stat_analysis \
   -lookin diag_conv_anl.2015060100.stat \
   -job aggregate_stat -line_type MPR -out_line_type CNT -by FCST_VAR \
   -out_stat diag_conv_anl.2015060100_cnt.txt -set_hdr OBTYPE ALL_TYPES \
   -n_boot_rep 0 -rank_corr_flag FALSE -v 4

Adding the "-by FCST_VAR" option to compute stats for all variables
made the job run in about 2 1/2 minutes on my machine.  I've attached
the resulting output file.
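
As a rough picture of what grouping with "-by FCST_VAR" means, the
Python sketch below splits matched pairs by variable name and computes
one statistic (mean error) per group. It is an illustration only, not
how STAT-Analysis is implemented; the function name and the sample
values are made up.

```python
from collections import defaultdict

def mean_error_by_var(pairs):
    """Group (fcst_var, fcst, obs) tuples by variable; return each group's mean error."""
    groups = defaultdict(list)
    for fcst_var, fcst, obs in pairs:
        groups[fcst_var].append(fcst - obs)
    return {var: sum(errs) / len(errs) for var, errs in groups.items()}

# Made-up matched pairs, for illustration only.
pairs = [("u", 6.0, 5.0), ("u", 4.0, 5.0), ("t", 256.5, 256.0)]
print(mean_error_by_var(pairs))
```

One pass over the data yields a result for every variable, instead of
one filtered job per variable.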

You also asked why you got fewer matched pairs for u than you expected.
Note that my counts are slightly higher because my output includes some
duplicates.  In my data, there are 447888 MPR lines for the variable
"u" but we end up with only 444766 matched pairs.  Looking at the FCST
and OBS columns in the MPR output lines for u, I found 3122 of them
where the observation value is actually NA.  Those pairs will be
skipped when computing statistics.
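
The skipping rule described above can be sketched in a few lines of
Python. This is only an illustration with made-up values, not the
STAT-Analysis code, and the function name is hypothetical. The two
arithmetic checks at the end use the counts quoted in this thread.

```python
def usable_pairs(pairs):
    """Return only the pairs whose forecast and observation are both present (not "NA")."""
    return [(f, o) for f, o in pairs if f != "NA" and o != "NA"]

# Made-up FCST/OBS column values as strings, as they would appear in a text file.
pairs = [("6.4", "6.2"), ("5.1", "NA"), ("7.0", "6.8")]
print(len(usable_pairs(pairs)), "of", len(pairs), "pairs used")

# The counts quoted in this thread are consistent with that rule.
assert 447888 - 3122 == 444766   # counts from the reply above
assert 447165 - 444043 == 3122   # counts from the earlier stat_analysis log
```

The same gap of 3122 lines shows up in both runs, with and without the
duplicate records.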

Hope that helps.

Thanks,
John



------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Tue Jan 19 10:56:34 2016

VERSION MODEL FCST_LEAD FCST_VALID_BEG  FCST_VALID_END  OBS_LEAD
OBS_VALID_BEG   OBS_VALID_END   FCST_VAR FCST_LEV OBS_VAR OBS_LEV
OBTYPE    VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH
COV_THRESH ALPHA LINE_TYPE TOTAL  FBAR         FBAR_NCL     FBAR_NCU
FBAR_BCL FBAR_BCU FSTDEV     FSTDEV_NCL FSTDEV_NCU FSTDEV_BCL
FSTDEV_BCU OBAR        OBAR_NCL    OBAR_NCU     OBAR_BCL OBAR_BCU
OSTDEV     OSTDEV_NCL OSTDEV_NCU   OSTDEV_BCL OSTDEV_BCU PR_CORR
PR_CORR_NCL PR_CORR_NCU PR_CORR_BCL PR_CORR_BCU SP_CORR KT_CORR RANKS
FRANK_TIES ORANK_TIES ME            ME_NCL         ME_NCU
ME_BCL ME_BCU ESTDEV     ESTDEV_NCL ESTDEV_NCU   ESTDEV_BCL ESTDEV_BCU
MBIAS        MBIAS_BCL MBIAS_BCU MAE          MAE_BCL MAE_BCU MSE
MSE_BCL MSE_BCU BCMSE         BCMSE_BCL BCMSE_BCU RMSE        RMSE_BCL
RMSE_BCU E10          E10_BCL E10_BCU E25           E25_BCL E25_BCU
E50           E50_BCL E50_BCU E75           E75_BCL E75_BCU E90
E90_BCL E90_BCU EIQR        EIQR_BCL EIQR_BCU MAD         MAD_BCL
MAD_BCU ANOM_CORR ANOM_CORR_NCL ANOM_CORR_NCU ANOM_CORR_BCL
ANOM_CORR_BCU ME2             ME2_BCL ME2_BCU MSESS
MSESS_BCL MSESS_BCU
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210000 20150601_025800 gps      NA       gps     NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT       188688 -200.48231   -200.86237   -200.10225         NA
NA 84.23183   83.96395   84.50144           NA         NA   0.0031088
0.0030861    0.0031315       NA       NA  0.0050353  0.0050193
0.0050515         NA         NA -0.33551    -0.33951    -0.3315
NA          NA      NA      NA     0          0          0 -200.48541
-200.86548    -200.10535        NA     NA 84.23352    83.96563
84.50314           NA         NA -64488.54341        NA        NA
200.48541         NA      NA  47289.64977         NA      NA
7095.24838           NA        NA 217.46184         NA       NA
-262.42313       NA      NA -248.07416         NA      NA -226.02536
NA      NA -208.89397         NA      NA    0               NA      NA
39.18019          NA       NA 19.33676         NA      NA        NA
NA            NA            NA            NA  40194.40139         NA
NA -1865129260.01628        NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210000 20150601_025900 ps       NA       ps      NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT        42613  984.96813    984.48308    985.45319         NA
NA 51.0875    50.74681   51.43282           NA         NA 984.6152
984.12863    985.10177         NA       NA 51.24695   50.9052
51.59335           NA         NA  0.98682     0.98656     0.98706
NA          NA      NA      NA     0          0          0    0.35293
0.27403       0.43183        NA     NA  8.31024     8.25482
8.36641           NA         NA      1.00036        NA        NA
1.51615         NA      NA     69.18302         NA      NA   69.05846
NA        NA   8.31763         NA       NA    -1.11171       NA
NA   -0.5473          NA      NA   -0.05762         NA      NA
0.46487         NA      NA    1.10836         NA      NA  1.01217
NA       NA  0.50531         NA      NA        NA            NA
NA            NA            NA      0.12456         NA      NA
0.97366        NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210730 20150601_022230 pw       NA       pw      NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT         4840   25.54743     25.29131     25.80354         NA
NA  9.09104    8.91349    9.27587           NA         NA  25.57483
25.30351     25.84616         NA       NA  9.63082    9.44272
9.82662           NA         NA  0.92251     0.9182      0.9266
NA          NA      NA      NA     0          0          0   -0.02741
-0.1323        0.077478       NA     NA  3.72305     3.65034
3.79874           NA         NA      0.99893        NA        NA
2.32547         NA      NA     13.85901         NA      NA   13.85826
NA        NA   3.72277         NA       NA    -3.15543       NA
NA   -1.41131         NA      NA    0.07611         NA      NA
1.76063         NA      NA    3.62706         NA      NA  3.17194
NA       NA  1.5948          NA      NA        NA            NA
NA            NA            NA      0.00075129      NA      NA
0.85058        NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210000 20150601_025900 q        NA       q       NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT        77764    0.0049452    0.0049088    0.0049817       NA
NA  0.0051815  0.0051559  0.0052074         NA         NA   0.0052258
0.0051866    0.0052649       NA       NA  0.0055726  0.005545
0.0056004         NA         NA  0.9623      0.96177     0.96281
NA          NA      NA      NA     0          0          0
-0.00028051    -0.00029124   -0.00026978     NA     NA  0.0015265
0.001519    0.0015341         NA         NA      0.94632        NA
NA   0.00075422      NA      NA      2.4089e-06      NA      NA
2.3302e-06        NA        NA   0.0015521       NA       NA
-0.001616      NA      NA   -0.00048291      NA      NA   -4.1541e-06
NA      NA    0.00014657      NA      NA    0.00077851      NA      NA
0.00062948       NA       NA  0.00028845      NA      NA        NA
NA            NA            NA            NA      7.8686e-08      NA
NA           0.92243        NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210000 20150601_025000 sst      NA       sst     NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT          123  283.48653    282.35165    284.62141         NA
NA  6.42176    5.70712    7.34273           NA         NA 289.10935
287.24369    290.97501         NA       NA 10.55691    9.38209
12.07092           NA         NA  0.28341     0.112       0.43845
NA          NA      NA      NA     0          0          0   -5.62282
-7.51187      -3.73377        NA     NA 10.68926     9.49971
12.22225           NA         NA      0.98055        NA        NA
6.15393         NA      NA    144.9474          NA      NA  113.33134
NA        NA  12.03941         NA       NA   -10.92461       NA
NA   -6.57281         NA      NA   -1.6098          NA      NA
-0.09239         NA      NA    1.30721         NA      NA  6.48042
NA       NA  2.03876         NA      NA        NA            NA
NA            NA            NA     31.61606         NA      NA
-0.30058        NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210000 20150601_030000 t        NA       t       NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT       227599  256.16878    256.03818    256.29938         NA
NA 31.78918   31.6971    31.8818            NA         NA 256.17193
256.04139    256.30246         NA       NA 31.77428   31.68224
31.86685           NA         NA  0.99821     0.99819     0.99822
NA          NA      NA      NA     0          0          0
-0.0031443     -0.010958      0.0046697      NA     NA  1.90201
1.8965      1.90755           NA         NA      0.99999        NA
NA   0.9121          NA      NA      3.61763         NA      NA
3.61762           NA        NA   1.90201         NA       NA
-1.34499       NA      NA   -0.52925         NA      NA    0.03319
NA      NA    0.52779         NA      NA    1.15161         NA      NA
1.05704          NA       NA  0.5248          NA      NA        NA
NA            NA            NA            NA      9.8866e-06      NA
NA           0.99642        NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150601_000000 20150601_000000 tcp      NA       tcp     NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT            2    0            0            0               NA
NA  0          0          0                 NA         NA 983.18842
682.82861   1283.54823         NA       NA 33.43035   14.91493
1066.76792           NA         NA NA          NA          NA
NA          NA      NA      NA     0          0          0 -983.18842
-1283.54823    -682.82861        NA     NA 33.43035    14.91493
1066.76792           NA         NA      0              NA        NA
983.18842         NA      NA 967218.26351         NA      NA
558.79428           NA        NA 983.47255         NA       NA
-1002.09948       NA      NA -995.00784         NA      NA -983.18842
NA      NA -971.36901         NA      NA -964.27736         NA      NA
23.63883          NA       NA 23.63883         NA      NA         1
NA            NA            NA            NA 966659.46922         NA
NA        -864.45111        NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210000 20150601_030000 u        NA       u       NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT       444766    6.47657      6.43919      6.51395         NA
NA 12.71867   12.69229   12.74515           NA         NA   6.22615
6.18863      6.26366         NA       NA 12.76489   12.73842
12.79147           NA         NA  0.92699     0.92657     0.9274
NA          NA      NA      NA     0          0          0    0.25042
0.23611       0.26473        NA     NA  4.8693      4.8592
4.87944           NA         NA      1.04022        NA        NA
2.39841         NA      NA     23.77275         NA      NA   23.71003
NA        NA   4.87573         NA       NA    -3.00654       NA
NA   -1.33594         NA      NA    0.067305        NA      NA
1.5193          NA      NA    3.35059         NA      NA  2.85524
NA       NA  1.42716         NA      NA        NA            NA
NA            NA            NA      0.062712        NA      NA
0.8541         NA        NA
V5.1    GSI   000000    20150601_000000 20150601_000000 000000
20150531_210000 20150601_030000 v        NA       v       NA
ALL_TYPES NA      NA          0           NA          NA         NA
0.05  CNT       444766    0.51622      0.48767      0.54476         NA
NA  9.71306    9.69292    9.73329           NA         NA   0.36571
0.33634      0.39508         NA       NA  9.99332    9.97259
10.01413           NA         NA  0.91394     0.91345     0.91442
NA          NA      NA      NA     0          0          0    0.15051
0.13847       0.16255        NA     NA  4.09709     4.0886
4.10562           NA         NA      1.41155        NA        NA
2.26999         NA      NA     16.80878         NA      NA   16.78613
NA        NA   4.09985         NA       NA    -3.03755       NA
NA   -1.37359         NA      NA    0.05279         NA      NA
1.5021          NA      NA    3.27482         NA      NA  2.87569
NA       NA  1.4373          NA      NA        NA            NA
NA            NA            NA      0.022652        NA      NA
0.83169        NA        NA

------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: j singh
Time: Tue Jan 19 11:48:55 2016

John,

I have attached a .cnt file for the "u" variable. In that file, the
second column contains the total number of accepted "u" values, i.e.
those with ANLY_USE = 1. Is it possible to also include the total
number of "u" values (i.e. ANLY_USE = +1 and -1)? Although I can get
the total from the log file, it would be welcome to have that
information in the file itself. I have a few other questions too, but I
will let you know about them later.

P.S. With every mail of yours, you are easing my work and helping me
focus more on the science. I wonder what other tricks you have up your
sleeve.

Thanks,
Jagdeep

  


    On Tuesday, 19 January 2016 11:26 PM, John Halley Gotway via RT
<met_help at ucar.edu> wrote:


 Jagdeep,

By default, STAT-Analysis has two options enabled which slow it down a
lot.  Disabling these two options on my machine make the job you sent
me
run in less than 1 minute:

(1) The computation of rank correlation statistics, Spearmans Rank
Correlation and Kendall's Tau.  Disable them using "-rank_corr_flag
FALSE".
(2) The computation of bootstrap confidence intervals.  Disable them
using
"-n_boot_rep 0".

I have two other suggestions...

(1) Instead of using "-fcst_var u", try using "-by fcst_var".  That'll
compute statistics separately for each unique entry found in the
FCST_VAR
column.
(2) Instead of using "-out" to write the output to a text file, try
using
"-out_stat" which will write a full STAT output file, including all
the
header columns.  When doing so, you'll get a long list of values in
the
OBTYPE column.  To avoid that long, OBTYPE column value, you can
manually
set the output using "-set_hdr OBTYPE ALL_TYPES".  Or set it's value
to
whatever you'd like.

met-5.1/bin/stat_analysis \
  -lookin diag_conv_anl.2015060100.stat \
  -job aggregate_stat -line_type MPR -out_line_type CNT -by FCST_VAR \
  -out_stat diag_conv_anl.2015060100_cnt.txt -set_hdr OBTYPE ALL_TYPES
\
  -n_boot_rep 0 -rank_corr_flag FALSE -v 4

Adding the "-by FCST_VAR" option to compute stats for all variables
made
the job run in about 2 1/2 minutes on my machine.  I've attached the
resulting output file.

You also asked why you got fewer matched pairs for u than you
expected.
Note that my counts are slightly higher because my output includes
some
duplicates.  In my data, there are 447888 MPR lines for the variable
"u"
but we end up with only 444766 matched pairs.  Looking at the FCST and
OBS
columns in the MPR output lines for u, I found 3122 of them where the
observation value is actually NA.  Those pairs will be skipped when
computing statistics.

Hope that helps.

Thanks,
John


On Fri, Jan 15, 2016 at 9:49 AM, j singh via RT <met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=74724 >
>
>  [rt.rap.ucar.edu #74724]
> hi john
>
> your solution was good enough to shorten the output of gsid2mpr to
> incredible 15 min. it was a relief.
> there's another thing i need to ask
>
> i am using the stat_analysis tools to get cnt out_line_type, its taking
> around 1 1/2 hr, can you once look at it. you can use the file which is
> the output of gsid2mpr; the file which i have sent you yesterday
> (diag_conv_anl.2015060100), and use the following command
>
> stat_analysis -lookin diag_conv_anl.2015060100.stat -job aggregate_stat
> -line_type MPR -out_line_type CNT -fcst_var u -out
> diag_conv_anl.2015060100.cnt -v 2
>
> this is the log made during the run of above stat_analysis command
>
> ##############################################################################
> DEBUG 1: Creating STAT-Analysis output file "diag_conv_anl.2015060100.cnt"
> DEBUG 2: STAT Lines read    = 1434170
> DEBUG 2: STAT Lines retained = 447165
> DEBUG 2:
> DEBUG 2: Processing Job 1: -job aggregate_stat -fcst_var u -line_type MPR
> -out_line_type CNT -out_alpha 0.05000 -boot_interval 1 -boot_rep_prop
> 1.00000mt19937 -boot_seed '' -rank_corr_flag 1
> GSL_RNG_TYPE=mt19937
> GSL_RNG_SEED=145791062
> DEBUG 2: Computing output for 1 case(s).
> DEBUG 2: For case "(nul)", found 24 unique OBTYPE values:
> 220,281,244,254,290,253,280,233,221,287,284,243,231,252,250,242,229,230,251,245,247,246,223,2
> DEBUG 2: Job 1 used 444043 out of 447165 STAT lines.
> ##############################################################################
>
> please enlighten me about the line as to why only 444043 were used instead
> of the whole 447165.
> i appreciate your help for the gsid2mpr issue and do update me if there is
> any other fix regarding gsid2mpr.
> thanks,jagdeep
>
>
>
>    On Friday, 15 January 2016 3:33 AM, John Halley Gotway via RT
> <met_help at ucar.edu> wrote:
>
>
>  jagdeep,
>
> Thanks for sending that sample data file.  That 88 Mb file is certainly
> much larger than the 400 Kb test data files we used during development!
>
> After running that file on my machine for 10 minutes or so, I gave up and
> killed it.
>
> I did some testing and found 2 things that are slowing it down a lot...
>  (1) Resizing the output object for each record it reads.
>  (2) Checking for duplicate records.
>
> Below I've listed the run times when adding logic to the code to fix these
> issues:
>  - Fix (1): At least 40 minutes (wasn't patient enough to let it finish)
>  - Fix (2): 6 minutes, 36 seconds
>  - Both fixes (1) and (2): 1 minute, 45 seconds
>
> I'd still like to keep checking for duplicates as the default behavior, but
> we could add a command line option to disable it.
>
> Do those changes sound reasonable to you?
>
> I won't be able to work on a patch for this until next week.  In the
> meantime, there's an easy hack you could do to skip over the checking of
> duplicates.
>
> Edit the file met-5.1/src/tools/other/gsi_tools/gsid2mpr.cc.  After line
> 464 of that file, add "return(false);", as shown below.  That will disable
> the checking of duplicates.  Then recompile MET.
>
>    462 ////////////////////////////////////////////////////////////////////////
>    463
>    464 bool is_dup(const char *key) {
>            return(false);
>    465    bool dup;
>    466
>
> Thanks,
> John Halley Gotway
> met_help at ucar.edu




------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Tue Jan 19 13:56:49 2016

Jagdeep,

I see that you'd like to differentiate between the -1 and 1 values in
the ANLY_USE column.  My only suggestion would be to add that to the
"-by" option.

For example, running with "-fcst_var u -by ANLY_USE", I get the
attached output file.  Looking in there, I see that 163,817 pairs have
ANLY_USE=-1 while 280,949 pairs have ANLY_USE=1.

FYI, you can combine multiple "-by" options together, like this:
-by FCST_VAR,ANLY_USE
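
As a rough cross-check on those per-value counts, the same grouping
can be sketched outside of MET.  This is an illustration only:
tally_by_anly_use is a hypothetical helper, and it assumes a
whitespace-delimited file whose first row names the columns, including
FCST_VAR, OBS and ANLY_USE.  Rows whose OBS value is NA are left out,
on the assumption that they are not counted as matched pairs.

```shell
# Hypothetical helper (not part of MET).
# Usage: tally_by_anly_use FILE VAR
# Prints one "ANLY_USE=<value> <count>" line per distinct ANLY_USE
# value among the rows for VAR, skipping rows whose OBS is NA.
# Assumes the first row of FILE names the columns.
tally_by_anly_use() {
  awk -v var="$2" '
    NR == 1 {
      # Map each column name to its position.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $(col["FCST_VAR"]) == var && $(col["OBS"]) != "NA" {
      n[$(col["ANLY_USE"])]++
    }
    END { for (k in n) printf "ANLY_USE=%s %d\n", k, n[k] }
  ' "$1" | LC_ALL=C sort   # awk array order is unspecified, so sort
}
```

Called as "tally_by_anly_use diag_conv_anl.2015060100.stat u", it would
print one line per ANLY_USE value, which can be compared against the
TOTAL column of the "-by ANLY_USE" output.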

Hope that helps.

John


On Tue, Jan 19, 2016 at 11:48 AM, j singh via RT <met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=74724 >
>
> John
> I have attached a file for "u" variable .cnt file . In that file the
> second column includes the total number of accepted variables since
> ANLY_USE = 1. is there any possibility to even include total number of "u"
> variables (ie ANLY_USE = +1 & -1 ).
> Though I am able to get the total value from log file, still if there is
> any possibility to get that information on the file, it would be welcomed.
> I have few other questions too but i will let you know about them later.
>
> P.S. with every mail of yours, you are easing my work and help me focus
> more on the science. I wonder what else tricks do you have up your sleeves.
>
>
> thanks
> jagdeep

------------------------------------------------
Subject: gsid2mpr (gsi tool) problem
From: John Halley Gotway
Time: Wed Jan 20 13:07:27 2016

Jagdeep,

I wanted to let you know that we talked about these enhancements to
the MET GSI diagnostic tools internally.  Unfortunately, our funding is
very limited on MET until at least March 1st.  So I won't be able to
provide you with an updated version until at least March.

I can tell you that we'll definitely include these enhancements in the
next release of MET.

So for now, I'll resolve this ticket, unless you have additional
questions.

Thanks,
John


------------------------------------------------

