[Met_help] [rt.rap.ucar.edu #44348] History for NA values for PR_CORR

RAL HelpDesk {for John Halley Gotway} met_help at ucar.edu
Fri Feb 18 10:07:29 MST 2011


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hello,

I have attached a point_stat file that shows many examples of NAs 
showing up for PR_CORR for the field SPFH. I wonder if you can explain 
why there are so many NA values but only for SPFH even when the obs 
count is much greater than 1. Could it perhaps be related to the very 
small values of SPFH in the chosen units?

Thanks. Any help in solving this puzzle would be most appreciated.

Sincerely,

John Henderson



----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: NA values for PR_CORR
From: John Halley Gotway
Time: Mon Feb 14 09:24:47 2011

John,

I took a look at the data you sent and am not surprised by the
behavior you're seeing.  When statistics are computed over a very
small number of matched pairs (TOTAL column in your output), many
statistics end up being undefined (NA).  I ran the following command
line to extract out just the "TOTAL" and "PR_CORR" columns from the
file you sent me:
   cat point_stat_MULTIHOUR_960000L_20070718_120000V_cnt.txt | sed -r
's/ +/ /g' | cut -d' ' -f22,43

The output of that command is attached.  Take a look and you'll notice
that PR_CORR is NA only when the number of points is small (TOTAL <
5).  If you're interested in evaluating the performance at
these various pressure levels, I'd suggest running the evaluation for
a large number of forecasts and then using the STAT-Analysis tool to
aggregate the results through time.

For example, consider the following STAT-Analysis job:
   stat_analysis -lookin out/point_stat -job aggregate_stat -line_type
SL1L2 -out_line_type CNT -fcst_var RH -fcst_lev P300 -vx_mask AL
-fcst_lead 96 -dump_row dump.stat

This job will look through all the files ending in ".stat" found in
"out/point_stat".  It'll filter the lines and keep only the SL1L2
lines that are for the RH at P300 over the "AL" masking area with
a lead time of 96 hours.  It'll sum those SL1L2 lines and compute
continuous statistics for them (including PR_CORR).  It will also
write out the matching stat lines to the file "dump.stat".  It's
always a good idea to do this to make sure that your analysis job is
operating over the subset of data that you intended.

Generally speaking, when the "TOTAL" is too small to compute
statistics, you need to perform some sort of aggregation to make the
TOTAL value bigger.

Hope that helps.

John Halley Gotway
met_help at ucar.edu


------------------------------------------------
Subject: NA values for PR_CORR
From: John Halley Gotway
Time: Mon Feb 14 09:24:47 2011

TOTAL PR_CORR
50 0.95300
7 0.82767
7 0.83370
12 0.63278
12 0.84899
5 0.98043
2 1.00000
4 0.82431
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.59424
2 1.00000
1 NA
1 NA
1 NA
50 0.99634
7 0.99748
7 0.99027
12 0.98123
12 0.92935
5 0.99584
2 NA
4 0.96861
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.98756
2 1.00000
1 NA
1 NA
1 NA
50 0.98518
7 0.98096
7 0.98576
12 0.97220
12 0.76223
5 0.90461
2 1.00000
4 0.45839
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.47335
2 1.00000
1 NA
1 NA
1 NA
51 0.99636
7 0.98508
8 0.99797
12 0.99056
12 0.93738
5 0.99626
2 1.00000
4 0.97142
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.98211
2 1.00000
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
1 NA
51 0.99453
7 0.99673
8 0.99788
12 0.98306
12 0.95240
5 0.98755
2 NA
4 0.93187
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.98406
2 1.00000
1 NA
1 NA
1 NA
50 0.99508
7 0.95527
7 0.99841
12 0.92291
12 0.97135
5 0.99846
2 1.00000
4 0.57225
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.98523
2 1.00000
1 NA
1 NA
1 NA
50 0.99307
7 0.92819
7 0.99969
12 0.90575
12 0.98178
5 0.99871
2 1.00000
4 0.52745
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.96613
2 1.00000
1 NA
1 NA
1 NA
50 0.99375
7 0.99265
7 0.99945
12 0.95675
12 0.97695
5 0.99207
2 NA
4 0.24770
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.99211
2 1.00000
1 NA
1 NA
1 NA
51 0.99031
7 0.98999
8 0.99114
12 0.97180
12 0.98679
5 0.98333
2 1.00000
4 0.43033
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.85277
2 1.00000
1 NA
1 NA
1 NA
4 NA
2 NA
2 NA
2 NA
2 NA
51 0.98585
7 0.98071
8 0.98943
12 0.97517
12 0.98057
5 0.95446
2 NA
4 0.16070
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.75401
2 1.00000
1 NA
1 NA
1 NA
1 NA
1 NA
44 0.61645
6 0.67483
6 0.90904
11 0.65841
9 0.15313
5 0.52997
2 1.00000
3 -0.11660
1 NA
1 NA
1 NA
1 NA
1 NA
1 NA
3 0.88771
2 -1.00000
1 NA
1 NA
1 NA
50 0.82971
7 0.95306
7 0.78326
12 0.86455
12 0.72208
5 0.43658
2 1.00000
4 0.84853
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 -0.79208
2 -1.00000
1 NA
1 NA
1 NA
51 0.71341
7 0.72240
8 0.91126
12 0.52624
12 -0.14005
5 0.30782
2 -1.00000
4 -0.75378
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.63950
2 1.00000
1 NA
1 NA
1 NA
51 0.52982
7 0.82165
8 0.39556
12 0.47046
12 0.55280
5 0.38462
2 1.00000
4 -0.81939
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.58914
2 1.00000
1 NA
1 NA
1 NA
44 NA
6 NA
6 NA
11 NA
9 NA
5 NA
2 NA
3 NA
1 NA
1 NA
1 NA
1 NA
1 NA
1 NA
3 NA
2 NA
1 NA
1 NA
1 NA
50 NA
7 NA
7 NA
12 NA
12 NA
5 NA
2 NA
4 NA
1 NA
1 NA
1 NA
1 NA
2 NA
1 NA
1 NA
3 NA
2 NA
1 NA
1 NA
1 NA
51 NA
7 NA
8 NA
12 NA
12 NA
5 NA
2 NA
4 NA
1 NA
1 NA
1 NA
1 NA
2 NA
1 NA
1 NA
1 NA
3 NA
2 NA
1 NA
1 NA
1 NA
51 0.41529
7 NA
8 NA
12 NA
12 NA
5 NA
2 NA
4 NA
1 NA
1 NA
1 NA
1 NA
2 NA
1 NA
1 NA
1 NA
3 NA
2 NA
1 NA
1 NA
1 NA
44 0.69892
6 0.88129
6 0.92216
11 0.69247
9 0.09791
5 0.65874
2 1.00000
3 -0.32107
1 NA
1 NA
1 NA
1 NA
1 NA
1 NA
3 0.94823
2 -1.00000
1 NA
1 NA
1 NA
50 0.84454
7 0.93037
7 0.75283
12 0.84157
12 0.76881
5 0.48148
2 1.00000
4 0.86938
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 -0.89778
2 -1.00000
1 NA
1 NA
1 NA
51 0.59056
7 0.90104
8 0.83549
12 0.24472
12 -0.34841
5 0.14498
2 -1.00000
4 -0.81474
1 NA
1 NA
1 NA
1 NA
2 -1.00000
1 NA
1 NA
1 NA
3 0.20701
2 -1.00000
1 NA
1 NA
1 NA
51 0.34164
7 0.81056
8 -0.06748
12 0.10649
12 0.24550
5 0.71205
2 1.00000
4 -0.77637
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.45516
2 1.00000
1 NA
1 NA
1 NA
50 0.98368
7 0.98650
7 0.97019
12 0.96193
12 0.99336
5 0.97632
2 1.00000
4 0.99940
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.91875
2 1.00000
1 NA
1 NA
1 NA
49 0.98960
6 0.91271
7 0.99815
12 0.94094
12 0.99408
5 0.98248
2 1.00000
4 0.97934
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.98893
2 1.00000
1 NA
1 NA
1 NA
49 0.96842
6 0.95942
7 0.99062
12 0.84701
12 0.99200
5 0.90321
2 1.00000
4 0.99505
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.76295
2 1.00000
1 NA
1 NA
1 NA
47 0.94356
6 0.86135
8 0.97198
11 0.93497
11 0.93564
5 0.83028
2 1.00000
3 0.88133
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.99755
2 1.00000
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
1 NA
46 0.95996
6 0.97396
8 0.98450
10 0.91631
11 0.98045
5 0.99275
2 1.00000
3 0.59563
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.97261
2 1.00000
1 NA
1 NA
1 NA
1 NA
1 NA
49 0.99351
6 0.88512
7 0.98851
12 0.99180
12 0.99664
5 0.99744
2 1.00000
4 0.99862
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.98042
2 1.00000
1 NA
1 NA
1 NA
49 0.98695
6 0.97824
7 0.99287
12 0.98798
12 0.86868
5 0.78614
2 1.00000
4 0.98385
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.98694
2 1.00000
1 NA
1 NA
1 NA
49 0.99089
6 0.94488
7 0.99050
12 0.99286
12 0.99594
5 0.92609
2 1.00000
4 0.99888
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.99876
2 1.00000
1 NA
1 NA
1 NA
49 0.95788
6 0.99467
7 0.98221
12 0.85774
12 0.97815
5 0.62277
2 1.00000
4 0.85932
1 NA
1 NA
1 NA
1 NA
2 -1.00000
1 NA
1 NA
3 0.61583
2 1.00000
1 NA
1 NA
1 NA
49 0.99089
6 0.94488
7 0.99050
12 0.99286
12 0.99594
5 0.92609
2 1.00000
4 0.99888
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
3 0.99876
2 1.00000
1 NA
1 NA
1 NA
49 0.95788
6 0.99467
7 0.98221
12 0.85774
12 0.97815
5 0.62277
2 1.00000
4 0.85932
1 NA
1 NA
1 NA
1 NA
2 -1.00000
1 NA
1 NA
3 0.61583
2 1.00000
1 NA
1 NA
1 NA
47 0.97141
6 0.81829
8 0.95526
11 0.98297
11 0.97241
5 0.75396
2 1.00000
3 0.86658
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.99637
2 -1.00000
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
1 NA
47 0.96767
6 0.96998
8 0.98886
11 0.94730
11 0.93582
5 0.96979
2 1.00000
3 0.81388
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.74510
2 1.00000
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
1 NA
46 0.97000
6 0.99557
8 0.98686
10 0.93997
11 0.97212
5 0.99181
2 1.00000
3 0.99884
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.98493
2 1.00000
1 NA
1 NA
1 NA
1 NA
1 NA
46 0.90884
6 0.97850
8 0.98571
10 0.90185
11 0.94942
5 0.95429
2 1.00000
3 0.99192
1 NA
1 NA
1 NA
1 NA
2 1.00000
1 NA
1 NA
1 NA
3 0.95737
2 1.00000
1 NA
1 NA
1 NA
1 NA
1 NA
2185 0.87388
292 0.87306
389 0.83957
361 0.64262
392 0.79749
359 0.82228
31 0.23239
68 0.60438
46 0.81246
27 0.81577
23 0.67420
75 0.68884
21 0.90534
24 0.36879
48 0.74334
29 0.53183
37 0.50042
2 NA
94 0.58983
27 0.74058
101 0.63800
35 0.73546
26 0.52487
95 0.84052
62 0.65919
67 0.89126
81 0.82247
213 0.95237
15 0.84657
5 0.60811
15 0.57676
18 0.78565
1 NA
11 0.50380
1 NA
1 NA
1 NA
2 NA
5 0.60811
5 0.86155
13 0.54383
1 NA
3 NA
2095 0.87159
286 0.84380
389 0.81642
361 0.76991
391 0.72961
359 0.68851
31 0.39515
68 -0.11091
46 0.61412
27 0.23068
23 0.40677
75 0.78912
21 0.85825
24 0.45951
47 0.58558
29 0.50344
37 0.23150
2 NA
94 0.38587
27 0.33146
101 0.51529
35 0.64213
25 -0.19866
95 0.86663
62 0.45114
67 0.79810
81 0.69369
89 0.83435
7 0.82681
3 0.37899
2 -1.00000
2 -1.00000
3 0.37899
3 NA
2177 0.85638
287 0.83015
389 0.76631
361 0.58810
392 0.61264
359 0.65717
31 NA
68 NA
46 NA
27 NA
23 NA
75 0.67127
21 NA
24 NA
48 NA
29 NA
37 NA
2 NA
94 NA
27 NA
101 NA
35 NA
25 NA
95 0.84777
62 NA
67 NA
81 0.64343
93 0.77814
7 NA
3 NA
2 NA
1 NA
2 NA
3 NA
3 NA
1 NA
2095 0.47302
286 0.45051
389 0.62445
361 0.44380
391 0.28505
359 -0.06986
31 -0.04321
68 0.11336
46 0.25275
27 0.58573
23 -0.23552
75 -0.01037
21 0.63172
24 -0.22718
47 0.45482
29 0.30292
37 -0.10781
2 NA
94 -0.28592
27 0.28165
101 0.74755
35 0.21562
25 0.32332
95 0.04191
62 0.07353
67 0.27066
81 -0.24532
89 0.61768
7 0.98653
3 0.50067
2 -1.00000
2 -1.00000
3 0.50067
3 NA
2147 0.54568
282 0.55082
391 0.46395
356 0.39248
376 0.59019
353 0.33303
31 0.36291
65 0.63530
47 -0.00632
23 0.61645
23 0.29360
67 0.30659
22 0.11078
23 0.36646
48 0.46459
27 0.06617
37 0.16275
2 NA
92 0.21448
25 0.47863
100 0.43070
35 -0.11927
25 0.48485
95 0.27489
62 0.42055
67 0.43757
79 0.42963
206 0.86337
15 0.88959
5 0.43839
15 0.80212
18 -0.31652
1 NA
11 0.44022
1 NA
1 NA
1 NA
2 NA
5 0.43839
5 -0.92685
13 -0.32653
1 NA
3 NA
2147 0.55631
282 0.26748
391 0.70061
356 0.38810
376 0.52986
353 0.37133
31 0.03177
65 0.22747
47 0.59808
23 -0.32703
23 0.62508
67 0.72005
22 0.38100
23 0.73803
48 0.31533
27 0.28616
37 0.74186
2 NA
92 0.30850
25 0.25243
100 0.25548
35 0.51000
25 -0.04241
95 0.17213
62 0.55505
67 -0.00056
79 0.21992
206 0.34638
15 0.05912
5 -0.29835
15 -0.19925
18 0.52345
1 NA
11 -0.46324
1 NA
1 NA
1 NA
2 NA
5 -0.29835
5 0.63936
13 0.42640
1 NA
3 NA
2147 0.51620
282 0.32802
391 0.59004
356 0.61172
376 0.53015
353 0.42087
31 -0.02361
65 0.41271
47 0.52947
23 0.00432
23 0.33624
67 0.55871
22 0.43292
23 0.54844
48 0.38550
27 0.26335
37 0.69982
2 NA
92 0.54676
25 0.34621
100 0.06661
35 0.00092
25 0.49618
95 0.17042
62 -0.10166
67 0.08399
79 0.18838
206 0.70314
15 0.57787
5 0.33729
15 0.31298
18 0.58244
1 NA
11 0.20454
1 NA
1 NA
1 NA
2 NA
5 0.33729
5 0.96566
13 0.58722
1 NA
3 NA

------------------------------------------------
Subject: NA values for PR_CORR
From: jhenders at aer.com
Time: Mon Feb 14 09:31:16 2011

Hello John,

Thanks for your analysis. Overall I agree with and understand your
assessment; however, see my attachment, which shows that there are some
regions that have sizable values of TOTAL but for which PR_CORR is
still NA. This is a screen capture of output from your extraction. It is
these lines that trouble me. It seems (from my original file) that these
NA values only showed up for SPFH.

Incidentally, why would PR_CORR be NA for ANY regions that have TOTAL
> 1?

Thanks.

John


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: John Halley Gotway
Time: Mon Feb 14 10:01:25 2011

John,

PR_CORR is computed on lines 1029-1044 of the file
METv3.0/lib/vx_met_util/met_stats.cc.  Here's that section of code:

   // Compute correlation coefficient
   v =  (cnt_info.n*cnt_info.ffbar*cnt_info.n
       - cnt_info.fbar.v*cnt_info.n*cnt_info.fbar.v*cnt_info.n)
        *
        (cnt_info.n*cnt_info.oobar*cnt_info.n
       - cnt_info.obar.v*cnt_info.n*cnt_info.obar.v*cnt_info.n);

   if(v < 0 || is_eq(v, 0.0)) {
      cnt_info.pr_corr.v = bad_data_double;
   }
   else {
      den = sqrt(v);
      cnt_info.pr_corr.v = (  (cnt_info.n*cnt_info.fobar*cnt_info.n)
                            - (cnt_info.fbar.v*cnt_info.n*cnt_info.obar.v*cnt_info.n))
                           /den;
   }

It's computed as NUM/DEN, and if that DEN is equal to 0 (within
0.00001), then PR_CORR is set to NA.  I'd say the next step would be
to look at the corresponding SL1L2 line for one of the suspect
cases, perhaps the one where TOTAL=50.  From that SL1L2 line, pick out
the values for N (i.e. TOTAL), FBAR, OBAR, FFBAR, and OOBAR.  Then
plug them into a calculator and compute that denominator
listed above (in the line "v = ...").  If you end up with something
within 0.00001 of 0, then that's why you're getting an NA.  If not,
perhaps there's a problem somewhere.

I suspect that your original inclination was probably correct - the
values for SPFH being very close to zero are the likely culprit.
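
To make that concrete, here is a minimal standalone sketch (not MET
code - the SL1L2 values below are hypothetical placeholders for an
SPFH case) that plugs N, FBAR, OBAR, FFBAR, OOBAR, and FOBAR into the
same formula:

   #include <cstdio>
   #include <cmath>

   int main() {
      // Hypothetical SL1L2 values - substitute the ones from your file.
      double n     = 50.0;     // TOTAL
      double fbar  = 0.0021;   // mean forecast (SPFH in kg/kg)
      double obar  = 0.0023;   // mean observation
      double ffbar = 4.5e-6;   // mean of forecast*forecast
      double oobar = 5.4e-6;   // mean of observation*observation
      double fobar = 4.9e-6;   // mean of forecast*observation

      // Same quantity as "v" above: the squared denominator.
      double v = (n*ffbar*n - fbar*n*fbar*n)
               * (n*oobar*n - obar*n*obar*n);

      // METv3.0 treats anything within 0.00001 of 0 as zero.
      if(v < 0 || fabs(v) < 0.00001) {
         printf("v = %g -> PR_CORR would be NA\n", v);
      }
      else {
         printf("PR_CORR = %f\n",
                (n*fobar*n - fbar*n*obar*n)/sqrt(v));
      }
      return 0;
   }

With magnitudes typical of SPFH in kg/kg, v comes out around 6e-8 -
well inside the 0.00001 tolerance - so PR_CORR is reported as NA even
though the correlation itself is perfectly well defined (about 0.70
for these numbers).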

John


------------------------------------------------
Subject: NA values for PR_CORR
From: John Halley Gotway
Time: Mon Feb 14 10:30:48 2011

John,

So it's probably the case that our check for determining what counts
as zero is too loose for SPFH, since it considers anything within
0.00001 of 0 to actually be zero.  Could you try something out for us?

I tightened up the requirements a bit to require values to be within
10E-16 of zero to be considered zero.  Please copy the attached file into
METv3.0/lib/vx_math/is_bad_data.h.  Then recompile MET, being
sure to do a "make clean" first.  Try rerunning your case and let me
know if you end up with values where you had NA's before.

If you'd like to try other values, you can easily see/modify where
I've set the "default_tol = 10E-16" value.
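
As a rough illustration, the tolerance check works along these lines
(a conceptual sketch only - see the actual is_bad_data.h for the real
definition):

   #include <cmath>

   static const double default_tol = 10E-16;

   // Treat a and b as equal when they differ by less than tol.
   inline bool is_eq(double a, double b, double tol = default_tol) {
      return fabs(a - b) < tol;
   }

Lowering default_tol makes the test stricter, so fewer near-zero
denominators get flagged as zero.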

In earlier versions of MET, when computing stats as a ratio of
NUM/DEN, we were only checking if the denominator was strictly equal
to zero.  This led to roundoff problems because we had some
denominators that were really close, but not exactly equal to zero.
Perhaps our choice of a tolerance value of 10E-5 was too generous.

Thanks,
John



------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: jhenders at aer.com
Time: Tue Feb 15 15:37:27 2011

John,

Thanks for making the bug fix. I am in the middle of a number of
stat_anal runs, so it'll be a while before I can test the code.
Everything suggests to me that your approach will likely correct the
problem, though. Unfortunately I'll have to tell a client that there
was a problem that caused some correlation values to be missing, but,
correct me if I'm wrong, the values that *are* there should be just
fine.

John


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: John Halley Gotway
Time: Wed Feb 16 09:59:27 2011

John,

Yes, the values for PR_CORR that are there for SPFH should be fine.

However, I want to point out a bug we just uncovered yesterday.  It's
a simple thing with potentially a big impact.  It affects the output
of the PB2NC tool for upper-air observations.  Please see the
issue dated 02/15/2011:
   http://www.dtcenter.org/met/users/support/known_issues/METv3.0/index.php

John

------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: jhenders at aer.com
Time: Wed Feb 16 10:08:28 2011

Hi John,

Thanks for the heads-up. Before I investigate, I should remind you
that I am still using v2.0. I hope that our recent conversation still
applies to v2.0.

John


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: jhenders at aer.com
Time: Wed Feb 16 10:24:37 2011

John,

Does the bug report from 2/15/2011 suggest that ALL v2.0 point_stat
output files have been using the least reliable set of observations?
In other words, must every point_stat run that I have completed to
date be redone?

John


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: jhenders at aer.com
Time: Wed Feb 16 10:25:41 2011

I should also have added that the bug report/fix on the website does
not *exclude* surface observations from being affected. Please clarify.

John


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: John Halley Gotway
Time: Wed Feb 16 10:44:08 2011

John,

I'm in the process of investigating this further.  I've retrieved a
global (GDAS) PREPBUFR file and have run it two ways - once with
event_stack_flag = 1 and once with it = 0.  I'm in the process of
looking at the differences in the observations to see which types of
observations do get "updated" and therefore would differ between the
two runs.  When I get more info on it, I'll let you know.

In my initial look at it, I'd seen that only the upper-air
observations were getting updated.  But I want to take a closer look
at the global data first.

Once we get it all straightened out, I suspect that we'll send out an
email to all the registered MET users about this.

Regarding reruns, I would say that it'd *probably* be a good idea.  Do
keep in mind though that the quality mark threshold was still being
applied.  For example, if you kept it at its default value of 2,
that means that PB2NC was only retaining observations whose quality
mark was 1 or 2.  However, due to this bug, for those observations
that were updated once and had quality mark values of 1 and 2, the
value corresponding to the quality mark of 2 was being used rather
than the more recent, updated value.
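
For reference, that threshold is set in the PB2NC config file; the
relevant entry looks something like this (your config may differ):

   //
   // Only retain observations whose quality mark is less than or
   // equal to this value.
   //
   quality_mark_thresh = 2;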

You may also choose to rerun a subset of your data to see what impact
the fix has on the results you're seeing.

I'm surprised this bug has gone undetected for so long, and I
apologize for the negative impact it has.  We'll probably be rerunning
a couple of the tests we've done here in the DTC that looked at
upper-air verification.

I have not posted a bugfix for this issue for METv2.0 yet, but I can
do so if you'd like.  However, if you do decide to rerun things, you
could consider upgrading to METv3.0 at that point.

John




------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: jhenders at aer.com
Time: Wed Feb 16 10:52:19 2011

Hi again John,

Yes, I would appreciate a v2.0 bug fix being posted.

I don't really have the resources to rerun all my data, so I'm hoping
that the effects are minimal. I am happy to hear that while I may have
been using the non-updated obs, the quality flags are still being
applied. I think that should minimize the overall effects.

I'm not familiar at all with how the PREPBUFR format 'updates' obs.
Are many obs modified? This concept, it turns out, was asked about by
a client in regard to calm observations. Since models do not generally
report absolutely calm air, but obs often are absolutely calm, there
can be a bias in verifying winds at very low wind speeds. The question
was whether the massaging of observations in PREPBUFR format by some
application of the assimilation background field would affect the
number of calm observations.

Any comments on the above would be most appreciated.

Thanks.

John


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: John Halley Gotway
Time: Thu Feb 17 16:43:44 2011

John,

Sorry for the delay in getting back with you.  A bugfix for METv2.0 is
now posted and available here:
   http://www.dtcenter.org/met/users/support/known_issues/METv2.0/index.php

To get an idea of what observations this bug impacts, I ran 3 days'
worth of GDAS (global) PREPBUFR files and 3 days' worth of NDAS (North
America) PREPBUFR files through PB2NC pre- and post-bugfix.  I
took a look at the differences and found the following:
- Each GDAS file (a 6-hour chunk) contains between 1.2 and 1.6 million
observations, while each NDAS file (also 6 hours) contains between
250 and 400 thousand observations.
- Of those observations, 4-5% of the GDAS obs and 6-7% of the NDAS
obs are different before/after the bugfix.
- Of the observations that get updated, about 95% are for the ADPSFC
message type.  The SFCSHP message type accounts for 4%, and the
remaining 1% is split between ADPUPA, SYNDAT, and VADWND.
- Of the observations that get updated, about 90% are for temperature,
6-8% are for specific humidity, and the remainder is split across the
other variable types.  In only 1 GDAS file did I see U and V winds
get updated.
- I looked at the spatial distribution of where the differences occur,
and there's no obvious pattern.  The differences are spread across the
map.

I also ran an NDAS file through a debugger and dumped out more
information about how observations are updated.  The updates fall
roughly into the following two categories:
(1) Update the observation value but leave the quality mark unchanged.
(2) Update the quality mark but leave the observation value unchanged.
In this case it's used to mark observations as bad - generally, when
the quality mark is changed, it is set to a worse value.

Overall, I've found that if you apply the patch and keep the quality
mark threshold set at 2, you'll actually see a reduction in the number
of observations being used.  This is because of (2) listed above:
several of the observations that were originally marked as good were
updated to being marked as bad, and therefore the patched code no
longer uses them.

I do think it'd be good to run a subset of your data through the
patched code to see what impact it has.

Hope this helps.

John



------------------------------------------------
Subject: NA values for PR_CORR
From: jhenders at aer.com
Time: Thu Feb 17 18:34:00 2011

Thank you for the detailed investigation. When I get a chance, I may
rerun some tests. I will apply the patch soon, also.

Do you have any further documentation about the way in which the
PREPBUFR format changes the values? I've only ever found one page (at
NCEP, I think??), but it doesn't delve into the details.


Thanks.


John


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #44348] NA values for PR_CORR
From: John Halley Gotway
Time: Thu Feb 17 22:43:25 2011

John,

Unfortunately no, I don't.  I can point you to some NCEP documentation
about PREPBUFR processing:
   http://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/document.htm

And here's a table they reference that contains interpretations for
the quality marker values they use (from 0 to 15):
   http://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/table_7.htm

And I can tell you that we typically use observations with a quality
marker of 2 or better (lower is better).

John




------------------------------------------------


More information about the Met_help mailing list