[Met_help] [rt.rap.ucar.edu #52626] History for RE: Help with ds337.0

John Halley Gotway via RT met_help at ucar.edu
Tue Apr 10 13:58:35 MDT 2012


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi Tom/MET help,

Thanks for the fantastically quick reply, Tom!

It turns out that I'm specifically referring to the netcdf output from the pb2nc program.

I already sent a help ticket to the MET team, asking if they have a means for removing the duplicate obs from their PB2NC process.  At the time, they didn't refer to the history of data as it undergoes QC, so this might help me track down the reason for the duplicate obs.  So, I have CC'd the met_help to this email.

It turns out that when I initially ran pb2nc with the default quality control flag set to "2" (i.e. quality_mark_thresh in the PB2NCConfig_default file), I did not get ANY surface observations in my final netcdf file over Central America.  Upon email exchanges with the MET team, it was recommended that I set the quality control flag to "9" to be able to accept more observations into the netcdf outfile.

From what it sounds like, I need to better understand what the "happy medium" should be in setting the quality_mark_thresh flag in pb2nc.  2 is too restrictive, while 9 appears to be allowing duplicate observations into the mix as a result of the QC process.
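
For reference, the corresponding entry in the PB2NC config file is a single threshold statement.  A rough sketch (using the quality_mark_thresh parameter named above; observations whose quality mark exceeds the threshold are dropped):

// Retain observations with a quality mark less than or equal to this value.
// 2 keeps only obs that passed QC; 9 keeps essentially everything.
quality_mark_thresh = 2;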

Any recommendations are greatly welcome!

Thanks much,
Jonathan


From: Thomas Cram [mailto:tcram at ucar.edu]
Sent: Friday, January 13, 2012 4:39 PM
To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
Subject: Re: Help with ds337.0

Hi Jonathan,

The only experience I have working with the MET software is using the pb2nc utility to convert PREPBUFR observations into a NetCDF dataset, so my knowledge of MET is limited.  However, the one reason I can think of for the duplicate observations is that you're seeing the same observation after several stages of quality-control pre-processing.  The PREPBUFR files contain a complete history of the data as it's modified during QC, so each station will have multiple reports at a single time.  There's a quality control flag appended to each PREPBUFR message; you want to keep the observation with the lowest QC number.

Can you send me the date and time for the examples you list below?  I'll take a look at the PREPBUFR messages and see if this is the case.

If this doesn't explain it, then I'll forward your question on to the MET support desk and see if they know the reason for duplicate observations.  They are intimately familiar with the PREPBUFR obs, so I'm sure they can help you out.

- Tom

On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO INC] wrote:


Dear Thomas,

This is Jonathan Case of the NASA SPoRT Center (http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
I am conducting some weather model verification using the MET verification software (NCAR's Model Evaluation Tools) and the NCEP GDAS PREPBUFR point observation files for ground truth.  I have accessed archived GDAS PREPBUFR files from NCAR's repository at http://dss.ucar.edu/datasets/ds337.0/ and have begun producing difference stats over Central America between the model forecast and observations obtained from the PREPBUFR files.

Now here is the interesting part:  When I examined the textual difference files generated by the MET software, I noticed that there were several stations with "duplicate" observations that led to duplicate forecast-observation difference pairs.  I put duplicate in quotes because the observed values were not necessarily the same but usually very close to one another.
The duplicate observations arose from the fact that at the same observation location, there would be a 5-digit WMO identifier as well as a 4-digit text station ID at a given hour.
I stumbled on these duplicate station data when I made a table of stations and mapped them, revealing the duplicates.

Some examples I stumbled on include:
*         78720/MHTG (both at 14.05N, -87.22E)
*         78641/MGGT (both at 14.58N, -90.52E)
*         78711/MHPL (both at 15.22N, -83.80E)
*         78708/MHLM (both at 15.45N, -87.93)

There are others, but I thought I'd provide a few examples to start.

If the source of the duplicates is NCEP/EMC, I wonder if it would be helpful to send them a note as well?

Let me know how you would like to proceed.

Most sincerely,
Jonathan

--------------------------------------------------------------------------------------------------
Jonathan Case, ENSCO Inc.
NASA Short-term Prediction Research and Transition Center (aka SPoRT Center)
320 Sparkman Drive, Room 3062
Huntsville, AL 35805
Voice: 256.961.7504
Fax: 256.961.7788
Emails: Jonathan.Case-1 at nasa.gov<mailto:Jonathan.Case-1 at nasa.gov> / case.jonathan at ensco.com<mailto:case.jonathan at ensco.com>
--------------------------------------------------------------------------------------------------

"Whether the weather is cold, or whether the weather is hot, we'll weather
  the weather whether we like it or not!"


Thomas Cram
NCAR / CISL / DSS
303-497-1217
tcram at ucar.edu<mailto:tcram at ucar.edu>





----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: RE: Help with ds337.0
From: John Halley Gotway
Time: Thu Jan 19 12:00:57 2012

Jonathan,

I apologize for the long delay in getting back to you on this.  We've
been scrambling over the last couple of weeks to finish up development
on a new release.  Here's my recollection of what's going
on with this issue:

   - You're using the GDAS PrepBUFR observation dataset, but you're
finding that PB2NC retains very few ADPSFC observations when you use a
quality marker of 2.
   - We advised via MET-Help that the algorithm employed by NCEP in
the GDAS processing sets most ADPSFC observations' quality marker to a
value of 9.  NCEP does that to prevent those observations
from being used in the data assimilation.  So the use of quality
marker = 9 is more an artifact of the data assimilation process and
not really saying anything about the quality of those observations.
   - When you switched to using a quality marker = 9 in PB2NC, you got
many matches, but ended up with more "duplicate" observations.

So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?

I did some investigation on this issue this morning.  Here's what I
did:

- Retrieved this file:
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/gdas1.t12z.prepbufr.nr
- Ran it through PB2NC for message type = ADPSFC, time window = +/- 0
seconds, and quality markers of 2 and 9 (see the config sketch below).
- For both, I used the updated version of the plot_point_obs tool to
create a plot of the data and dump header information about the points
being plotted.
- I also used the -dump option for PB2NC to dump all of the ADPSFC
observations to ASCII format.
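
In terms of the PB2NC config file, those two runs correspond roughly to
the settings below.  This is a sketch only; parameter names such as
beg_ds/end_ds for the time window are assumed from the PB2NCConfig_default
of that era and should be checked against your MET version:

message_type[] = [ "ADPSFC" ];   // surface land reports only
beg_ds = 0;                      // time window of +/- 0 seconds
end_ds = 0;                      //   around the file's center time
quality_mark_thresh = 2;         // 2 for the first run, 9 for the second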

I've attached several things to this message:
- The postscript output of plot_point_obs for qm = 2 and qm = 9,
after first converting to png format.
- The output from the plot_point_obs tool for both runs.

For qm=2, there were 51 locations plotted in your domain.
   - Of those 51...
      - All 51 header entries are unique.
      - There are only 36 unique combinations of lat/lon.
For qm=9, there were 101 locations plotted in your domain.
   - Of those 101...
      - There are only 52 unique header entries.
      - There are only 37 unique combinations of lat/lon.

I think there are two issues occurring here:

(1) When using qm=2, you'll often see two observing locations that
look the same except for the station ID.  For example:
  [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
  [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]

I looked at the observations that correspond to these and found that
they do actually differ slightly.

(2) The second, larger issue here is when using qm=9.  It does appear
that we're really getting duplicate observations.  For example:
  [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
  [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]

This will likely require further debugging of the PB2NC tool to figure
out what's going on.

I just wanted to let you know what I've found so far.

Thanks,
John Halley Gotway




------------------------------------------------
Subject: RE: Help with ds337.0
From: John Halley Gotway
Time: Thu Jan 19 12:00:57 2012

DEBUG 1: Retrieving grid from file: central_america.grib
DEBUG 1: Opening netCDF file: gdas1.t12z.prepbufr_qm2.nc
DEBUG 1: Processing 6156 observations at 6156 locations.
DEBUG 1: Observation GRIB codes: ALL
DEBUG 1: Observation message types: ADPSFC
DEBUG 1: Creating postscript file: gdas1.t12z.prepbufr_qm2.ps
DEBUG 3: [1] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, 78388, 20120112_120000, 18.5, -77.92, 3 ]
DEBUG 3: [2] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, SKLC, 20120112_120000, 7.8, -76.7, 30 ]
DEBUG 3: [3] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MROC, 20120112_120000, 10, -84.22, 931 ]
DEBUG 3: [4] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, SKSP, 20120112_120000, 12.58, -81.72, 2 ]
DEBUG 3: [5] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MGRT, 20120112_120000, 14.53, -91.67, 239 ]
DEBUG 3: [6] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MHLC, 20120112_120000, 15.73, -86.87, 3 ]
DEBUG 3: [7] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MHRO, 20120112_120000, 16.32, -86.53, 5 ]
DEBUG 3: [8] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MRLB, 20120112_120000, 10.6, -85.55, 93 ]
DEBUG 3: [9] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MNRS, 20120112_120000, 11.42, -85.83, 53 ]
DEBUG 3: [10] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHPL, 20120112_120000, 15.22, -83.8, 13 ]
DEBUG 3: [11] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGSJ, 20120112_120000, 13.92, -90.82, 2 ]
DEBUG 3: [12] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPCH, 20120112_120000, 9.43, -82.52, 6 ]
DEBUG 3: [13] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRLM, 20120112_120000, 10, -83.05, 4 ]
DEBUG 3: [14] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNJG, 20120112_120000, 13.08, -85.98, 985 ]
DEBUG 3: [15] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78397, 20120112_120000, 17.93, -76.78, 9 ]
DEBUG 3: [16] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPTO, 20120112_120000, 9.05, -79.37, 11 ]
DEBUG 3: [17] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHTG, 20120112_120000, 14.05, -87.22, 994 ]
DEBUG 3: [18] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MKJP, 20120112_120000, 17.93, -76.78, 9 ]
DEBUG 3: [19] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPMG, 20120112_120000, 8.98, -79.52, 13 ]
DEBUG 3: [20] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRPV, 20120112_120000, 9.95, -84.15, 994 ]
DEBUG 3: [21] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNCH, 20120112_120000, 12.63, -87.13, 53 ]
DEBUG 3: [22] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGGT, 20120112_120000, 14.58, -90.52, 1489 ]
DEBUG 3: [23] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
DEBUG 3: [24] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78793, 20120112_120000, 8.4, -82.42, 26 ]
DEBUG 3: [25] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78795, 20120112_120000, 8.08, -80.95, 88 ]
DEBUG 3: [26] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPBO, 20120112_120000, 9.35, -82.25, 3 ]
DEBUG 3: [27] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGPB, 20120112_120000, 15.72, -88.6, 1 ]
DEBUG 3: [28] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78700, 20120112_120000, 13.28, -87.67, 5 ]
DEBUG 3: [29] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78703, 20120112_120000, 16.32, -86.53, 5 ]
DEBUG 3: [30] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78705, 20120112_120000, 15.73, -86.87, 3 ]
DEBUG 3: [31] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78706, 20120112_120000, 15.72, -87.48, 3 ]
DEBUG 3: [32] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78708, 20120112_120000, 15.45, -87.93, 31 ]
DEBUG 3: [33] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78711, 20120112_120000, 15.22, -83.8, 13 ]
DEBUG 3: [34] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78717, 20120112_120000, 14.78, -88.78, 1079 ]
DEBUG 3: [35] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78719, 20120112_120000, 14.33, -88.17, 1100 ]
DEBUG 3: [36] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78720, 20120112_120000, 14.05, -87.22, 994 ]
DEBUG 3: [37] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78583, 20120112_120000, 17.53, -88.3, 5 ]
DEBUG 3: [38] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MKJS, 20120112_120000, 18.5, -77.92, 3 ]
DEBUG 3: [39] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78767, 20120112_120000, 10, -83.05, 4 ]
DEBUG 3: [40] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHTE, 20120112_120000, 15.72, -87.48, 3 ]
DEBUG 3: [41] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MZBZ, 20120112_120000, 17.53, -88.3, 5 ]
DEBUG 3: [42] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78615, 20120112_120000, 16.92, -89.88, 115 ]
DEBUG 3: [43] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78627, 20120112_120000, 15.32, -91.47, 1901 ]
DEBUG 3: [44] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78637, 20120112_120000, 15.72, -88.6, 1 ]
DEBUG 3: [45] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78641, 20120112_120000, 14.58, -90.52, 1489 ]
DEBUG 3: [46] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78647, 20120112_120000, 13.92, -90.82, 2 ]
DEBUG 3: [47] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 80001, 20120112_120000, 12.58, -81.72, 2 ]
DEBUG 3: [48] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, SKMR, 20120112_120000, 8.82, -75.85, 26 ]
DEBUG 3: [49] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNJU, 20120112_120000, 12.1, -85.37, 90 ]
DEBUG 3: [50] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNMG, 20120112_120000, 12.15, -86.17, 50 ]
DEBUG 3: [51] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHLM, 20120112_120000, 15.45, -87.93, 31 ]
DEBUG 2: Finished plotting 51 locations.
DEBUG 2: Skipped 6105 locations off the grid.

------------------------------------------------
Subject: RE: Help with ds337.0
From: John Halley Gotway
Time: Thu Jan 19 12:00:57 2012

DEBUG 1: Retrieving grid from file: central_america.grib
DEBUG 1: Opening netCDF file: gdas1.t12z.prepbufr_qm9.nc
DEBUG 1: Processing 34256 observations at 13737 locations.
DEBUG 1: Observation GRIB codes: ALL
DEBUG 1: Observation message types: ADPSFC
DEBUG 1: Creating postscript file: gdas1.t12z.prepbufr_qm9.ps
DEBUG 3: [1] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, 78388, 20120112_120000, 18.5, -77.92, 3 ]
DEBUG 3: [2] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, SKLC, 20120112_120000, 7.8, -76.7, 30 ]
DEBUG 3: [3] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, SKLC, 20120112_120000, 7.8, -76.7, 30 ]
DEBUG 3: [4] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MROC, 20120112_120000, 10, -84.22, 931 ]
DEBUG 3: [5] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MROC, 20120112_120000, 10, -84.22, 931 ]
DEBUG 3: [6] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, SKSP, 20120112_120000, 12.58, -81.72, 2 ]
DEBUG 3: [7] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, SKSP, 20120112_120000, 12.58, -81.72, 2 ]
DEBUG 3: [8] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MGRT, 20120112_120000, 14.53, -91.67, 239 ]
DEBUG 3: [9] Plotting location [ type, sid, valid, lat, lon, elevation
] = [ ADPSFC, MGRT, 20120112_120000, 14.53, -91.67, 239 ]
DEBUG 3: [10] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHLC, 20120112_120000, 15.73, -86.87, 3 ]
DEBUG 3: [11] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHLC, 20120112_120000, 15.73, -86.87, 3 ]
DEBUG 3: [12] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHRO, 20120112_120000, 16.32, -86.53, 5 ]
DEBUG 3: [13] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHRO, 20120112_120000, 16.32, -86.53, 5 ]
DEBUG 3: [14] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRLB, 20120112_120000, 10.6, -85.55, 93 ]
DEBUG 3: [15] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRLB, 20120112_120000, 10.6, -85.55, 93 ]
DEBUG 3: [16] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNRS, 20120112_120000, 11.42, -85.83, 53 ]
DEBUG 3: [17] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNRS, 20120112_120000, 11.42, -85.83, 53 ]
DEBUG 3: [18] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHPL, 20120112_120000, 15.22, -83.8, 13 ]
DEBUG 3: [19] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHPL, 20120112_120000, 15.22, -83.8, 13 ]
DEBUG 3: [20] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78724, 20120112_120000, 13.3, -87.18, 48 ]
DEBUG 3: [21] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78724, 20120112_120000, 13.3, -87.18, 48 ]
DEBUG 3: [22] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGSJ, 20120112_120000, 13.92, -90.82, 2 ]
DEBUG 3: [23] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGSJ, 20120112_120000, 13.92, -90.82, 2 ]
DEBUG 3: [24] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPCH, 20120112_120000, 9.43, -82.52, 6 ]
DEBUG 3: [25] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPCH, 20120112_120000, 9.43, -82.52, 6 ]
DEBUG 3: [26] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRLM, 20120112_120000, 10, -83.05, 4 ]
DEBUG 3: [27] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRLM, 20120112_120000, 10, -83.05, 4 ]
DEBUG 3: [28] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNJG, 20120112_120000, 13.08, -85.98, 985 ]
DEBUG 3: [29] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNJG, 20120112_120000, 13.08, -85.98, 985 ]
DEBUG 3: [30] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78397, 20120112_120000, 17.93, -76.78, 9 ]
DEBUG 3: [31] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPTO, 20120112_120000, 9.05, -79.37, 11 ]
DEBUG 3: [32] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHTG, 20120112_120000, 14.05, -87.22, 994 ]
DEBUG 3: [33] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHTG, 20120112_120000, 14.05, -87.22, 994 ]
DEBUG 3: [34] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MKJP, 20120112_120000, 17.93, -76.78, 9 ]
DEBUG 3: [35] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MKJP, 20120112_120000, 17.93, -76.78, 9 ]
DEBUG 3: [36] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPMG, 20120112_120000, 8.98, -79.52, 13 ]
DEBUG 3: [37] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPMG, 20120112_120000, 8.98, -79.52, 13 ]
DEBUG 3: [38] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRPV, 20120112_120000, 9.95, -84.15, 994 ]
DEBUG 3: [39] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MRPV, 20120112_120000, 9.95, -84.15, 994 ]
DEBUG 3: [40] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNCH, 20120112_120000, 12.63, -87.13, 53 ]
DEBUG 3: [41] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNCH, 20120112_120000, 12.63, -87.13, 53 ]
DEBUG 3: [42] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGGT, 20120112_120000, 14.58, -90.52, 1489 ]
DEBUG 3: [43] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGGT, 20120112_120000, 14.58, -90.52, 1489 ]
DEBUG 3: [44] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
DEBUG 3: [45] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
DEBUG 3: [46] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78793, 20120112_120000, 8.4, -82.42, 26 ]
DEBUG 3: [47] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78793, 20120112_120000, 8.4, -82.42, 26 ]
DEBUG 3: [48] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78795, 20120112_120000, 8.08, -80.95, 88 ]
DEBUG 3: [49] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78795, 20120112_120000, 8.08, -80.95, 88 ]
DEBUG 3: [50] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPBO, 20120112_120000, 9.35, -82.25, 3 ]
DEBUG 3: [51] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MPBO, 20120112_120000, 9.35, -82.25, 3 ]
DEBUG 3: [52] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGPB, 20120112_120000, 15.72, -88.6, 1 ]
DEBUG 3: [53] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MGPB, 20120112_120000, 15.72, -88.6, 1 ]
DEBUG 3: [54] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78700, 20120112_120000, 13.28, -87.67, 5 ]
DEBUG 3: [55] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78700, 20120112_120000, 13.28, -87.67, 5 ]
DEBUG 3: [56] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78703, 20120112_120000, 16.32, -86.53, 5 ]
DEBUG 3: [57] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78703, 20120112_120000, 16.32, -86.53, 5 ]
DEBUG 3: [58] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78705, 20120112_120000, 15.73, -86.87, 3 ]
DEBUG 3: [59] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78705, 20120112_120000, 15.73, -86.87, 3 ]
DEBUG 3: [60] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78706, 20120112_120000, 15.72, -87.48, 3 ]
DEBUG 3: [61] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78706, 20120112_120000, 15.72, -87.48, 3 ]
DEBUG 3: [62] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78708, 20120112_120000, 15.45, -87.93, 31 ]
DEBUG 3: [63] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78708, 20120112_120000, 15.45, -87.93, 31 ]
DEBUG 3: [64] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78711, 20120112_120000, 15.22, -83.8, 13 ]
DEBUG 3: [65] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78711, 20120112_120000, 15.22, -83.8, 13 ]
DEBUG 3: [66] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78717, 20120112_120000, 14.78, -88.78, 1079 ]
DEBUG 3: [67] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78717, 20120112_120000, 14.78, -88.78, 1079 ]
DEBUG 3: [68] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78719, 20120112_120000, 14.33, -88.17, 1100 ]
DEBUG 3: [69] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78719, 20120112_120000, 14.33, -88.17, 1100 ]
DEBUG 3: [70] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78720, 20120112_120000, 14.05, -87.22, 994 ]
DEBUG 3: [71] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78720, 20120112_120000, 14.05, -87.22, 994 ]
DEBUG 3: [72] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78583, 20120112_120000, 17.53, -88.3, 5 ]
DEBUG 3: [73] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78583, 20120112_120000, 17.53, -88.3, 5 ]
DEBUG 3: [74] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MKJS, 20120112_120000, 18.5, -77.92, 3 ]
DEBUG 3: [75] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MKJS, 20120112_120000, 18.5, -77.92, 3 ]
DEBUG 3: [76] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78767, 20120112_120000, 10, -83.05, 4 ]
DEBUG 3: [77] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78767, 20120112_120000, 10, -83.05, 4 ]
DEBUG 3: [78] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHTE, 20120112_120000, 15.72, -87.48, 3 ]
DEBUG 3: [79] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHTE, 20120112_120000, 15.72, -87.48, 3 ]
DEBUG 3: [80] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MZBZ, 20120112_120000, 17.53, -88.3, 5 ]
DEBUG 3: [81] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MZBZ, 20120112_120000, 17.53, -88.3, 5 ]
DEBUG 3: [82] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78615, 20120112_120000, 16.92, -89.88, 115 ]
DEBUG 3: [83] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78615, 20120112_120000, 16.92, -89.88, 115 ]
DEBUG 3: [84] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78627, 20120112_120000, 15.32, -91.47, 1901 ]
DEBUG 3: [85] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78627, 20120112_120000, 15.32, -91.47, 1901 ]
DEBUG 3: [86] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78637, 20120112_120000, 15.72, -88.6, 1 ]
DEBUG 3: [87] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78637, 20120112_120000, 15.72, -88.6, 1 ]
DEBUG 3: [88] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78641, 20120112_120000, 14.58, -90.52, 1489 ]
DEBUG 3: [89] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78641, 20120112_120000, 14.58, -90.52, 1489 ]
DEBUG 3: [90] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78647, 20120112_120000, 13.92, -90.82, 2 ]
DEBUG 3: [91] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 78647, 20120112_120000, 13.92, -90.82, 2 ]
DEBUG 3: [92] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 80001, 20120112_120000, 12.58, -81.72, 2 ]
DEBUG 3: [93] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, 80001, 20120112_120000, 12.58, -81.72, 2 ]
DEBUG 3: [94] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, SKMR, 20120112_120000, 8.82, -75.85, 26 ]
DEBUG 3: [95] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, SKMR, 20120112_120000, 8.82, -75.85, 26 ]
DEBUG 3: [96] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNJU, 20120112_120000, 12.1, -85.37, 90 ]
DEBUG 3: [97] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNJU, 20120112_120000, 12.1, -85.37, 90 ]
DEBUG 3: [98] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNMG, 20120112_120000, 12.15, -86.17, 50 ]
DEBUG 3: [99] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MNMG, 20120112_120000, 12.15, -86.17, 50 ]
DEBUG 3: [100] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHLM, 20120112_120000, 15.45, -87.93, 31 ]
DEBUG 3: [101] Plotting location [ type, sid, valid, lat, lon,
elevation ] = [ ADPSFC, MHLM, 20120112_120000, 15.45, -87.93, 31 ]
DEBUG 2: Finished plotting 101 locations.
DEBUG 2: Skipped 13636 locations off the grid.

------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Case, Jonathan[ENSCO INC]
Time: Thu Jan 19 12:17:32 2012

Hi John,

Thanks for examining this issue.  I appreciate the help!
Here is an update to what Tom Cram and I iterated on last week.

We both discovered that the "duplicate" observations with WMO 5-digit
station identifiers were of a different input report type.
I consulted further the documentation for PB2NC in the config file, and
noticed the following block:

//
// Specify a comma-separated list of input report type values to retain.
// An empty list indicates that all should be retained.
//
// http://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/table_6.htm
//
// e.g. in_report_type[] = [ 11, 22, 23 ];
//
// DEFAULT (blank)
//in_report_type[] = [];
in_report_type[] = [ 512, 522, 531, 540, 562 ];

I went to the web link above to pick the specific report types I
wanted to retain for surface observations (i.e. 512, 522, etc.).
Withholding code "511" for land surface stations eliminated the WMO 5-
digit station ID obs, so I believe I now have unique stations for
running point_stat.  If I encounter any additional problems, I'll send
out another message.

Thank you,
Jonathan


------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Case, Jonathan[ENSCO INC]
Time: Thu Jan 19 12:33:23 2012

John,

I noticed that even with specifying the input report types, there are
still a few duplicate observations in the final netcdf dataset.
So, I'm seeing the same thing as in your analysis.

-Jonathan


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: John Halley Gotway
Time: Thu Jan 19 12:40:23 2012

Jonathan,

OK, I reran my analysis using the setting you suggested:
    in_report_type[] = [ 512, 522, 531, 540, 562 ];

Here's what I see:

   - For qm=2, there are 29 locations, all with unique header
information, and all with unique lat/lons.
     It looks like the station IDs are all alphabetical.  So the
"in_report_type" setting has filtered out the numeric station IDs.

   - For qm=9, there are 57 locations - but only 29 of them have
unique header information!

So I'll need to look more closely at what PB2NC is doing here.  It
looks like setting qm=9 really is causing duplicate observations to be
retained.

When I get a chance, I'll run it through the debugger to investigate.

Thanks,
John


On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> John,
>
> I noticed that even with specifying the input report types, there
are still a few duplicate observations in the final netcdf dataset.
> So, I'm seeing the same thing as in your analysis.
>
> -Jonathan
>
> -----Original Message-----
> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
> Sent: Thursday, January 19, 2012 1:01 PM
> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
> Cc: tcram at ucar.edu
> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
> Jonathan,
>
> I apologize for the long delay in getting back to you on this.
We've been scrambling over the last couple of weeks to finish up
development on a new release.  Here's my recollection of what's going
on with this issue:
>
>     - You're using the GDAS PrepBUFR observation dataset, but you're
finding that PB2NC retains very few ADPSFC observations when you a
quality marker of 2.
>     - We advised via MET-Help that the algorithm employed by NCEP in
the GDAS processing sets most ADPSFC observations' quality marker to a
value of 9.  NCEP does that to prevent those observations from being
used in the data assimilation.  So the use of quality marker = 9 is
more an artifact of the data assimilation process and not really
saying anything about the quality of those observations.
>     - When you switch to using a quality marker = 9 in PB2NC, you
got many matches, but ended up with more "duplicate" observations.
>
> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>
> I did some investigation on this issue this morning.  Here's what I
did:
>
> - Retrieved this file:
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/gdas1.t12z.prepbufr.nr
> - Ran it through PB2NC from message type = ADPSFC, time window = +/-
0 seconds, and quality markers of 2 and 9.
> - For both, I used the updated version of the plot_point_obs tool to
create a plot of the data and dump header information about the points
being plotted.
> - I also used the -dump option for PB2NC to dump all of the ADPSFC
observations to ASCII format.
>
> I've attached several things to this message:
> - The postscript output of plot_point_obs for qm = 2 and qm = 9m,
after first converting to png format.
> - The output from the plot_point_obs tool for both runs.
>
> For qm=2, there were 51 locations plotted in your domain.
>     - Of those 51...
>        - All 51 header entries are unique.
>        - There are only 36 unique combinations of lat/lon.
> For qm=9, there were 101 locations plotted in your domain.
>     - Of those 101...
>        - There are only 52 unique header entries.
>        - There are only 37 unique combinations of lat/lon.
>
> I think there are two issues occurring here:
>
> (1) When using qm=2, you'll often see two observing locations that
look the same except for the station ID.  For example:
>    [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>    [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>
> I looked at the observations that correspond to these and found that
they do actually differ slightly.
>
> (2) The second, larger issue here is when using qm=9.  It does
appear that we're really getting duplicate observations.  For example:
>    [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>    [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>
> This will likely require further debugging of the PB2NC tool to
figure out what's going on.
>
> I just wanted to let you know what I've found so far.
>
> Thanks,
> John Halley Gotway
>
>
>
> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>          Queue: met_help
>>        Subject: RE: Help with ds337.0
>>          Owner: Nobody
>>     Requestors: jonathan.case-1 at nasa.gov
>>         Status: new
>>    Ticket<URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>>
>> Hi Tom/MET help,
>>
>> Thanks for the fantastically quick reply, Tom!
>>
>> It turns out that I'm specifically referring to the netcdf output
from the pb2nc program.
>>
>> I already sent a help ticket to the MET team, asking if they have a
means for removing the duplicate obs from their PB2NC process.  At the
time, they didn't refer to the history of data as it undergoes QC, so
this might help me track down the reason for the duplicate obs.  So, I
have CC'd the met_help to this email.
>>
>> It turns out that when I initially ran pb2nc with the default
quality control flag set to "2" (i.e. quality_mark_thresh in the
PB2NCConfig_default file), I did not get ANY surface observations in
my final netcdf file over Central America.  Upon email exchanges with
the MET team, it was recommended that I set the quality control flag
to "9" to be able to accept more observations into the netcdf outfile.
>>
>>>  From what it sounds like, I need to better understand what the
"happy medium" should be in setting the quality_mark_thresh flag in
pb2nc.  2 is too restrictive, while 9 appears to be allowing duplicate
observations into the mix as a result of the QC process.
>>
>> Any recommendations are greatly welcome!
>>
>> Thanks much,
>> Jonathan
>>
>>
>> From: Thomas Cram [mailto:tcram at ucar.edu]
>> Sent: Friday, January 13, 2012 4:39 PM
>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>> Subject: Re: Help with ds337.0
>>
>> Hi Jonathan,
>>
>> the only experience I have working with the MET software is using
the pb2nc utility to convert PREPBUFR observations into a NetCDF
dataset, so my knowledge of MET is limited.  However, the one reason I
can think of for the duplicate observations is that you're seeing the
same observation after several stages of quality-control pre-
processing.  The PREPBUFR files contain a complete history of the data
as it's modified during QC, so each station will have multiple reports
at a single time.  There's a quality control flag appended to each
PREPBUFR message; you want to keep the observation with the lowest QC
number.
>>
>> Can you send me the date and time for the examples you list below?
I'll take a look at the PREPBUFR messages and see if this is the case.
>>
>> If this doesn't explain it, then I'll forward your question on to
MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>
>> - Tom
>>
>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO INC]
wrote:
>>
>>
>> Dear Thomas,
>>
>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>
>> Now here is the interesting part:  When I examined the textual
difference files generated by the MET software, I noticed that there
were several stations with "duplicate" observations that led to
duplicate forecast-observation difference pairs.  I put duplicate in
quotes because the observed values were not necessarily the same but
usually very close to one another.
>> The duplicate observations arose from the fact that at the same
observation location, there would be a 5-digit WMO identifier as well
as a 4-digit text station ID at a given hour.
>> I stumbled on these duplicate station data when I made a table of
stations and mapped them, revealing the duplicates.
>>
>> Some examples I stumbled on include:
>> *         78720/MHTG (both at 14.05N, -87.22E)
>> *         78641/MGGT (both at 14.58N, -90.52E)
>> *         78711/MHPL (both at 15.22N, -83.80E)
>> *         78708/MHLM (both at 15.45N, -87.93)
>>
>> There are others, but I thought I'd provide a few examples to
start.
>>
>> If the source of the duplicates is NCEP/EMC, I wonder if it would
be helpful to send them a note as well?
>>
>> Let me know how you would like to proceed.
>>
>> Most sincerely,
>> Jonathan
>>
>>
----------------------------------------------------------------------
>> ----------------------------
>> Jonathan Case, ENSCO Inc.
>> NASA Short-term Prediction Research and Transition Center (aka
SPoRT
>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL 35805
>> Voice: 256.961.7504
>> Fax: 256.961.7788
>> Emails: Jonathan.Case-1 at nasa.gov<mailto:Jonathan.Case-1 at nasa.gov>
/
>> case.jonathan at ensco.com<mailto:case.jonathan at ensco.com>
>>
----------------------------------------------------------------------
>> ----------------------------
>>
>> "Whether the weather is cold, or whether the weather is hot, we'll
weather
>>     the weather whether we like it or not!"
>>
>>
>> Thomas Cram
>> NCAR / CISL / DSS
>> 303-497-1217
>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>
>>
>>
>

------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Case, Jonathan[ENSCO INC]
Time: Mon Feb 13 15:10:31 2012

Hello John/Tim/Methelp,

I finally got back into looking at this issue with duplicate obs
showing up in the PB2NC output, resulting in duplicate fcst-obs pairs
being processed by point_stat.

I found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
I processed ONLY this obs through pb2nc to see what the result is in
the netcdf file.

Here is what I see: (from an ncdump of the netcdf file)
.
.
.
.
data:

 obs_arr =
  0, 33, 942.5, -9999, 0,
  0, 34, 942.5, -9999, -1,
  0, 32, 942.5, -9999, 1,
  1, 51, 942.5, 619.9501, 0.016576,
  1, 11, 942.5, 619.9501, 297.15,
  1, 17, 942.5, 619.9501, 293.9545,
  1, 2, 942.5, 619.9501, 101349.7,
  2, 33, 942.5, -9999, 0,
  2, 34, 942.5, -9999, -1,
  2, 32, 942.5, -9999, 1,
  3, 51, 942.5, 619.9501, 0.016576,
  3, 11, 942.5, 619.9501, 297.15,
  3, 17, 942.5, 619.9501, 293.9545,
  3, 2, 942.5, 619.9501, 101349.7 ;

 hdr_typ =
  "ADPSFC",
  "ADPSFC",
  "ADPSFC",
  "ADPSFC" ;

 hdr_sid =
  "MSSS",
  "MSSS",
  "MSSS",
  "MSSS" ;

 hdr_vld =
  "20111001_115000",
  "20111001_115000",
  "20111001_115501",
  "20111001_115501" ;

 hdr_arr =
  13.7, -89.12, 621,
  13.7, -89.12, 621,
  13.7, -89.12, 621,
  13.7, -89.12, 621 ;
}

So, from what I can tell, the station is reporting the same obs at 2
different times, 1150(00) UTC and 1155(01) UTC.  Do you have any
recommendation on how I can retain only one of these obs, preferably
the one closest to the top of the hour?  I know I could dramatically
narrow down the time window (e.g. +/- 5 min), but I suspect this would
likely miss out on most observations that report about 10 minutes
before the hour.
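
As a back-of-the-envelope check of that trade-off, here is a small
sketch (plain Python, not MET code; the report times are taken from the
ncdump above) showing which of the two MSSS reports survive different
window half-widths around the 12z valid time, and which one a
closest-in-time rule would keep:

# Sketch only: window filtering vs. closest-in-time for the two MSSS reports.
from datetime import datetime, timedelta

valid = datetime(2011, 10, 1, 12, 0, 0)
reports = [datetime(2011, 10, 1, 11, 50, 0),   # 20111001_115000
           datetime(2011, 10, 1, 11, 55, 1)]   # 20111001_115501

for minutes in (5, 10, 30):
    window = timedelta(minutes=minutes)
    kept = [t.strftime("%H%M%S") for t in reports if abs(t - valid) <= window]
    print("window +/- %d min: keep %s" % (minutes, kept))

# A closest-in-time rule keeps exactly one report per station.
closest = min(reports, key=lambda t: abs(t - valid))
print("closest to 12z:", closest.strftime("%Y%m%d_%H%M%S"))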

I value your feedback on this matter.
Sincerely,
Jonathan


------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: John Halley Gotway
Time: Tue Feb 14 13:12:36 2012

Jonathan,

To answer your question directly, currently the only way of limiting
this is tightening the time window - as you mention.

About a year ago we had an internal discussion about this issue - what
to do when multiple observations occur at the same location in the
time window.  Some options we discussed were to take the mean
of the observed values or, more simply, use only the one that's
closest in time to the forecast valid time.  However these ideas never
found their way into the development tree.  It certainly seems
like a configurable option to control the sifting of these
observations is in order.
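
For illustration, the closest-in-time option could look something like
the following sketch (plain Python rather than MET source; the sid/gc/lvl
keys are placeholders, not MET's internal data structures): group the
observations by station, variable and level, then keep only the report
nearest the forecast valid time.

# Sketch only: keep the single report per station/variable/level that is
# closest to the forecast valid time.
from collections import defaultdict
from datetime import datetime

def keep_closest(obs, fcst_valid):
    groups = defaultdict(list)
    for ob in obs:
        groups[(ob["sid"], ob["gc"], ob["lvl"])].append(ob)
    return [min(reports, key=lambda o: abs(o["valid"] - fcst_valid))
            for reports in groups.values()]

valid = datetime(2011, 10, 1, 12, 0, 0)
obs = [{"sid": "MSSS", "gc": 11, "lvl": 942.5,
        "valid": datetime(2011, 10, 1, 11, 50, 0), "value": 297.15},
       {"sid": "MSSS", "gc": 11, "lvl": 942.5,
        "valid": datetime(2011, 10, 1, 11, 55, 1), "value": 297.15}]
for ob in keep_closest(obs, valid):
    print(ob["sid"], ob["valid"], ob["value"])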

We recently released METv3.1 and have begun working on development for
METv4.0, due out in the Spring/Summer time-frame.  I will actually be
out of the office very soon for a couple weeks of paternity
leave.  But I'm going to reassign this ticket to Paul Oldenburg.  He
and I discussed taking a look at a patch to apply the simplest form of
this logic - only using the observation that's closest in
time.  If he's able to get one together, he could pass it along to you
to try out on your data.  It'd be helpful to have you involved in testing
out the patch to ensure that it works well on real data.
The goal would be to get this logic worked out for inclusion in the
METv4.0 release.

So I'll reassign this ticket, and Paul will be in touch when he's able
to pull together a patch for METv3.1.

Sound reasonable?

Thanks,
John

On 02/13/2012 03:10 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> Hello John/Tim/Methelp,
>
> I finally got back into looking at this issue with duplicate obs
showing up in the PB2NC output, resulting in duplicate fcst-obs pairs
being processed by point_stat.
>
> I found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
> I processed ONLY this obs through pb2nc to see what the result is in
the netcdf file.
>
> Here is what I see: (from an ncdump of the netcdf file)
> .
> .
> .
> .
> data:
>
>   obs_arr =
>    0, 33, 942.5, -9999, 0,
>    0, 34, 942.5, -9999, -1,
>    0, 32, 942.5, -9999, 1,
>    1, 51, 942.5, 619.9501, 0.016576,
>    1, 11, 942.5, 619.9501, 297.15,
>    1, 17, 942.5, 619.9501, 293.9545,
>    1, 2, 942.5, 619.9501, 101349.7,
>    2, 33, 942.5, -9999, 0,
>    2, 34, 942.5, -9999, -1,
>    2, 32, 942.5, -9999, 1,
>    3, 51, 942.5, 619.9501, 0.016576,
>    3, 11, 942.5, 619.9501, 297.15,
>    3, 17, 942.5, 619.9501, 293.9545,
>    3, 2, 942.5, 619.9501, 101349.7 ;
>
>   hdr_typ =
>    "ADPSFC",
>    "ADPSFC",
>    "ADPSFC",
>    "ADPSFC" ;
>
>   hdr_sid =
>    "MSSS",
>    "MSSS",
>    "MSSS",
>    "MSSS" ;
>
>   hdr_vld =
>    "20111001_115000",
>    "20111001_115000",
>    "20111001_115501",
>    "20111001_115501" ;
>
>   hdr_arr =
>    13.7, -89.12, 621,
>    13.7, -89.12, 621,
>    13.7, -89.12, 621,
>    13.7, -89.12, 621 ;
> }
>
> So, from what I can tell, the station is reporting the same obs at 2
different times,
> 1150(00) UTC and 1155(01) UTC.  Do you have any recommendation on
how I can retain only one of these obs, preferably the one closest to
the top of the hour?  I know I could dramatically narrow down the time
window (e.g. +/- 5 min), but I suspect this would likely miss out on
most observations that report about 10 minutes before the hour.
>
> I value your feedback on this matter.
> Sincerely,
> Jonathan
>
> -----Original Message-----
> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
> Sent: Thursday, January 19, 2012 1:40 PM
> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
> Cc: tcram at ucar.edu
> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
> Jonathan,
>
> OK, I reran my analysis using the setting you suggested:
>      in_report_type[] = [ 512, 522, 531, 540, 562 ];
>
> Here's what I see:
>
>     - For qm=2, there are 29 locations, all with unique header
information, and all with unique lat/lons.
>       It looks like the station id's are all alphabetical.  So the
"in_report_type" setting has filtered out the numeric station id's.
>
>     - For qm=9, there are 57 locations - but only 29 of them have
unique header information!
>
> So I'll need to look more closely at what PB2NC is doing here.  It
looks like setting qm=9 really is causing duplicate observations to be
retained.
>
> When I get a chance, I'll run it through the debugger to investigate.
>
> Thanks,
> John
>
>
> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> John,
>>
>> I noticed that even with specifying the input report types, there
are still a few duplicate observations in the final netcdf dataset.
>> So, I'm seeing the same thing as in your analysis.
>>
>> -Jonathan
>>
>> -----Original Message-----
>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>> Sent: Thursday, January 19, 2012 1:01 PM
>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>> Jonathan,
>>
>> I apologize for the long delay in getting back to you on this.
We've been scrambling over the last couple of weeks to finish up
development on a new release.  Here's my recollection of what's going
on with this issue:
>>
>>      - You're using the GDAS PrepBUFR observation dataset, but
you're finding that PB2NC retains very few ADPSFC observations when
you use a quality marker of 2.
>>      - We advised via MET-Help that the algorithm employed by NCEP
in the GDAS processing sets most ADPSFC observations' quality marker
to a value of 9.  NCEP does that to prevent those observations from
being used in the data assimilation.  So the use of quality marker = 9
is more an artifact of the data assimilation process and not really
saying anything about the quality of those observations.
>>      - When you switch to using a quality marker = 9 in PB2NC, you
got many matches, but ended up with more "duplicate" observations.
>>
>> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>
>> I did some investigation on this issue this morning.  Here's what I
did:
>>
>> - Retrieved this file:
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/gdas1.t12z.prepbufr.nr
>> - Ran it through PB2NC for message type = ADPSFC, time window =
+/- 0 seconds, and quality markers of 2 and 9.
>> - For both, I used the updated version of the plot_point_obs tool
to create a plot of the data and dump header information about the
points being plotted.
>> - I also used the -dump option for PB2NC to dump all of the ADPSFC
observations to ASCII format.
>>
>> I've attached several things to this message:
>> - The postscript output of plot_point_obs for qm = 2 and qm = 9,
after first converting to png format.
>> - The output from the plot_point_obs tool for both runs.
>>
>> For qm=2, there were 51 locations plotted in your domain.
>>      - Of those 51...
>>         - All 51 header entries are unique.
>>         - There are only 36 unique combinations of lat/lon.
>> For qm=9, there were 101 locations plotted in your domain.
>>      - Of those 101...
>>         - There are only 52 unique header entries.
>>         - There are only 37 unique combinations of lat/lon.
>>
>> I think there are two issues occurring here:
>>
>> (1) When using qm=2, you'll often see two observing locations that
look the same except for the station ID.  For example:
>>     [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>     [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>
>> I looked at the observations that correspond to these and found
that they do actually differ slightly.
>>
>> (2) The second, larger issue here is when using qm=9.  It does
appear that we're really getting duplicate observations.  For example:
>>     [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>     [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>
>> This will likely require further debugging of the PB2NC tool to
figure out what's going on.
>>
>> I just wanted to let you know what I've found so far.
>>
>> Thanks,
>> John Halley Gotway
>>
>>
>>
>> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>           Queue: met_help
>>>         Subject: RE: Help with ds337.0
>>>           Owner: Nobody
>>>      Requestors: jonathan.case-1 at nasa.gov
>>>          Status: new
>>>     Ticket<URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>>
>>> Hi Tom/MET help,
>>>
>>> Thanks for the fantastically quick reply, Tom!
>>>
>>> It turns out that I'm specifically referring to the netcdf output
from the pb2nc program.
>>>
>>> I already sent a help ticket to the MET team, asking if they have
a means for removing the duplicate obs from their PB2NC process.  At
the time, they didn't refer to the history of data as it undergoes QC,
so this might help me track down the reason for the duplicate obs.
So, I have CC'd the met_help to this email.
>>>
>>> It turns out that when I initially ran pb2nc with the default
quality control flag set to "2" (i.e. quality_mark_thresh in the
PB2NCConfig_default file), I did not get ANY surface observations in
my final netcdf file over Central America.  Upon email exchanges with
the MET team, it was recommended that I set the quality control flag
to "9" to be able to accept more observations into the netcdf outfile.
>>>
>>>>    From what it sounds like, I need to better understand what the
"happy medium" should be in setting the quality_mark_thresh flag in
pb2nc.  2 is too restrictive, while 9 appears to be allowing duplicate
observations into the mix as a result of the QC process.
>>>
>>> Any recommendations are greatly welcome!
>>>
>>> Thanks much,
>>> Jonathan
>>>
>>>
>>> From: Thomas Cram [mailto:tcram at ucar.edu]
>>> Sent: Friday, January 13, 2012 4:39 PM
>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>> Subject: Re: Help with ds337.0
>>>
>>> Hi Jonathan,
>>>
>>> the only experience I have working with the MET software is using
the pb2nc utility to convert PREPBUFR observations into a NetCDF
dataset, so my knowledge of MET is limited.  However, the one reason I
can think of for the duplicate observations is that you're seeing the
same observation after several stages of quality-control pre-
processing.  The PREPBUFR files contain a complete history of the data
as it's modified during QC, so each station will have multiple reports
at a single time.  There's a quality control flag appended to each
PREPBUFR message; you want to keep the observation with the lowest QC
number.
>>>
>>> Can you send me the date and time for the examples you list below?
I'll take a look at the PREPBUFR messages and see if this is the case.
>>>
>>> If this doesn't explain it, then I'll forward your question on to
MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>>
>>> - Tom
>>>
>>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO INC]
wrote:
>>>
>>>
>>> Dear Thomas,
>>>
>>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>
>>> Now here is the interesting part:  When I examined the textual
difference files generated by the MET software, I noticed that there
were several stations with "duplicate" observations that led to
duplicate forecast-observation difference pairs.  I put duplicate in
quotes because the observed values were not necessarily the same but
usually very close to one another.
>>> The duplicate observations arose from the fact that at the same
observation location, there would be a 5-digit WMO identifier as well
as a 4-digit text station ID at a given hour.
>>> I stumbled on these duplicate station data when I made a table of
stations and mapped them, revealing the duplicates.
>>>
>>> Some examples I stumbled on include:
>>> *         78720/MHTG (both at 14.05N, -87.22E)
>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>> *         78708/MHLM (both at 15.45N, -87.93)
>>>
>>> There are others, but I thought I'd provide a few examples to
start.
>>>
>>> If the source of the duplicates is NCEP/EMC, I wonder if it would
be helpful to send them a note as well?
>>>
>>> Let me know how you would like to proceed.
>>>
>>> Most sincerely,
>>> Jonathan
>>>
>>>
----------------------------------------------------------------------
>>> ----------------------------
>>> Jonathan Case, ENSCO Inc.
>>> NASA Short-term Prediction Research and Transition Center (aka
SPoRT
>>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL 35805
>>> Voice: 256.961.7504
>>> Fax: 256.961.7788
>>> Emails: Jonathan.Case-1 at nasa.gov<mailto:Jonathan.Case-1 at nasa.gov>
/
>>> case.jonathan at ensco.com<mailto:case.jonathan at ensco.com>
>>>
----------------------------------------------------------------------
>>> ----------------------------
>>>
>>> "Whether the weather is cold, or whether the weather is hot, we'll
weather
>>>      the weather whether we like it or not!"
>>>
>>>
>>> Thomas Cram
>>> NCAR / CISL / DSS
>>> 303-497-1217
>>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>
>>>
>>>
>>
>

------------------------------------------------
Subject: RE: Help with ds337.0
From: Paul Oldenburg
Time: Thu Mar 01 13:50:17 2012

Jonathan,

I developed a patch for handling point observations in MET that will
allow the user to optionally throw out duplicate
observations.  The process that I implemented does not use the
observation with the timestamp closest to the forecast
valid time as you suggested, because of complications in how the code
handles obs.  Then, when I thought about this, it
occurred to me that it doesn't matter because the observation is a
duplicate anyway.  (right?)

A duplicate observation is defined as an observation with identical
message type, station id, grib code (forecast
field), latitude, longitude, level, height and observed value to an
observation that has already been processed.

Please deploy the latest MET patches and then the attached patch:

1. Deploy latest METv3.1 patches from
http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
2. Save attached tarball to base MET directory
3. Untar it, which should overwrite four source files
4. Run 'make clean' and then 'make'

When this is complete, you should notice a new command-line option for
point_stat: -unique.  When you use this,
point_stat should detect and throw out duplicate observations.  If
your verbosity level is set to 3 or higher, it will
report which observations are being thrown out.  Please test this and
let me know if you have any trouble or if it does
not work in the way that you expected.
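
For reference, the duplicate test described above amounts to a seen-set
keyed on those eight fields.  A minimal sketch of that logic (plain
Python, not the actual patch; the dictionary keys are placeholders):

# Sketch only: drop an observation if its (message type, station id,
# grib code, lat, lon, level, height, value) tuple has already been seen.
def filter_unique(obs, verbose=False):
    seen = set()
    kept = []
    for ob in obs:
        key = (ob["typ"], ob["sid"], ob["gc"], ob["lat"], ob["lon"],
               ob["lvl"], ob["hgt"], ob["value"])
        if key in seen:
            if verbose:
                print("discarding duplicate:", key)
            continue
        seen.add(key)
        kept.append(ob)
    return kept

ob = {"typ": "ADPSFC", "sid": "78792", "gc": 11, "lat": 9.05, "lon": -79.37,
      "lvl": 942.5, "hgt": 11.0, "value": 297.15}
print(len(filter_unique([ob, dict(ob)], verbose=True)), "observation(s) kept")

With a test like this, every copy after the first is dropped, no matter
how many identical reports fall inside the time window.  Running
point_stat with -unique and -v 3, as described above, should likewise
report each discarded observation.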

Thanks,

Paul


On 02/13/2012 03:10 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> Hello John/Tim/Methelp,
>
> I finally got back into looking at this issue with duplicate obs
showing up in the PB2NC output, resulting in duplicate fcst-obs pairs
being processed by point_stat.
>
> I found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
> I processed ONLY this obs through pb2nc to see what the result is in
the netcdf file.
>
> Here is what I see: (from an ncdump of the netcdf file)
> .
> .
> .
> .
> data:
>
>   obs_arr =
>    0, 33, 942.5, -9999, 0,
>    0, 34, 942.5, -9999, -1,
>    0, 32, 942.5, -9999, 1,
>    1, 51, 942.5, 619.9501, 0.016576,
>    1, 11, 942.5, 619.9501, 297.15,
>    1, 17, 942.5, 619.9501, 293.9545,
>    1, 2, 942.5, 619.9501, 101349.7,
>    2, 33, 942.5, -9999, 0,
>    2, 34, 942.5, -9999, -1,
>    2, 32, 942.5, -9999, 1,
>    3, 51, 942.5, 619.9501, 0.016576,
>    3, 11, 942.5, 619.9501, 297.15,
>    3, 17, 942.5, 619.9501, 293.9545,
>    3, 2, 942.5, 619.9501, 101349.7 ;
>
>   hdr_typ =
>    "ADPSFC",
>    "ADPSFC",
>    "ADPSFC",
>    "ADPSFC" ;
>
>   hdr_sid =
>    "MSSS",
>    "MSSS",
>    "MSSS",
>    "MSSS" ;
>
>   hdr_vld =
>    "20111001_115000",
>    "20111001_115000",
>    "20111001_115501",
>    "20111001_115501" ;
>
>   hdr_arr =
>    13.7, -89.12, 621,
>    13.7, -89.12, 621,
>    13.7, -89.12, 621,
>    13.7, -89.12, 621 ;
> }
>
> So, from what I can tell, the station is reporting the same obs at 2
different times,
> 1150(00) UTC and 1155(01) UTC.  Do you have any recommendation on
how I can retain only one of these obs, preferably the one closest to
the top of the hour?  I know I could dramatically narrow down the time
window (e.g. +/- 5 min), but I suspect this would likely miss out on
most observations that report about 10 minutes before the hour.
>
> I value your feedback on this matter.
> Sincerely,
> Jonathan
>
> -----Original Message-----
> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
> Sent: Thursday, January 19, 2012 1:40 PM
> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
> Cc: tcram at ucar.edu
> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
> Jonathan,
>
> OK, I reran my analysis using the setting you suggested:
>      in_report_type[] = [ 512, 522, 531, 540, 562 ];
>
> Here's what I see:
>
>     - For qm=2, there are 29 locations, all with unique header
information, and all with unique lat/lons.
>       It looks like the station id's are all alphabetical.  So the
"in_report_type" setting has filtered out the numeric station id's.
>
>     - For qm=9, there are 57 locations - but only 29 of them have
unique header information!
>
> So I'll need to look more closely at what PB2NC is doing here.  It
looks like setting qm=9 really is causing duplicate observations to be
retained.
>
> When I get a chance, I'll run it through the debugger to investigate.
>
> Thanks,
> John
>
>
> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> John,
>>
>> I noticed that even with specifying the input report types, there
are still a few duplicate observations in the final netcdf dataset.
>> So, I'm seeing the same thing as in your analysis.
>>
>> -Jonathan
>>
>> -----Original Message-----
>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>> Sent: Thursday, January 19, 2012 1:01 PM
>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>> Jonathan,
>>
>> I apologize for the long delay in getting back to you on this.
We've been scrambling over the last couple of weeks to finish up
development on a new release.  Here's my recollection of what's going
on with this issue:
>>
>>      - You're using the GDAS PrepBUFR observation dataset, but
you're finding that PB2NC retains very few ADPSFC observations when
you use a quality marker of 2.
>>      - We advised via MET-Help that the algorithm employed by NCEP
in the GDAS processing sets most ADPSFC observations' quality marker
to a value of 9.  NCEP does that to prevent those observations from
being used in the data assimilation.  So the use of quality marker = 9
is more an artifact of the data assimilation process and not really
saying anything about the quality of those observations.
>>      - When you switch to using a quality marker = 9 in PB2NC, you
got many matches, but ended up with more "duplicate" observations.
>>
>> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>
>> I did some investigation on this issue this morning.  Here's what I
did:
>>
>> - Retrieved this file:
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/gdas1.t12z.prepbufr.nr
>> - Ran it through PB2NC for message type = ADPSFC, time window =
+/- 0 seconds, and quality markers of 2 and 9.
>> - For both, I used the updated version of the plot_point_obs tool
to create a plot of the data and dump header information about the
points being plotted.
>> - I also used the -dump option for PB2NC to dump all of the ADPSFC
observations to ASCII format.
>>
>> I've attached several things to this message:
>> - The postscript output of plot_point_obs for qm = 2 and qm = 9,
after first converting to png format.
>> - The output from the plot_point_obs tool for both runs.
>>
>> For qm=2, there were 51 locations plotted in your domain.
>>      - Of those 51...
>>         - All 51 header entries are unique.
>>         - There are only 36 unique combinations of lat/lon.
>> For qm=9, there were 101 locations plotted in your domain.
>>      - Of those 101...
>>         - There are only 52 unique header entries.
>>         - There are only 37 unique combinations of lat/lon.
>>
>> I think there are two issues occurring here:
>>
>> (1) When using qm=2, you'll often see two observing locations that
look the same except for the station ID.  For example:
>>     [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>     [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>
>> I looked at the observations that correspond to these and found
that they do actually differ slightly.
>>
>> (2) The second, larger issue here is when using qm=9.  It does
appear that we're really getting duplicate observations.  For example:
>>     [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>     [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>
>> This will likely require further debugging of the PB2NC tool to
figure out what's going on.
>>
>> I just wanted to let you know what I've found so far.
>>
>> Thanks,
>> John Halley Gotway
>>
>>
>>
>> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>           Queue: met_help
>>>         Subject: RE: Help with ds337.0
>>>           Owner: Nobody
>>>      Requestors: jonathan.case-1 at nasa.gov
>>>          Status: new
>>>     Ticket<URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>>
>>> Hi Tom/MET help,
>>>
>>> Thanks for the fantastically quick reply, Tom!
>>>
>>> It turns out that I'm specifically referring to the netcdf output
from the pb2nc program.
>>>
>>> I already sent a help ticket to the MET team, asking if they have
a means for removing the duplicate obs from their PB2NC process.  At
the time, they didn't refer to the history of data as it undergoes QC,
so this might help me track down the reason for the duplicate obs.
So, I have CC'd the met_help to this email.
>>>
>>> It turns out that when I initially ran pb2nc with the default
quality control flag set to "2" (i.e. quality_mark_thresh in the
PB2NCConfig_default file), I did not get ANY surface observations in
my final netcdf file over Central America.  Upon email exchanges with
the MET team, it was recommended that I set the quality control flag
to "9" to be able to accept more observations into the netcdf outfile.
>>>
>>>>    From what it sounds like, I need to better understand what the
"happy medium" should be in setting the quality_mark_thresh flag in
pb2nc.  2 is too restrictive, while 9 appears to be allowing duplicate
observations into the mix as a result of the QC process.
>>>
>>> Any recommendations are greatly welcome!
>>>
>>> Thanks much,
>>> Jonathan
>>>
>>>
>>> From: Thomas Cram [mailto:tcram at ucar.edu]
>>> Sent: Friday, January 13, 2012 4:39 PM
>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>> Subject: Re: Help with ds337.0
>>>
>>> Hi Jonathan,
>>>
>>> the only experience I have working with the MET software is using
the pb2nc utility to convert PREPBUFR observations into a NetCDF
dataset, so my knowledge of MET is limited.  However, the one reason I
can think of for the duplicate observations is that you're seeing the
same observation after several stages of quality-control pre-
processing.  The PREPBUFR files contain a complete history of the data
as it's modified during QC, so each station will have multiple reports
at a single time.  There's a quality control flag appended to each
PREPBUFR message; you want to keep the observation with the lowest QC
number.
>>>
>>> Can you send me the date and time for the examples you list below?
I'll take a look at the PREPBUFR messages and see if this is the case.
>>>
>>> If this doesn't explain it, then I'll forward your question on to
MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>>
>>> - Tom
>>>
>>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO INC]
wrote:
>>>
>>>
>>> Dear Thomas,
>>>
>>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>
>>> Now here is the interesting part:  When I examined the textual
difference files generated by the MET software, I noticed that there
were several stations with "duplicate" observations that led to
duplicate forecast-observation difference pairs.  I put duplicate in
quotes because the observed values were not necessarily the same but
usually very close to one another.
>>> The duplicate observations arose from the fact that at the same
observation location, there would be a 5-digit WMO identifier as well
as a 4-digit text station ID at a given hour.
>>> I stumbled on these duplicate station data when I made a table of
stations and mapped them, revealing the duplicates.
>>>
>>> Some examples I stumbled on include:
>>> *         78720/MHTG (both at 14.05N, -87.22E)
>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>> *         78708/MHLM (both at 15.45N, -87.93)
>>>
>>> There are others, but I thought I'd provide a few examples to
start.
>>>
>>> If the source of the duplicates is NCEP/EMC, I wonder if it would
be helpful to send them a note as well?
>>>
>>> Let me know how you would like to proceed.
>>>
>>> Most sincerely,
>>> Jonathan
>>>
>>>
----------------------------------------------------------------------
>>> ----------------------------
>>> Jonathan Case, ENSCO Inc.
>>> NASA Short-term Prediction Research and Transition Center (aka
SPoRT
>>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL 35805
>>> Voice: 256.961.7504
>>> Fax: 256.961.7788
>>> Emails: Jonathan.Case-1 at nasa.gov<mailto:Jonathan.Case-1 at nasa.gov>
/
>>> case.jonathan at ensco.com<mailto:case.jonathan at ensco.com>
>>>
----------------------------------------------------------------------
>>> ----------------------------
>>>
>>> "Whether the weather is cold, or whether the weather is hot, we'll
weather
>>>      the weather whether we like it or not!"
>>>
>>>
>>> Thomas Cram
>>> NCAR / CISL / DSS
>>> 303-497-1217
>>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>
>>>
>>>
>>
>
>


------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Case, Jonathan[ENSCO INC]
Time: Thu Mar 01 13:55:52 2012

Paul,

Thanks for the patch.  I believe that I'm running MET v3.0,
not 3.1.  Hopefully the source files in your patch might work with our
older version.  If not, then I'll need to fully upgrade my config
files and such before I can test out the patch, so you may not hear
from me right away.

One follow-on question I have is whether the patch will reject ALL
duplicate obs since I've seen 3 duplicate obs in some instances within
the 20-minute time window used in point_stat?

Thanks again,
Jonathan

-----Original Message-----
From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
Sent: Thursday, March 01, 2012 2:50 PM
To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
Cc: tcram at ucar.edu
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0

Jonathan,

I developed a patch for handling point observations in MET that will
allow the user to optionally throw out duplicate observations.  The
process that I implemented does not use the observation with the
timestamp closest to the forecast valid time as you suggested, because
of complications in how the code handles obs.  Then, when I thought
about this, it occurred to me that it doesn't matter because the
observation is a duplicate anyway.  (right?)

A duplicate observation is defined as an observation with identical
message type, station id, grib code (forecast field), latitude,
longitude, level, height and observed value to an observation that has
already been processed.

Please deploy the latest MET patches and then the attached patch:

1. Deploy latest METv3.1 patches from
http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
2. Save attached tarball to base MET directory
3. Untar it, which should overwrite four source files
4. Run 'make clean' and then 'make'

When this is complete, you should notice a new command-line option for
point_stat: -unique.  When you use this, point_stat should detect and
throw out duplicate observations.  If your verbosity level is set to 3
or higher, it will report which observations are being thrown out.
Please test this and let me know if you have any trouble or if it does
not work in the way that you expected.

Thanks,

Paul


On
02/13/2012 03:10 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>
>
<URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
>
Hello John/Tim/Methelp,
>
> I finally got back into looking at this
issue with duplicate obs showing up in the PB2NC output, resulting in
duplicate fcst-obs pairs being processed by point_stat.
>
> I found
a single obs site in Central America that is generating a problem on 1
Oct 2011 at 12z (stid "MSSS").
> I processed ONLY this obs through
pb2nc to see what the result is in the netcdf file.
>
> Here is what
I see: (from an ncdump of the netcdf file) .
> .
> .
> .
> data:
>
>   obs_arr =
>    0, 33, 942.5, -9999, 0,
>    0, 34, 942.5,
-9999, -1,
>    0, 32, 942.5, -9999, 1,
>    1, 51, 942.5, 619.9501,
0.016576,
>    1, 11, 942.5, 619.9501, 297.15,
>    1, 17, 942.5,
619.9501, 293.9545,
>    1, 2, 942.5, 619.9501, 101349.7,
>    2,
33, 942.5, -9999, 0,
>    2, 34, 942.5, -9999, -1,
>    2, 32,
942.5, -9999, 1,
>    3, 51, 942.5, 619.9501, 0.016576,
>    3, 11,
942.5, 619.9501, 297.15,
>    3, 17, 942.5, 619.9501, 293.9545,
>
3, 2, 942.5, 619.9501, 101349.7 ;
>
>   hdr_typ =
>    "ADPSFC",
>
"ADPSFC",
>    "ADPSFC",
>    "ADPSFC" ;
>
>   hdr_sid =
>
"MSSS",
>    "MSSS",
>    "MSSS",
>    "MSSS" ;
>
>   hdr_vld =
>    "20111001_115000",
>    "20111001_115000",
>
"20111001_115501",
>    "20111001_115501" ;
>
>   hdr_arr =
>
13.7, -89.12, 621,
>    13.7, -89.12, 621,
>    13.7, -89.12, 621,
>    13.7, -89.12, 621 ;
> }
>
> So, from what I can tell, the
station is reporting the same obs at 2
> different times,
> 1150(00)
UTC and 1155(01) UTC.  Do you have any recommendation on how I can
retain only one of these obs, preferably the one closest to the top of
the hour?  I know I could dramatically narrow down the time window
(e.g. +/- 5 min), but I suspect this would likely miss out on most
observations that report about 10 minutes before the hour.
>
> I
value your feedback on this matter.
> Sincerely,
> Jonathan
>
>
-----Original Message-----
> From: John Halley Gotway via RT
[mailto:met_help at ucar.edu]
> Sent: Thursday, January 19, 2012 1:40 PM
> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
> Cc: tcram at ucar.edu
>
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
>
Jonathan,
>
> OK, I reran my analysis using the setting you
suggested:
>      in_report_type[] = [ 512, 522, 531, 540, 562 ];
>
> Here's what I see:
>
>     - For qm=2, there are 29 locations, all
with unique header information, and all with unique lat/lons.
>
It looks like the station id's are all alphabetical.  So the
"in_report_type" setting has filtered out the numeric station id's.
>
>     - For qm=9, there are 57 locations - but only 29 of them have
unique header information!
>
> So I'll need to look more closely at
what PB2NC is doing here.  It looks like setting qm=9 really is
causing duplicate observations to be retained.
>
> When I get a
chance, I'll run it through the debugger to investigate.
>
> Thanks,
>
John
>
>
> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT
wrote:
>>
>> <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> John,
>>
>> I noticed that even with specifying the input report types,
there are still a few duplicate observations in the final netcdf
dataset.
>> So, I'm seeing the same thing as in your analysis.
>>
>> -Jonathan
>>
>> -----Original Message-----
>> From: John Halley
Gotway via RT [mailto:met_help at ucar.edu]
>> Sent: Thursday, January
19, 2012 1:01 PM
>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>> Cc:
tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with
ds337.0
>>
>> Jonathan,
>>
>> I apologize for the long delay in
getting back to you on this.  We've been scrambling over the last
couple of weeks to finish up development on a new release.  Here's my
recollection of what's going on with this issue:
>>
>>      - You're
using the GDAS PrepBUFR observation dataset, but you're finding that
PB2NC retains very few ADPSFC observations when you use a quality marker
of 2.
>>      - We advised via MET-Help that the algorithm employed
by NCEP in the GDAS processing sets most ADPSFC observations' quality
marker to a value of 9.  NCEP does that to prevent those observations
from being used in the data assimilation.  So the use of quality
marker = 9 is more an artifact of the data assimilation process and
not really saying anything about the quality of those observations.
>>      - When you switch to using a quality marker = 9 in PB2NC, you
got many matches, but ended up with more "duplicate" observations.
>>
>> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>
>> I did some investigation on this
issue this morning.  Here's what I did:
>>
>> - Retrieved this file:
>>
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/
>> gdas1.t12z.prepbufr.nr
>> - Ran it through PB2NC from message type
= ADPSFC, time window = +/- 0 seconds, and quality markers of 2 and 9.
>> - For both, I used the updated version of the plot_point_obs tool
to create a plot of the data and dump header information about the
points being plotted.
>> - I also used the -dump option for PB2NC to
dump all of the ADPSFC observations to ASCII format.
>>
>> I've
attached several things to this message:
>> - The postscript output
of plot_point_obs for qm = 2 and qm = 9, after first converting to
png format.
>> - The output from the plot_point_obs tool for both
runs.
>>
>> For qm=2, there were 51 locations plotted in your
domain.
>>      - Of those 51...
>>         - All 51 header entries
are unique.
>>         - There are only 36 unique combinations of
lat/lon.
>> For qm=9, there were 101 locations plotted in your
domain.
>>      - Of those 101...
>>         - There are only 52
unique header entries.
>>         - There are only 37 unique
combinations of lat/lon.
>>
>> I think there are two issues
occurring here:
>>
>> (1) When using qm=2, you'll often see two
observing locations that look the same except for the station ID.  For
example:
>>     [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>     [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>
>> I
looked at the observations that correspond to these and found that
they do actually differ slightly.
>>
>> (2) The second, larger issue
here is when using qm=9.  It does appear that we're really getting
duplicate observations.  For example:
>>     [ ADPSFC, 78792,
20120112_120000, 9.05, -79.37, 11 ]
>>     [ ADPSFC, 78792,
20120112_120000, 9.05, -79.37, 11 ]
>>
>> This will likely require
further debugging of the PB2NC tool to figure out what's going on.
>>
>> I just wanted to let you know what I've found so far.
>>
>>
Thanks,
>> John Halley Gotway
>>
>>
>>
>> On 01/13/2012 03:50 PM,
Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> Fri Jan 13 15:50:08
2012: Request 52626 was acted upon.
>>> Transaction: Ticket created
by jonathan.case-1 at nasa.gov
>>>           Queue: met_help
>>>
Subject: RE: Help with ds337.0
>>>           Owner: Nobody
>>>
Requestors: jonathan.case-1 at nasa.gov
>>>          Status: new
>>>
Ticket<URL:
>>>
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>>
>>> Hi Tom/MET help,
>>>
>>> Thanks for the fantastically quick
reply, Tom!
>>>
>>> It turns out that I'm specifically referring to
the netcdf output from the pb2nc program.
>>>
>>> I already sent a
help ticket to the MET team, asking if they have a means for removing
the duplicate obs from their PB2NC process.  At the time, they didn't
refer to the history of data as it undergoes QC, so this might help me
track down the reason for the duplicate obs.  So, I have CC'd the
met_help to this email.
>>>
>>> It turns out that when I initially
ran pb2nc with the default quality control flag set to "2" (i.e.
quality_mark_thresh in the PB2NCConfig_default file), I did not get
ANY surface observations in my final netcdf file over Central America.
Upon email exchanges with the MET team, it was recommended that I set
the quality control flag to "9" to be able to accept more observations
into the netcdf outfile.
>>>
>>>>    From what it sounds like, I
need to better understand what the "happy medium" should be in setting
the quality_mark_thresh flag in pb2nc.  2 is too restrictive, while 9
appears to be allowing duplicate observations into the mix as a result
of the QC process.
>>>
>>> Any recommendations are greatly welcome!
>>>
>>> Thanks much,
>>> Jonathan
>>>
>>>
>>> From: Thomas Cram
[mailto:tcram at ucar.edu]
>>> Sent: Friday, January 13, 2012 4:39 PM
>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>> Subject: Re: Help
with ds337.0
>>>
>>> Hi Jonathan,
>>>
>>> the only experience I
have working with the MET software is using the pb2nc utility to
convert PREPBUFR observations into a NetCDF dataset, so my knowledge
of MET is limited.  However, the one reason I can think of for the
duplicate observations is that you're seeing the same observation
after several stages of quality-control pre-processing.  The PREPBUFR
files contain a complete history of the data as it's modified during
QC, so each station will have multiple reports at a single time.
There's a quality control flag appended to each PREPBUFR message; you
want to keep the observation with the lowest QC number.
>>>
>>> Can
you send me the date and time for the examples you list below?  I'll
take a look at the PREPBUFR messages and see if this is the case.
>>>
>>> If this doesn't explain it, then I'll forward your question on to
MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>>
>>> - Tom
>>>
>>> On Jan 13,
2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO INC] wrote:
>>>
>>>
>>> Dear Thomas,
>>>
>>> This is Jonathan Case of the NASA
SPoRT Center (http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>
>>> Now here is the
interesting part:  When I examined the textual difference files
generated by the MET software, I noticed that there were several
stations with "duplicate" observations that led to duplicate forecast-
observation difference pairs.  I put duplicate in quotes because the
observed values were not necessarily the same but usually very close
to one another.
>>> The duplicate observations arose from the fact
that at the same observation location, there would be a 5-digit WMO
identifier as well as a 4-digit text station ID at a given hour.
>>>
I stumbled on these duplicate station data when I made a table of
stations and mapped them, revealing the duplicates.
>>>
>>> Some
examples I stumbled on include:
>>> *         78720/MHTG (both at
14.05N, -87.22E)
>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>> *
78708/MHLM (both at 15.45N, -87.93)
>>>
>>> There are others, but I
thought I'd provide a few examples to start.
>>>
>>> If the source
of the duplicates is NCEP/EMC, I wonder if it would be helpful to send
them a note as well?
>>>
>>> Let me know how you would like to
proceed.
>>>
>>> Most sincerely,
>>> Jonathan
>>>
>>>
--------------------------------------------------------------------
>>> --
>>> ----------------------------
>>> Jonathan Case, ENSCO
Inc.
>>> NASA Short-term Prediction Research and Transition Center
(aka SPoRT
>>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL
35805
>>> Voice: 256.961.7504
>>> Fax: 256.961.7788
>>> Emails:
Jonathan.Case-1 at nasa.gov<mailto:Jonathan.Case-1 at nasa.gov>    /
>>>
case.jonathan at ensco.com<mailto:case.jonathan at ensco.com>
>>>
--------------------------------------------------------------------
>>> --
>>> ----------------------------
>>>
>>> "Whether the
weather is cold, or whether the weather is hot, we'll weather
>>>
the weather whether we like it or not!"
>>>
>>>
>>> Thomas Cram
>>> NCAR / CISL / DSS
>>> 303-497-1217
>>>
tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>
>>>
>>>
>>
>
>
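
A rough sketch of the PB2NC / plot_point_obs sequence described in the
investigation quoted above (file and config names are illustrative; the
ADPSFC message type, the quality_mark_thresh setting, and the -dump
option are the ones discussed in this thread):

    # In the PB2NC config, restrict message types to ADPSFC and set
    # quality_mark_thresh to 2 (or 9, for the second run).
    pb2nc gdas1.t12z.prepbufr.nr gdas1.t12z.qm2.nc PB2NCConfig -dump dump_qm2 -v 2
    plot_point_obs gdas1.t12z.qm2.nc gdas1.t12z.qm2.ps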

------------------------------------------------
Subject: RE: Help with ds337.0
From: Paul Oldenburg
Time: Thu Mar 01 14:03:23 2012

Jonathan,

Sorry, I should have noticed you were using METv3.0.  The patch will
only work for METv3.1.  You should strongly
consider upgrading to METv3.1 for reasons other than this patch,
including a major change in how gridded data is handled
internally by MET.  If you have more questions about this change, I'll
refer you to John.

Regarding the handling of duplicate observations, the patched code
will discard all observations that qualify as
duplicates, keeping only a single one.  I attached two point_stat
output files that include matched pairs showing the
effect of the unique flag in the patched code.  I hope this answers
your question.

Paul
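
For reference, a minimal sketch of the patch/build/run sequence along
the lines of the instructions quoted below (the tarball and data file
names are illustrative; the 'make clean'/'make' steps, the -unique
option, and the verbosity-3 reporting are the ones described in this
thread):

    cd METv3.1
    tar -xvf met_point_unique_patch.tar   # illustrative tarball name; should overwrite four source files
    make clean && make
    point_stat wrf_fcst.grb gdas_obs.nc PointStatConfig -outdir out -v 3 -unique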


On 03/01/2012 01:55 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> Paul,
>
> Thanks for the patch.  I believe that I'm running MET v3.0, not 3.1.
Hopefully the source files in your patch might work with our older
version.  If not, then I'll need to fully upgrade my config files and
such before I can test out the patch, so you may not hear from me
right away.
>
> One follow-on question I have is whether the patch will reject ALL
duplicate obs since I've seen 3 duplicate obs in some instances within
the 20-minute time window used in point_stat?
>
> Thanks again,
> Jonathan
>
> -----Original Message-----
> From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
> Sent: Thursday, March 01, 2012 2:50 PM
> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
> Cc: tcram at ucar.edu
> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
> Jonathan,
>
> I developed a patch for handling point observations in MET that will
allow the user to optionally throw out duplicate observations.  The
process that I implemented does not use the observation with the
timestamp closest to the forecast valid time as you suggested, because
of complications in how the code handles obs.  Then, when I thought
about this, it occurred to me that it doesn't matter because the
observation is a duplicate anyway.  (right?)
>
> A duplicate observation is defined as an observation with identical
message type, station id, grib code (forecast field), latitude,
longitude, level, height and observed value to an observation that has
already been processed.
>
> Please deploy the latest MET patches and then the attached patch:
>
> 1. Deploy latest METv3.1 patches from
>    http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
> 2. Save attached tarball to base MET directory
> 3. Untar it, which should overwrite four source files
> 4. Run 'make clean' and then 'make'
>
> When this is complete, you should notice a new command-line option
for point_stat: -unique.  When you use this, point_stat should detect
and throw out duplicate observations.  If your verbosity level is set
to 3 or higher, it will report which observations are being thrown
out.  Please test this and let me know if you have any trouble or if
it does not work in the way that you expected.
>
> Thanks,
>
> Paul
>
>
> On 02/13/2012 03:10 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> Hello John/Tim/Methelp,
>>
>> I finally got back into looking at this issue with duplicate obs
showing up in the PB2NC output, resulting in duplicate fcst-obs pairs
being processed by point_stat.
>>
>> I found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
>> I processed ONLY this obs through pb2nc to see what the result is
in the netcdf file.
>>
>> Here is what I see: (from an ncdump of the netcdf file) .
>> .
>> .
>> .
>> data:
>>
>>    obs_arr =
>>     0, 33, 942.5, -9999, 0,
>>     0, 34, 942.5, -9999, -1,
>>     0, 32, 942.5, -9999, 1,
>>     1, 51, 942.5, 619.9501, 0.016576,
>>     1, 11, 942.5, 619.9501, 297.15,
>>     1, 17, 942.5, 619.9501, 293.9545,
>>     1, 2, 942.5, 619.9501, 101349.7,
>>     2, 33, 942.5, -9999, 0,
>>     2, 34, 942.5, -9999, -1,
>>     2, 32, 942.5, -9999, 1,
>>     3, 51, 942.5, 619.9501, 0.016576,
>>     3, 11, 942.5, 619.9501, 297.15,
>>     3, 17, 942.5, 619.9501, 293.9545,
>>     3, 2, 942.5, 619.9501, 101349.7 ;
>>
>>    hdr_typ =
>>     "ADPSFC",
>>     "ADPSFC",
>>     "ADPSFC",
>>     "ADPSFC" ;
>>
>>    hdr_sid =
>>     "MSSS",
>>     "MSSS",
>>     "MSSS",
>>     "MSSS" ;
>>
>>    hdr_vld =
>>     "20111001_115000",
>>     "20111001_115000",
>>     "20111001_115501",
>>     "20111001_115501" ;
>>
>>    hdr_arr =
>>     13.7, -89.12, 621,
>>     13.7, -89.12, 621,
>>     13.7, -89.12, 621,
>>     13.7, -89.12, 621 ;
>> }
>>
>> So, from what I can tell, the station is reporting the same obs at
2
>> different times,
>> 1150(00) UTC and 1155(01) UTC.  Do you have any recommendation on
how I can retain only one of these obs, preferably the one closest to
the top of the hour?  I know I could dramatically narrow down the time
window (e.g. +/- 5 min), but I suspect this would likely miss out on
most observations that report about 10 minutes before the hour.
>>
>> I value your feedback on this matter.
>> Sincerely,
>> Jonathan
>>
>> -----Original Message-----
>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>> Sent: Thursday, January 19, 2012 1:40 PM
>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>> Jonathan,
>>
>> OK, I reran my analysis using the setting you suggested:
>>       in_report_type[] = [ 512, 522, 531, 540, 562 ];
>>
>> Here's what I see:
>>
>>      - For qm=2, there are 29 locations, all with unique header
information, and all with unique lat/lons.
>>        It looks like the station id's are all alphabetical.  So the
"in_report_type" setting has filtered out the numeric station id's.
>>
>>      - For qm=9, there are 57 locations - but only 29 of them have
unique header information!
>>
>> So I'll need to look more closely at what PB2NC is doing here.  It
looks like setting qm=9 really is causing duplicate observations to be
retained.
>>
>> When I get a chance, I'll run it through the debugger to investigate.
>>
>> Thanks,
>> John
>>
>>
>> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>> John,
>>>
>>> I noticed that even with specifying the input report types, there
are still a few duplicate observations in the final netcdf dataset.
>>> So, I'm seeing the same thing as in your analysis.
>>>
>>> -Jonathan
>>>
>>> -----Original Message-----
>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>> Sent: Thursday, January 19, 2012 1:01 PM
>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>> Cc: tcram at ucar.edu
>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>
>>> Jonathan,
>>>
>>> I apologize for the long delay in getting back to you on this.
We've been scrambling over the last couple of weeks to finish up
development on a new release.  Here's my recollection of what's going
on with this issue:
>>>
>>>       - You're using the GDAS PrepBUFR observation dataset, but
you're finding that PB2NC retains very few ADPSFC observations when
you use a quality marker of 2.
>>>       - We advised via MET-Help that the algorithm employed by
NCEP in the GDAS processing sets most ADPSFC observations' quality
marker to a value of 9.  NCEP does that to prevent those observations
from being used in the data assimilation.  So the use of quality
marker = 9 is more an artifact of the data assimilation process and
not really saying anything about the quality of those observations.
>>>       - When you switch to using a quality marker = 9 in PB2NC,
you got many matches, but ended up with more "duplicate" observations.
>>>
>>> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>>
>>> I did some investigation on this issue this morning.  Here's what
I did:
>>>
>>> - Retrieved this file:
>>>
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/
>>> gdas1.t12z.prepbufr.nr
>>> - Ran it through PB2NC for message type = ADPSFC, time window =
+/- 0 seconds, and quality markers of 2 and 9.
>>> - For both, I used the updated version of the plot_point_obs tool
to create a plot of the data and dump header information about the
points being plotted.
>>> - I also used the -dump option for PB2NC to dump all of the ADPSFC
observations to ASCII format.
>>>
>>> I've attached several things to this message:
>>> - The postscript output of plot_point_obs for qm = 2 and qm = 9,
after first converting to png format.
>>> - The output from the plot_point_obs tool for both runs.
>>>
>>> For qm=2, there were 51 locations plotted in your domain.
>>>       - Of those 51...
>>>          - All 51 header entries are unique.
>>>          - There are only 36 unique combinations of lat/lon.
>>> For qm=9, there were 101 locations plotted in your domain.
>>>       - Of those 101...
>>>          - There are only 52 unique header entries.
>>>          - There are only 37 unique combinations of lat/lon.
>>>
>>> I think there are two issues occurring here:
>>>
>>> (1) When using qm=2, you'll often see two observing locations that
look the same except for the station ID.  For example:
>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>      [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>>
>>> I looked at the observations that correspond to these and found
that they do actually differ slightly.
>>>
>>> (2) The second, larger issue here is when using qm=9.  It does
appear that we're really getting duplicate observations.  For example:
>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>
>>> This will likely require further debugging of the PB2NC tool to
figure out what's going on.
>>>
>>> I just wanted to let you know what I've found so far.
>>>
>>> Thanks,
>>> John Halley Gotway
>>>
>>>
>>>
>>> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>
>>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>>            Queue: met_help
>>>>          Subject: RE: Help with ds337.0
>>>>            Owner: Nobody
>>>>       Requestors: jonathan.case-1 at nasa.gov
>>>>           Status: new
>>>>      Ticket<URL:
>>>> https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>
>>>>
>>>> Hi Tom/MET help,
>>>>
>>>> Thanks for the fantastically quick reply, Tom!
>>>>
>>>> It turns out that I'm specifically referring to the netcdf output
from the pb2nc program.
>>>>
>>>> I already sent a help ticket to the MET team, asking if they have
a means for removing the duplicate obs from their PB2NC process.  At
the time, they didn't refer to the history of data as it undergoes QC,
so this might help me track down the reason for the duplicate obs.
So, I have CC'd the met_help to this email.
>>>>
>>>> It turns out that when I initially ran pb2nc with the default
quality control flag set to "2" (i.e. quality_mark_thresh in the
PB2NCConfig_default file), I did not get ANY surface observations in
my final netcdf file over Central America.  Upon email exchanges with
the MET team, it was recommended that I set the quality control flag
to "9" to be able to accept more observations into the netcdf outfile.
>>>>
>>>>>      From what it sounds like, I need to better understand what
the "happy medium" should be in setting the quality_mark_thresh flag
in pb2nc.  2 is too restrictive, while 9 appears to be allowing
duplicate observations into the mix as a result of the QC process.
>>>>
>>>> Any recommendations are greatly welcome!
>>>>
>>>> Thanks much,
>>>> Jonathan
>>>>
>>>>
>>>> From: Thomas Cram [mailto:tcram at ucar.edu]
>>>> Sent: Friday, January 13, 2012 4:39 PM
>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>> Subject: Re: Help with ds337.0
>>>>
>>>> Hi Jonathan,
>>>>
>>>> the only experience I have working with the MET software is using
the pb2nc utility to convert PREPBUFR observations into a NetCDF
dataset, so my knowledge of MET is limited.  However, the one reason I
can think of for the duplicate observations is that you're seeing the
same observation after several stages of quality-control pre-
processing.  The PREPBUFR files contain a complete history of the data
as it's modified during QC, so each station will have multiple reports
at a single time.  There's a quality control flag appended to each
PREPBUFR message; you want to keep the observation with the lowest QC
number.
>>>>
>>>> Can you send me the date and time for the examples you list
below?  I'll take a look at the PREPBUFR messages and see if this is
the case.
>>>>
>>>> If this doesn't explain it, then I'll forward your question on to
MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>>>
>>>> - Tom
>>>>
>>>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO
INC] wrote:
>>>>
>>>>
>>>> Dear Thomas,
>>>>
>>>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>>
>>>> Now here is the interesting part:  When I examined the textual
difference files generated by the MET software, I noticed that there
were several stations with "duplicate" observations that led to
duplicate forecast-observation difference pairs.  I put duplicate in
quotes because the observed values were not necessarily the same but
usually very close to one another.
>>>> The duplicate observations arose from the fact that at the same
observation location, there would be a 5-digit WMO identifier as well
as a 4-digit text station ID at a given hour.
>>>> I stumbled on these duplicate station data when I made a table of
stations and mapped them, revealing the duplicates.
>>>>
>>>> Some examples I stumbled on include:
>>>> *         78720/MHTG (both at 14.05N, -87.22E)
>>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>>> *         78708/MHLM (both at 15.45N, -87.93)
>>>>
>>>> There are others, but I thought I'd provide a few examples to
start.
>>>>
>>>> If the source of the duplicates is NCEP/EMC, I wonder if it would
be helpful to send them a note as well?
>>>>
>>>> Let me know how you would like to proceed.
>>>>
>>>> Most sincerely,
>>>> Jonathan
>>>>
>>>>
--------------------------------------------------------------------
>>>> --
>>>> ----------------------------
>>>> Jonathan Case, ENSCO Inc.
>>>> NASA Short-term Prediction Research and Transition Center (aka
SPoRT
>>>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL 35805
>>>> Voice: 256.961.7504
>>>> Fax: 256.961.7788
>>>> Emails: Jonathan.Case-1 at nasa.gov<mailto:Jonathan.Case-1 at nasa.gov>
/
>>>> case.jonathan at ensco.com<mailto:case.jonathan at ensco.com>
>>>>
--------------------------------------------------------------------
>>>> --
>>>> ----------------------------
>>>>
>>>> "Whether the weather is cold, or whether the weather is hot,
we'll weather
>>>>       the weather whether we like it or not!"
>>>>
>>>>
>>>> Thomas Cram
>>>> NCAR / CISL / DSS
>>>> 303-497-1217
>>>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>


------------------------------------------------
Subject: RE: Help with ds337.0
From: Paul Oldenburg
Time: Thu Mar 01 14:03:23 2012

VERSION MODEL FCST_LEAD FCST_VALID_BEG  FCST_VALID_END  OBS_LEAD
OBS_VALID_BEG   OBS_VALID_END   FCST_VAR FCST_LEV OBS_VAR OBS_LEV
OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH
COV_THRESH ALPHA   LINE_TYPE
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_113000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 1       PACF1   30.15000 -85.67000 1015.90002 NA
6.51961 6.70820 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113600 20120214_113600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 2       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.70820 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_114200 20120214_114200 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 3       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.70820 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_114800 20120214_114800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 4       PACF1   30.15000 -85.67000 1016.20001 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_115400 20120214_115400 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 5       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 5.68858 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120000 20120214_120000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 6       PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 5.70000 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120600 20120214_120600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 7       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_121200 20120214_121200 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 8       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_121800 20120214_121800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 9       PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_122400 20120214_122400 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 10      PACF1   30.15000 -85.67000 1016.20001 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_123000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       11 11      PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           >=5.000     >=5.000    NA
NA      FHO       11 1.00000 1.00000 1.00000
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           >=5.000     >=5.000    NA
NA      CTC       11 11      0       0        0
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           >=5.000     >=5.000    NA
0.05000 CTS       11 1.00000 0.74117 1.00000  NA        NA
1.00000 0.74117 1.00000 NA NA 1.00000 0.74117 1.00000 NA NA 1.00000 NA
NA      1.00000 0.74117 1.00000 NA NA NA NA NA NA NA NA NA NA      NA
NA      0.00000 0.00000 0.25883 NA      NA      1.00000 0.74117
1.00000 NA NA NA      NA NA NA      NA NA NA      NA NA NA      NA NA
NA       NA NA NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
0.05000 CNT       11 6.51961 6.51961 6.51961  NA        NA
0.00000 0.00000 0.00000 NA NA 6.24577 6.00699 6.48456 NA NA 0.35543
0.24835 0.62376 NA      NA      NA      NA NA NA NA NA NA 11 10 7
0.27384 0.03505 0.51262 NA      NA      0.35543 0.24835 0.62376 NA
NA      1.04384 NA NA 0.37671 NA NA 0.18983 NA NA 0.11485 NA NA
0.43570 NA NA -0.18859 NA NA 0.06632 NA NA 0.32122 NA NA 0.32122 NA NA
0.81961 NA NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      SL1L2     11 6.51961 6.24577 40.72001 42.50532  39.12454

------------------------------------------------
Subject: RE: Help with ds337.0
From: Paul Oldenburg
Time: Thu Mar 01 14:03:23 2012

VERSION MODEL FCST_LEAD FCST_VALID_BEG  FCST_VALID_END  OBS_LEAD
OBS_VALID_BEG   OBS_VALID_END   FCST_VAR FCST_LEV OBS_VAR OBS_LEV
OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH
COV_THRESH ALPHA   LINE_TYPE
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_113000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       7 1       PACF1   30.15000 -85.67000 1015.90002 NA
6.51961 6.70820 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113600 20120214_113600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       7 2       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.70820 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_114800 20120214_114800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       7 3       PACF1   30.15000 -85.67000 1016.20001 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_115400 20120214_115400 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       7 4       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 5.68858 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120000 20120214_120000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       7 5       PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 5.70000 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120600 20120214_120600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       7 6       PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_121800 20120214_121800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      MPR       7 7       PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 6.19839 NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           >=5.000     >=5.000    NA
NA      FHO       7 1.00000 1.00000 1.00000
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           >=5.000     >=5.000    NA
NA      CTC       7 7       0       0        0
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           >=5.000     >=5.000    NA
0.05000 CTS       7 1.00000 0.64567 1.00000  NA        NA
1.00000 0.64567 1.00000 NA NA 1.00000 0.64567 1.00000 NA NA 1.00000 NA
NA      1.00000 0.64567 1.00000 NA NA NA NA NA NA NA NA NA NA      NA
NA      0.00000 0.00000 0.35433 NA      NA      1.00000 0.64567
1.00000 NA NA NA      NA NA NA      NA NA NA      NA NA NA      NA NA
NA       NA NA NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
0.05000 CNT       7 6.51961 6.51961 6.51961  NA        NA
0.00000 0.00000 0.00000 NA NA 6.20002 5.81719 6.58286 NA NA 0.41394
0.26674 0.91153 NA      NA      NA      NA NA NA NA NA NA 7  6  3
0.31959 -0.06325 0.70242 NA      NA      0.41394 0.26674 0.91153 NA
NA      1.05155 NA NA 0.42736 NA NA 0.24901 NA NA 0.14687 NA NA
0.49901 NA NA -0.18859 NA NA 0.06632 NA NA 0.32122 NA NA 0.57042 NA NA
0.82418 NA NA
V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA      SL1L2     7 6.51961 6.20002 40.42173 42.50532  38.58714

------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Case, Jonathan[ENSCO INC]
Time: Thu Mar 01 14:05:26 2012

Paul,

Can you send me examples of _mpr.txt files?  They are easier
to read than the .stat files.

Thanks again!
Jon
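
For reference, one quick way to pull just the matched-pair lines out of
an existing .stat file is to filter on the MPR value in the LINE_TYPE
column shown in the attachments above (the file name pattern is
illustrative):

    grep " MPR " point_stat_*.stat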

-----Original
Message-----
From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
Sent: Thursday, March 01, 2012 3:03 PM
To: Case, Jonathan (MSFC-
ZP11)[ENSCO INC]
Cc: tcram at ucar.edu
Subject: Re: [rt.rap.ucar.edu
#52626] RE: Help with ds337.0

Jonathan,

Sorry, I should have
noticed you were using METv3.0.  The patch will only work for METv3.1.
You should strongly consider upgrading to METv3.1 for reasons other
than this patch, including a major change in how gridded data is
handled internally by MET.  If you have more questions about this
change, I'll refer you to John.

Regarding the handling of duplicate
observations, the patched code will discard all observations that
qualify as duplicates, keeping only a single one.  I attached two
point_stat output files that include matched pairs showing the effect
of the unique flag in the patched code.  I hope this answers your
question.

Paul


On 03/01/2012 01:55 PM, Case, Jonathan[ENSCO
INC] via RT wrote:
>
> <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> Paul,
>
> Thanks for the patch.  I believe that I'm running MET v3.0, not
3.1.  Hopefully the source files in your patch might work with our
older version.  If not, then I'll need to fully upgrade my config
files and such before I can test out the patch, so you may not hear
from me right away.
>
> One follow-on question I have is whether the
patch will reject ALL duplicate obs since I've seen 3 duplicate obs in
some instances within the 20-minute time window used in point_stat?
>
> Thanks again,
> Jonathan
>
> -----Original Message-----
> From:
Paul Oldenburg via RT [mailto:met_help at ucar.edu]
> Sent: Thursday,
March 01, 2012 2:50 PM
> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
>
Cc: tcram at ucar.edu
> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help
with ds337.0
>
> Jonathan,
>
> I developed a patch for handling
point observations in MET that will
> allow the user to optionally
throw out duplicate observations.  The
> process that I implemented
does not use the observation with the
> timestamp closest to the
forecast valid time as you suggested, because
> of complications in
how the code handles obs.  Then, when I thought
> about this, it
occurred to me that it doesn't matter because the
> observation is a
duplicate anyway.  (right?)
>
> A duplicate observation is defined
as an observation with identical message type, station id, grib code
(forecast field), latitude, longitude, level, height and observed
value to an observation that has already been processed.
>
> Please
deploy the latest MET patches and then the attached patch:
>
> 1. Deploy latest METv3.1 patches from
>    http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
> 2. Save attached tarball to base MET directory
> 3. Untar it, which should overwrite four source files
> 4. Run 'make clean' and then 'make'
>
> When this is complete, you should notice a new command-
line option for point_stat: -unique.  When you use this, point_stat
should detect and throw out duplicate observations.  If your verbosity
level is set to 3 or higher, it will report which observations are
being thrown out.  Please test this and let me know if you have any
trouble or if it does not work in the way that you expected.
>
>
Thanks,
>
> Paul
>
>
> On 02/13/2012 03:10 PM, Case,
Jonathan[ENSCO INC] via RT wrote:
>>
>> <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> Hello
John/Tim/Methelp,
>>
>> I finally got back into looking at this
issue with duplicate obs showing up in the PB2NC output, resulting in
duplicate fcst-obs pairs being processed by point_stat.
>>
>> I
found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
>> I processed ONLY this
obs through pb2nc to see what the result is in the netcdf file.
>>
>> Here is what I see: (from an ncdump of the netcdf file) .
>> .
>>
.
>> .
>> data:
>>
>>    obs_arr =
>>     0, 33, 942.5, -9999, 0,
>>     0, 34, 942.5, -9999, -1,
>>     0, 32, 942.5, -9999, 1,
>>
1, 51, 942.5, 619.9501, 0.016576,
>>     1, 11, 942.5, 619.9501,
297.15,
>>     1, 17, 942.5, 619.9501, 293.9545,
>>     1, 2, 942.5,
619.9501, 101349.7,
>>     2, 33, 942.5, -9999, 0,
>>     2, 34,
942.5, -9999, -1,
>>     2, 32, 942.5, -9999, 1,
>>     3, 51,
942.5, 619.9501, 0.016576,
>>     3, 11, 942.5, 619.9501, 297.15,
>>
3, 17, 942.5, 619.9501, 293.9545,
>>     3, 2, 942.5, 619.9501,
101349.7 ;
>>
>>    hdr_typ =
>>     "ADPSFC",
>>     "ADPSFC",
>>     "ADPSFC",
>>     "ADPSFC" ;
>>
>>    hdr_sid =
>>
"MSSS",
>>     "MSSS",
>>     "MSSS",
>>     "MSSS" ;
>>
>>
hdr_vld =
>>     "20111001_115000",
>>     "20111001_115000",
>>
"20111001_115501",
>>     "20111001_115501" ;
>>
>>    hdr_arr =
>>     13.7, -89.12, 621,
>>     13.7, -89.12, 621,
>>     13.7,
-89.12, 621,
>>     13.7, -89.12, 621 ;
>> }
>>
>> So, from what I
can tell, the station is reporting the same obs at 2
>> different
times,
>> 1150(00) UTC and 1155(01) UTC.  Do you have any
recommendation on how I can retain only one of these obs, preferably
the one closest to the top of the hour?  I know I could dramatically
narrow down the time window (e.g. +/- 5 min), but I suspect this would
likely miss out on most observations that report about 10 minutes
before the hour.
>>
>> I value your feedback on this matter.
>>
Sincerely,
>> Jonathan
>>
>> -----Original Message-----
>> From:
John Halley Gotway via RT [mailto:met_help at ucar.edu]
>> Sent:
Thursday, January 19, 2012 1:40 PM
>> To: Case, Jonathan (MSFC-
VP61)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re:
[rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>> Jonathan,
>>
>> OK, I reran my analysis using the setting you suggested:
>>
in_report_type[] = [ 512, 522, 531, 540, 562 ];
>>
>> Here's what I
see:
>>
>>      - For qm=2, there are 29 locations, all with unique
header information, and all with unique lat/lons.
>>        It looks
like the station id's are all alphabetical.  So the "in_report_type"
setting has filtered out the numeric station id's.
>>
>>      - For
qm=9, there are 57 locations - but only 29 of them have unique header
information!
>>
>> So I'll need to look more closely at what PB2NC
is doing here.  It looks like setting qm=9 really is causing duplicate
observations to be retained.
>>
>> When I get a chance, I'll run it
through the debugger to investigate.
>>
>> Thanks,
>> John
>>
>>
>> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>>
John,
>>>
>>> I noticed that even with specifying the input report
types, there are still a few duplicate observations in the final
netcdf dataset.
>>> So, I'm seeing the same thing as in your
analysis.
>>>
>>> -Jonathan
>>>
>>> -----Original Message-----
>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>>
Sent: Thursday, January 19, 2012 1:01 PM
>>> To: Case, Jonathan
(MSFC-VP61)[ENSCO INC]
>>> Cc: tcram at ucar.edu
>>> Subject: Re:
[rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>
>>> Jonathan,
>>>
>>> I apologize for the long delay in getting back to you on
this.  We've been scrambling over the last couple of weeks to finish
up development on a new release.  Here's my recollection of what's
going on with this issue:
>>>
>>>       - You're using the GDAS
PrepBUFR observation dataset, but you're finding that PB2NC retains
very few ADPSFC observations when you use a quality marker of 2.
>>>
- We advised via MET-Help that the algorithm employed by NCEP in the
GDAS processing sets most ADPSFC observations' quality marker to a
value of 9.  NCEP does that to prevent those observations from being
used in the data assimilation.  So the use of quality marker = 9 is
more an artifact of the data assimilation process and not really
saying anything about the quality of those observations.
>>>       -
When you switch to using a quality marker = 9 in PB2NC, you got many
matches, but ended up with more "duplicate" observations.
>>>
>>> So
is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>>
>>> I did some investigation on
this issue this morning.  Here's what I did:
>>>
>>> - Retrieved
this file:
>>>
>>> http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/gdas1.t12z.prepbufr.nr
>>> - Ran it through PB2NC for
message type = ADPSFC, time window = +/- 0 seconds, and quality
markers of 2 and 9.
>>> - For both, I used the updated version of the
plot_point_obs tool to create a plot of the data and dump header
information about the points being plotted.
>>> - I also used the
-dump option for PB2NC to dump all of the ADPSFC observations to ASCII
format.
>>>
>>> I've attached several things to this message:
>>> -
The postscript output of plot_point_obs for qm = 2 and qm = 9, after
first converting to png format.
>>> - The output from the
plot_point_obs tool for both runs.
>>>
>>> For qm=2, there were 51
locations plotted in your domain.
>>>       - Of those 51...
>>>
- All 51 header entries are unique.
>>>          - There are only 36
unique combinations of lat/lon.
>>> For qm=9, there were 101
locations plotted in your domain.
>>>       - Of those 101...
>>>
- There are only 52 unique header entries.
>>>          - There are
only 37 unique combinations of lat/lon.
>>>
>>> I think there are
two issues occurring here:
>>>
>>> (1) When using qm=2, you'll often
see two observing locations that look the same except for the station
ID.  For example:
>>>      [ ADPSFC, 78792, 20120112_120000, 9.05,
-79.37, 11 ]
>>>      [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37,
11 ]
>>>
>>> I looked at the observations that correspond to these
and found that they do actually differ slightly.
>>>
>>> (2) The
second, larger issue here is when using qm=9.  It does appear that
we're really getting duplicate observations.  For example:
>>>      [
ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>      [ ADPSFC,
78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>
>>> This will likely
require further debugging of the PB2NC tool to figure out what's going
on.
>>>
>>> I just wanted to let you know what I've found so far.
>>>
>>> Thanks,
>>> John Halley Gotway
>>>
>>>
>>>
>>> On
01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>
>>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>>>
Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>>
Queue: met_help
>>>>          Subject: RE: Help with ds337.0
>>>>
Owner: Nobody
>>>>       Requestors: jonathan.case-1 at nasa.gov
>>>>
Status: new
>>>>      Ticket<URL:
>>>>
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>
>>>>
>>>> Hi Tom/MET help,
>>>>
>>>> Thanks for the fantastically quick
reply, Tom!
>>>>
>>>> It turns out that I'm specifically referring
to the netcdf output from the pb2nc program.
>>>>
>>>> I already
sent a help ticket to the MET team, asking if they have a means for
removing the duplicate obs from their PB2NC process.  At the time,
they didn't refer to the history of data as it undergoes QC, so this
might help me track down the reason for the duplicate obs.  So, I have
CC'd the met_help to this email.
>>>>
>>>> It turns out that when I
initially ran pb2nc with the default quality control flag set to "2"
(i.e. quality_mark_thresh in the PB2NCConfig_default file), I did not
get ANY surface observations in my final netcdf file over Central
America.  Upon email exchanges with the MET team, it was recommended
that I set the quality control flag to "9" to be able to accept more
observations into the netcdf outfile.
>>>>
>>>>>      From what it
sounds like, I need to better understand what the "happy medium"
should be in setting the quality_mark_thresh flag in pb2nc.  2 is too
restrictive, while 9 appears to be allowing duplicate observations
into the mix as a result of the QC process.
>>>>
>>>> Any
recommendations are greatly welcome!
>>>>
>>>> Thanks much,
>>>>
Jonathan
>>>>
>>>>
>>>> From: Thomas Cram [mailto:tcram at ucar.edu]
>>>> Sent: Friday, January 13, 2012 4:39 PM
>>>> To: Case, Jonathan
(MSFC-VP61)[ENSCO INC]
>>>> Subject: Re: Help with ds337.0
>>>>
>>>> Hi Jonathan,
>>>>
>>>> the only experience I have working with
the MET software is using the pb2nc utility to convert PREPBUFR
observations into a NetCDF dataset, so my knowledge of MET is limited.
However, the one reason I can think of for the duplicate observations
is that you're seeing the same observation after several stages of
quality-control pre-processing.  The PREPBUFR files contain a complete
history of the data as it's modified during QC, so each station will
have multiple reports at a single time.  There's a quality control
flag appended to each PREPBUFR message; you want to keep the
observation with the lowest QC number.
>>>>
>>>> Can you send me the
date and time for the examples you list below?  I'll take a look at
the PREPBUFR messages and see if this is the case.
>>>>
>>>> If this
doesn't explain it, then I'll forward your question on to MET support
desk and see if they know the reason for duplicate observations.  They
are intimately familiar with the PREPBUFR obs, so I'm sure they can
help you out.
>>>>
>>>> - Tom
>>>>
>>>> On Jan 13, 2012, at 3:16
PM, Case, Jonathan (MSFC-VP61)[ENSCO INC] wrote:
>>>>
>>>>
>>>>
Dear Thomas,
>>>>
>>>> This is Jonathan Case of the NASA SPoRT
Center (http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>>>
I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>>
>>>> Now here is the
interesting part:  When I examined the textual difference files
generated by the MET software, I noticed that there were several
stations with "duplicate" observations that led to duplicate forecast-
observation difference pairs.  I put duplicate in quotes because the
observed values were not necessarily the same but usually very close
to one another.
>>>> The duplicate observations arose from the fact
that at the same observation location, there would be a 5-digit WMO
identifier as well as a 4-digit text station ID at a given hour.
>>>>
I stumbled on these duplicate station data when I made a table of
stations and mapped them, revealing the duplicates.
>>>>
>>>> Some
examples I stumbled on include:
>>>> *         78720/MHTG (both at
14.05N, -87.22E)
>>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>>> *
78708/MHLM (both at 15.45N, -87.93)
>>>>
>>>> There are others, but
I thought I'd provide a few examples to start.
>>>>
>>>> If the
source of the duplicates is NCEP/EMC, I wonder if it would be helpful
to send them a note as well?
>>>>
>>>> Let me know how you would
like to proceed.
>>>>
>>>> Most sincerely,
>>>> Jonathan
>>>>
>>>>
-------------------------------------------------------------------
>>>> -
>>>> --
>>>> ----------------------------
>>>> Jonathan
Case, ENSCO Inc.
>>>> NASA Short-term Prediction Research and
Transition Center (aka
>>>> SPoRT
>>>> Center) 320 Sparkman Drive,
Room 3062 Huntsville, AL 35805
>>>> Voice: 256.961.7504
>>>> Fax:
256.961.7788
>>>> Emails: Jonathan.Case-
1 at nasa.gov<mailto:Jonathan.Case-1 at nasa.gov>     /
>>>>
case.jonathan at ensco.com<mailto:case.jonathan at ensco.com>
>>>>
-------------------------------------------------------------------
>>>> -
>>>> --
>>>> ----------------------------
>>>>
>>>>
"Whether the weather is cold, or whether the weather is hot, we'll
weather
>>>>       the weather whether we like it or not!"
>>>>
>>>>
>>>> Thomas Cram
>>>> NCAR / CISL / DSS
>>>> 303-497-1217
>>>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>

------------------------------------------------
Subject: RE: Help with ds337.0
From: Paul Oldenburg
Time: Thu Mar 01 14:15:05 2012

Jonathan,

See attached.  By the way, I included the relevant output from
point_stat using the -unique flag below.

Paul


DEBUG 2: Searching 164 observations from 82 messages.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for
key
'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value
6.7082.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for
key
'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value
6.19839.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for
key
'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.20_-9999.00' with value
6.19839.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for
key
'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value
6.19839.
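
A rough way to count how many observations the -unique flag discarded
in a run, based on the DEBUG 3 messages above (the forecast,
observation, and config file names are the same illustrative ones used
earlier in this thread):

    point_stat wrf_fcst.grb gdas_obs.nc PointStatConfig -v 3 -unique 2>&1 | grep -c "found duplicate observation"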


On 03/01/2012 02:05 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> Paul,
>
> Can you send me examples of _mpr.txt files?  They are easier to read
than the .stat files.
>
> Thanks again!
> Jon
>
> -----Original Message-----
> From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
> Sent: Thursday, March 01, 2012 3:03 PM
> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
> Cc: tcram at ucar.edu
> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
> Jonathan,
>
> Sorry, I should have noticed you were using METv3.0.  The patch will
only work for METv3.1.  You should strongly consider upgrading to
METv3.1 for reasons other than this patch, including a major change in
how gridded data is handled internally by MET.  If you have more
questions about this change, I'll refer you to John.
>
> Regarding the handling of duplicate observations, the patched code
will discard all observations that qualify as duplicates, keeping only
a single one.  I attached two point_stat output files that include
matched pairs showing the effect of the unique flag in the patched
code.  I hope this answers your question.
>
> Paul
>
>
> On 03/01/2012 01:55 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> Paul,
>>
>> Thanks for the patch.  I believe that I'm running MET v3.0, not
3.1.  Hopefully the source files in your patch might work with our
older version.  If not, then I'll need to fully upgrade my config
files and such before I can test out the patch, so you may not hear
from me right away.
>>
>> One follow-on question I have is whether the patch will reject ALL
duplicate obs since I've seen 3 duplicate obs in some instances within
the 20-minute time window used in point_stat?
>>
>> Thanks again,
>> Jonathan
>>
>> -----Original Message-----
>> From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
>> Sent: Thursday, March 01, 2012 2:50 PM
>> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>> Jonathan,
>>
>> I developed a patch for handling point observations in MET that
will
>> allow the user to optionally throw out duplicate observations.  The
>> process that I implemented does not use the observation with the
>> timestamp closest to the forecast valid time as you suggested,
because
>> of complications in how the code handles obs.  Then, when I thought
>> about this, it occurred to me that it doesn't matter because the
>> observation is a duplicate anyway.  (right?)
>>
>> A duplicate observation is defined as an observation with identical
message type, station id, grib code (forecast field), latitude,
longitude, level, height and observed value to an observation that has
already been processed.
>>
>> Please deploy the latest MET patches and then the attached patch:
>>
>> 1. Deploy latest METv3.1 patches from
>>    http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
>> 2. Save attached tarball to base MET directory
>> 3. Untar it, which should overwrite four source files
>> 4. Run 'make clean' and then 'make'
>>
>> When this is complete, you should notice a new command-line option
for point_stat: -unique.  When you use this, point_stat should detect
and throw out duplicate observations.  If your verbosity level is set
to 3 or higher, it will report which observations are being thrown
out.  Please test this and let me know if you have any trouble or if
it does not work in the way that you expected.
>>
>> Thanks,
>>
>> Paul
>>
>>
>> On 02/13/2012 03:10 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>> Hello John/Tim/Methelp,
>>>
>>> I finally got back into looking at this issue with duplicate obs
showing up in the PB2NC output, resulting in duplicate fcst-obs pairs
being processed by point_stat.
>>>
>>> I found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
>>> I processed ONLY this obs through pb2nc to see what the result is
in the netcdf file.
>>>
>>> Here is what I see: (from an ncdump of the netcdf file) .
>>> .
>>> .
>>> .
>>> data:
>>>
>>>     obs_arr =
>>>      0, 33, 942.5, -9999, 0,
>>>      0, 34, 942.5, -9999, -1,
>>>      0, 32, 942.5, -9999, 1,
>>>      1, 51, 942.5, 619.9501, 0.016576,
>>>      1, 11, 942.5, 619.9501, 297.15,
>>>      1, 17, 942.5, 619.9501, 293.9545,
>>>      1, 2, 942.5, 619.9501, 101349.7,
>>>      2, 33, 942.5, -9999, 0,
>>>      2, 34, 942.5, -9999, -1,
>>>      2, 32, 942.5, -9999, 1,
>>>      3, 51, 942.5, 619.9501, 0.016576,
>>>      3, 11, 942.5, 619.9501, 297.15,
>>>      3, 17, 942.5, 619.9501, 293.9545,
>>>      3, 2, 942.5, 619.9501, 101349.7 ;
>>>
>>>     hdr_typ =
>>>      "ADPSFC",
>>>      "ADPSFC",
>>>      "ADPSFC",
>>>      "ADPSFC" ;
>>>
>>>     hdr_sid =
>>>      "MSSS",
>>>      "MSSS",
>>>      "MSSS",
>>>      "MSSS" ;
>>>
>>>     hdr_vld =
>>>      "20111001_115000",
>>>      "20111001_115000",
>>>      "20111001_115501",
>>>      "20111001_115501" ;
>>>
>>>     hdr_arr =
>>>      13.7, -89.12, 621,
>>>      13.7, -89.12, 621,
>>>      13.7, -89.12, 621,
>>>      13.7, -89.12, 621 ;
>>> }
>>>
>>> So, from what I can tell, the station is reporting the same obs at
2
>>> different times,
>>> 1150(00) UTC and 1155(01) UTC.  Do you have any recommendation on
how I can retain only one of these obs, preferably the one closest to
the top of the hour?  I know I could dramatically narrow down the time
window (e.g. +/- 5 min), but I suspect this would likely miss out on
most observations that report about 10 minutes before the hour.
>>>
>>> I value your feedback on this matter.
>>> Sincerely,
>>> Jonathan
>>>
>>> -----Original Message-----
>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>> Sent: Thursday, January 19, 2012 1:40 PM
>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>> Cc: tcram at ucar.edu
>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>
>>> Jonathan,
>>>
>>> OK, I reran my analysis using the setting you suggested:
>>>        in_report_type[] = [ 512, 522, 531, 540, 562 ];
>>>
>>> Here's what I see:
>>>
>>>       - For qm=2, there are 29 locations, all with unique header
information, and all with unique lat/lons.
>>>         It looks like the station id's are all alphabetical.  So
the "in_report_type" setting has filtered out the numeric station
id's.
>>>
>>>       - For qm=9, there are 57 locations - but only 29 of them
have unique header information!
>>>
>>> So I'll need to look more closely at what PB2NC is doing here.  It
looks like setting qm=9 really is causing duplicate observations to be
retained.
>>>
>>> When I get a chance, I'll run it through the debugger to investigate.
>>>
>>> Thanks,
>>> John
>>>
>>>
>>> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>
>>>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>
>>>> John,
>>>>
>>>> I noticed that even with specifying the input report types, there
are still a few duplicate observations in the final netcdf dataset.
>>>> So, I'm seeing the same thing as in your analysis.
>>>>
>>>> -Jonathan
>>>>
>>>> -----Original Message-----
>>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>>> Sent: Thursday, January 19, 2012 1:01 PM
>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>> Cc: tcram at ucar.edu
>>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>>
>>>> Jonathan,
>>>>
>>>> I apologize for the long delay in getting back to you on this.
We've been scrambling over the last couple of weeks to finish up
development on a new release.  Here's my recollection of what's going
on with this issue:
>>>>
>>>>        - You're using the GDAS PrepBUFR observation dataset, but
you're finding that PB2NC retains very few ADPSFC observations when
you use a quality marker of 2.
>>>>        - We advised via MET-Help that the algorithm employed by
NCEP in the GDAS processing sets most ADPSFC observations' quality
marker to a value of 9.  NCEP does that to prevent those observations
from being used in the data assimilation.  So the use of quality
marker = 9 is more an artifact of the data assimilation process and
not really saying anything about the quality of those observations.
>>>>        - When you switch to using a quality marker = 9 in PB2NC,
you got many matches, but ended up with more "duplicate" observations.
>>>>
>>>> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>>>
>>>> I did some investigation on this issue this morning.  Here's what
I did:
>>>>
>>>> - Retrieved this file:
>>>>
>>>> http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/gdas1.t12z.prepbufr.nr
>>>> - Ran it through PB2NC for message type = ADPSFC, time window =
+/- 0 seconds, and quality markers of 2 and 9.
>>>> - For both, I used the updated version of the plot_point_obs tool
to create a plot of the data and dump header information about the
points being plotted.
>>>> - I also used the -dump option for PB2NC to dump all of the
ADPSFC observations to ASCII format.
>>>>
>>>> I've attached several things to this message:
>>>> - The postscript output of plot_point_obs for qm = 2 and qm = 9,
after first converting to png format.
>>>> - The output from the plot_point_obs tool for both runs.
>>>>
>>>> For qm=2, there were 51 locations plotted in your domain.
>>>>        - Of those 51...
>>>>           - All 51 header entries are unique.
>>>>           - There are only 36 unique combinations of lat/lon.
>>>> For qm=9, there were 101 locations plotted in your domain.
>>>>        - Of those 101...
>>>>           - There are only 52 unique header entries.
>>>>           - There are only 37 unique combinations of lat/lon.
>>>>
>>>> I think there are two issues occurring here:
>>>>
>>>> (1) When using qm=2, you'll often see two observing locations
that look the same except for the station ID.  For example:
>>>>       [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>       [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>>>
>>>> I looked at the observations that correspond to these and found
that they do actually differ slightly.
>>>>
>>>> (2) The second, larger issue here is when using qm=9.  It does
appear that we're really getting duplicate observations.  For example:
>>>>       [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>       [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>
>>>> This will likely require further debugging of the PB2NC tool to
figure out what's going on.
>>>>
>>>> I just wanted to let you know what I've found so far.
>>>>
>>>> Thanks,
>>>> John Halley Gotway
>>>>
>>>>
>>>>
>>>> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>>
>>>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>>>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>>>             Queue: met_help
>>>>>           Subject: RE: Help with ds337.0
>>>>>             Owner: Nobody
>>>>>        Requestors: jonathan.case-1 at nasa.gov
>>>>>            Status: new
>>>>>       Ticket<URL:
>>>>> https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>>
>>>>>
>>>>> Hi Tom/MET help,
>>>>>
>>>>> Thanks for the fantastically quick reply, Tom!
>>>>>
>>>>> It turns out that I'm specifically referring to the netcdf
output from the pb2nc program.
>>>>>
>>>>> I already sent a help ticket to the MET team, asking if they
have a means for removing the duplicate obs from their PB2NC process.
At the time, they didn't refer to the history of data as it undergoes
QC, so this might help me track down the reason for the duplicate obs.
So, I have CC'd the met_help to this email.
>>>>>
>>>>> It turns out that when I initially ran pb2nc with the default
quality control flag set to "2" (i.e. quality_mark_thresh in the
PB2NCConfig_default file), I did not get ANY surface observations in
my final netcdf file over Central America.  Upon email exchanges with
the MET team, it was recommended that I set the quality control flag
to "9" to be able to accept more observations into the netcdf outfile.
>>>>>
>>>>>>        From what it sounds like, I need to better understand
what the "happy medium" should be in setting the quality_mark_thresh
flag in pb2nc.  2 is too restrictive, while 9 appears to be allowing
duplicate observations into the mix as a result of the QC process.
>>>>>
>>>>> Any recommendations are greatly welcome!
>>>>>
>>>>> Thanks much,
>>>>> Jonathan
>>>>>
>>>>>
>>>>> From: Thomas Cram [mailto:tcram at ucar.edu]
>>>>> Sent: Friday, January 13, 2012 4:39 PM
>>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>>> Subject: Re: Help with ds337.0
>>>>>
>>>>> Hi Jonathan,
>>>>>
>>>>> the only experience I have working with the MET software is
using the pb2nc utility to convert PREPBUFR observations into a NetCDF
dataset, so my knowledge of MET is limited.  However, the one reason I
can think of for the duplicate observations is that you're seeing the
same observation after several stages of quality-control pre-
processing.  The PREPBUFR files contain a complete history of the data
as it's modified during QC, so each station will have multiple reports
at a single time.  There's a quality control flag appended to each
PREPBUFR message; you want to keep the observation with the lowest QC
number.
>>>>>
>>>>> Can you send me the date and time for the examples you list
below?  I'll take a look at the PREPBUFR messages and see if this is
the case.
>>>>>
>>>>> If this doesn't explain it, then I'll forward your question on
to MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>>>>
>>>>> - Tom
>>>>>
>>>>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO
INC] wrote:
>>>>>
>>>>>
>>>>> Dear Thomas,
>>>>>
>>>>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>>>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>>>
>>>>> Now here is the interesting part:  When I examined the textual
difference files generated by the MET software, I noticed that there
were several stations with "duplicate" observations that led to
duplicate forecast-observation difference pairs.  I put duplicate in
quotes because the observed values were not necessarily the same but
usually very close to one another.
>>>>> The duplicate observations arose from the fact that at the same
observation location, there would be a 5-digit WMO identifier as well
as a 4-digit text station ID at a given hour.
>>>>> I stumbled on these duplicate station data when I made a table
of stations and mapped them, revealing the duplicates.
>>>>>
>>>>> Some examples I stumbled on include:
>>>>> *         78720/MHTG (both at 14.05N, -87.22E)
>>>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>>>> *         78708/MHLM (both at 15.45N, -87.93)
>>>>>
>>>>> There are others, but I thought I'd provide a few examples to
start.
>>>>>
>>>>> If the source of the duplicates is NCEP/EMC, I wonder if it
would be helpful to send them a note as well?
>>>>>
>>>>> Let me know how you would like to proceed.
>>>>>
>>>>> Most sincerely,
>>>>> Jonathan
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------------------------------
>>>>> Jonathan Case, ENSCO Inc.
>>>>> NASA Short-term Prediction Research and Transition Center (aka
>>>>> SPoRT
>>>>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL 35805
>>>>> Voice: 256.961.7504
>>>>> Fax: 256.961.7788
>>>>> Emails: Jonathan.Case-1 at nasa.gov / case.jonathan at ensco.com
>>>>>
>>>>> ---------------------------------------------------------------------------------------------
>>>>>
>>>>> "Whether the weather is cold, or whether the weather is hot,
we'll weather
>>>>>        the weather whether we like it or not!"
>>>>>
>>>>>
>>>>> Thomas Cram
>>>>> NCAR / CISL / DSS
>>>>> 303-497-1217
>>>>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>


------------------------------------------------
Subject: RE: Help with ds337.0
From: Paul Oldenburg
Time: Thu Mar 01 14:15:05 2012

VERSION MODEL FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_LEV OBS_VAR OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL INDEX OBS_SID OBS_LAT OBS_LON OBS_LVL OBS_ELV FCST OBS CLIMO
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_113000 20120214_113000 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 1 PACF1 30.15000 -85.67000 1015.90002 NA 6.51961 6.70820 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_113600 20120214_113600 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 2 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 6.70820 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_114200 20120214_114200 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 3 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 6.70820 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_114800 20120214_114800 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 4 PACF1 30.15000 -85.67000 1016.20001 NA 6.51961 6.19839 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_115400 20120214_115400 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 5 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 5.68858 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_120000 20120214_120000 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 6 PACF1 30.15000 -85.67000 1016.29999 NA 6.51961 5.70000 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_120600 20120214_120600 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 7 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 6.19839 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_121200 20120214_121200 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 8 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 6.19839 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_121800 20120214_121800 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 9 PACF1 30.15000 -85.67000 1016.29999 NA 6.51961 6.19839 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_122400 20120214_122400 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 10 PACF1 30.15000 -85.67000 1016.20001 NA 6.51961 6.19839 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_123000 20120214_123000 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 11 11 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 6.19839 NA

------------------------------------------------
Subject: RE: Help with ds337.0
From: Paul Oldenburg
Time: Thu Mar 01 14:15:05 2012

VERSION MODEL FCST_LEAD FCST_VALID_BEG FCST_VALID_END OBS_LEAD OBS_VALID_BEG OBS_VALID_END FCST_VAR FCST_LEV OBS_VAR OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL INDEX OBS_SID OBS_LAT OBS_LON OBS_LVL OBS_ELV FCST OBS CLIMO
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_113000 20120214_113000 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 7 1 PACF1 30.15000 -85.67000 1015.90002 NA 6.51961 6.70820 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_113600 20120214_113600 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 7 2 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 6.70820 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_114800 20120214_114800 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 7 3 PACF1 30.15000 -85.67000 1016.20001 NA 6.51961 6.19839 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_115400 20120214_115400 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 7 4 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 5.68858 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_120000 20120214_120000 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 7 5 PACF1 30.15000 -85.67000 1016.29999 NA 6.51961 5.70000 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_120600 20120214_120600 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 7 6 PACF1 30.15000 -85.67000 1016.09998 NA 6.51961 6.19839 NA
V3.1 WRF 120000 20120214_120000 20120214_120000 000000 20120214_121800 20120214_121800 WIND Z10 WIND Z10 SFCSHP FULL UW_MEAN 1 NA NA NA NA MPR 7 7 PACF1 30.15000 -85.67000 1016.29999 NA 6.51961 6.19839 NA
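The two listings above appear to be the attached point_stat outputs Paul describes: the same PACF1 matched pairs without and then with the -unique flag (TOTAL drops from 11 to 7).  As a rough cross-check, here is a minimal Python sketch that flags duplicate matched pairs in MPR text like this; the key loosely mirrors the one reported in the point_stat DEBUG messages later in this ticket (message type, station id, lat, lon, level, elevation, observed value), and the column positions are simply read off the header line above, which is an assumption rather than an official MET interface:

# Flag duplicate matched pairs in whitespace-delimited MPR records.
import sys
from collections import Counter

counts = Counter()
for line in sys.stdin:
    tok = line.split()
    if len(tok) < 31 or tok[20] != "MPR":          # LINE_TYPE column
        continue
    obtype, sid = tok[12], tok[23]                 # OBTYPE, OBS_SID
    lat, lon, lvl, elv, obs = tok[24], tok[25], tok[26], tok[27], tok[29]
    counts[(obtype, sid, lat, lon, lvl, elv, obs)] += 1
for key, n in counts.items():
    if n > 1:
        print(n, "pairs share key", key)

Run against the first listing, this should flag the duplicate keys behind the four pairs that are dropped in the second.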

------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Thomas Cram
Time: Thu Mar 01 14:57:19 2012

Jonathan, Paul et al.,

Thanks for keeping me in the loop on this issue. It has helped me
understand the prepbufr observations much better!

Best,
- Tom

Sent from my iPhone

On Mar 1, 2012, at 1:15 PM, "Paul Oldenburg via RT"
<met_help at ucar.edu> wrote:

> Jonathan,
>
> See attached.  By the way, I included the relevant output from
point_stat using the -unique flag below.
>
> Paul
>
>
> DEBUG 2: Searching 164 observations from 82 messages.
> DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value 6.7082.
> DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value 6.19839.
> DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.20_-9999.00' with value 6.19839.
> DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value 6.19839.
>
>
> On 03/01/2012 02:05 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> Paul,
>>
>> Can you send me examples of _mpr.txt files?  They are easier to
read than the .stat files.
>>
>> Thanks again!
>> Jon
>>
>> -----Original Message-----
>> From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
>> Sent: Thursday, March 01, 2012 3:03 PM
>> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>> Jonathan,
>>
>> Sorry, I should have noticed you were using METv3.0.  The patch
will only work for METv3.1.  You should strongly consider upgrading to
METv3.1 for reasons other than this patch, including a major change in
how gridded data is handled internally by MET.  If you have more
questions about this change, I'll refer you to John.
>>
>> Regarding the handling of duplicate observations, the patched code
will discard all observations that qualify as duplicates, keeping only
a single one.  I attached two point_stat output files that include
matched pairs showing the effect of the unique flag in the patched
code.  I hope this answers your question.
>>
>> Paul
>>
>>
>> On 03/01/2012 01:55 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>> Paul,
>>>
>>> Thanks for the patch.  I believe that I'm running MET v3.0, not
3.1.  Hopefully the source files in your patch might work with our
older version.  If not, then I'll need to fully upgrade my config
files and such before I can test out the patch, so you may not hear
from me right away.
>>>
>>> One follow-on question I have is whether the patch will reject ALL
duplicate obs since I've seen 3 duplicate obs in some instances within
the 20-minute time window used in point_stat?
>>>
>>> Thanks again,
>>> Jonathan
>>>
>>> -----Original Message-----
>>> From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
>>> Sent: Thursday, March 01, 2012 2:50 PM
>>> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
>>> Cc: tcram at ucar.edu
>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>
>>> Jonathan,
>>>
>>> I developed a patch for handling point observations in MET that
will
>>> allow the user to optionally throw out duplicate observations.
The
>>> process that I implemented does not use the observation with the
>>> timestamp closest to the forecast valid time as you suggested,
because
>>> of complications in how the code handles obs.  Then, when I
thought
>>> about this, it occurred to me that it doesn't matter because the
>>> observation is a duplicate anyway.  (right?)
>>>
>>> A duplicate observation is defined as an observation with
identical message type, station id, grib code (forecast field),
latitude, longitude, level, height and observed value to an
observation that has already been processed.
>>>
>>> Please deploy the latest MET patches and then the attached patch:
>>>
>>> 1. Deploy latest METv3.1 patches from http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
>>> 2. Save attached tarball to base MET directory
>>> 3. Untar it, which should overwrite four source files
>>> 4. Run 'make clean' and then 'make'
>>>
>>> When this is complete, you should notice a new command-line option
for point_stat: -unique.  When you use this, point_stat should detect
and throw out duplicate observations.  If your verbosity level is set
to 3 or higher, it will report which observations are being thrown
out.  Please test this and let me know if you have any trouble or if
it does not work in the way that you expected.
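As a usage sketch (the file names here are placeholders, not taken from this ticket), the flag is simply appended to an existing point_stat command, following the forecast/observation/config arguments shown elsewhere in this thread:

point_stat wrf_d01.grb gdas_obs.nc PointStatConfig -outdir out -v 3 -unique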
>>>
>>> Thanks,
>>>
>>> Paul
>>>
>>>
>>> On 02/13/2012 03:10 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>
>>>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>
>>>> Hello John/Tim/Methelp,
>>>>
>>>> I finally got back into looking at this issue with duplicate obs
showing up in the PB2NC output, resulting in duplicate fcst-obs pairs
being processed by point_stat.
>>>>
>>>> I found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
>>>> I processed ONLY this obs through pb2nc to see what the result is
in the netcdf file.
>>>>
>>>> Here is what I see: (from an ncdump of the netcdf file) .
>>>> .
>>>> .
>>>> .
>>>> data:
>>>>
>>>>    obs_arr =
>>>>     0, 33, 942.5, -9999, 0,
>>>>     0, 34, 942.5, -9999, -1,
>>>>     0, 32, 942.5, -9999, 1,
>>>>     1, 51, 942.5, 619.9501, 0.016576,
>>>>     1, 11, 942.5, 619.9501, 297.15,
>>>>     1, 17, 942.5, 619.9501, 293.9545,
>>>>     1, 2, 942.5, 619.9501, 101349.7,
>>>>     2, 33, 942.5, -9999, 0,
>>>>     2, 34, 942.5, -9999, -1,
>>>>     2, 32, 942.5, -9999, 1,
>>>>     3, 51, 942.5, 619.9501, 0.016576,
>>>>     3, 11, 942.5, 619.9501, 297.15,
>>>>     3, 17, 942.5, 619.9501, 293.9545,
>>>>     3, 2, 942.5, 619.9501, 101349.7 ;
>>>>
>>>>    hdr_typ =
>>>>     "ADPSFC",
>>>>     "ADPSFC",
>>>>     "ADPSFC",
>>>>     "ADPSFC" ;
>>>>
>>>>    hdr_sid =
>>>>     "MSSS",
>>>>     "MSSS",
>>>>     "MSSS",
>>>>     "MSSS" ;
>>>>
>>>>    hdr_vld =
>>>>     "20111001_115000",
>>>>     "20111001_115000",
>>>>     "20111001_115501",
>>>>     "20111001_115501" ;
>>>>
>>>>    hdr_arr =
>>>>     13.7, -89.12, 621,
>>>>     13.7, -89.12, 621,
>>>>     13.7, -89.12, 621,
>>>>     13.7, -89.12, 621 ;
>>>> }
>>>>
>>>> So, from what I can tell, the station is reporting the same obs
at 2
>>>> different times,
>>>> 1150(00) UTC and 1155(01) UTC.  Do you have any recommendation on
how I can retain only one of these obs, preferably the one closest to
the top of the hour?  I know I could dramatically narrow down the time
window (e.g. +/- 5 min), but I suspect this would likely miss out on
most observations that report about 10 minutes before the hour.
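One possible workaround, sketched below, is to post-process the pb2nc NetCDF outside of MET and keep, for each station and GRIB code, only the report whose valid time is closest to the target hour.  The variable names (obs_arr, hdr_vld, hdr_sid) and the obs_arr column order (header index, GRIB code, level, height, value) follow the ncdump above; the file name and everything else are assumptions:

from datetime import datetime
import netCDF4

target = datetime(2011, 10, 1, 12, 0, 0)
nc = netCDF4.Dataset("gdas_sfc_20111001_12_MESOAMERICA.nc")

vld = [datetime.strptime(s.strip(), "%Y%m%d_%H%M%S")
       for s in netCDF4.chartostring(nc.variables["hdr_vld"][:])]
sid = [s.strip() for s in netCDF4.chartostring(nc.variables["hdr_sid"][:])]

best = {}                                   # (station, GRIB code) -> (offset, record)
for rec in nc.variables["obs_arr"][:]:
    h = int(rec[0])                         # header index into the hdr_* arrays
    key = (sid[h], int(rec[1]))             # station id, GRIB code
    off = abs((vld[h] - target).total_seconds())
    if key not in best or off < best[key][0]:
        best[key] = (off, rec)

for (station, code), (off, rec) in sorted(best.items()):
    print(station, code, "level", rec[2], "value", rec[4], "offset_s", off)

A filtered observation list built this way would then hold one report per station, code, and hour, without having to shrink the time window.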
>>>>
>>>> I value your feedback on this matter.
>>>> Sincerely,
>>>> Jonathan
>>>>
>>>> -----Original Message-----
>>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>>> Sent: Thursday, January 19, 2012 1:40 PM
>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>> Cc: tcram at ucar.edu
>>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>>
>>>> Jonathan,
>>>>
>>>> OK, I reran my analysis using the setting you suggested:
>>>>       in_report_type[] = [ 512, 522, 531, 540, 562 ];
>>>>
>>>> Here's what I see:
>>>>
>>>>      - For qm=2, there are 29 locations, all with unique header
information, and all with unique lat/lons.
>>>>        It looks like the station id's are all alphabetical.  So
the "in_report_type" setting has filtered out the numeric station
id's.
>>>>
>>>>      - For qm=9, there are 57 locations - but only 29 of them
have unique header information!
>>>>
>>>> So I'll need to look more closely at what PB2NC is doing here.
It looks like setting qm=9 really is causing duplicate observations to
be retained.
>>>>
>>>> When I get a chance, I'll run it through the debugger to
investigate.
>>>>
>>>> Thanks,
>>>> John
>>>>
>>>>
>>>> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>>
>>>>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>>
>>>>> John,
>>>>>
>>>>> I noticed that even with specifying the input report types,
there are still a few duplicate observations in the final netcdf
dataset.
>>>>> So, I'm seeing the same thing as in your analysis.
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> -----Original Message-----
>>>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>>>> Sent: Thursday, January 19, 2012 1:01 PM
>>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>>> Cc: tcram at ucar.edu
>>>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>>>
>>>>> Jonathan,
>>>>>
>>>>> I apologize for the long delay in getting back to you on this.
We've been scrambling over the last couple of weeks to finish up
development on a new release.  Here's my recollection of what's going
on with this issue:
>>>>>
>>>>>       - You're using the GDAS PrepBUFR observation dataset, but
you're finding that PB2NC retains very few ADPSFC observations when
you use a quality marker of 2.
>>>>>       - We advised via MET-Help that the algorithm employed by
NCEP in the GDAS processing sets most ADPSFC observations' quality
marker to a value of 9.  NCEP does that to prevent those observations
from being used in the data assimilation.  So the use of quality
marker = 9 is more an artifact of the data assimilation process and
not really saying anything about the quality of those observations.
>>>>>       - When you switch to using a quality marker = 9 in PB2NC,
you got many matches, but ended up with more "duplicate" observations.
>>>>>
>>>>> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>>>>
>>>>> I did some investigation on this issue this morning.  Here's
what I did:
>>>>>
>>>>> - Retrieved this file:
>>>>>
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112
>>>>> /
>>>>> gdas1.t12z.prepbufr.nr
>>>>> - Ran it through PB2NC from message type = ADPSFC, time window =
+/- 0 seconds, and quality markers of 2 and 9.
>>>>> - For both, I used the updated version of the plot_point_obs
tool to create a plot of the data and dump header information about
the points being plotted.
>>>>> - I also used the -dump option for PB2NC to dump all of the
ADPSFC observations to ASCII format.
>>>>>
>>>>> I've attached several things to this message:
>>>>> - The postscript output of plot_point_obs for qm = 2 and qm = 9,
after first converting to png format.
>>>>> - The output from the plot_point_obs tool for both runs.
>>>>>
>>>>> For qm=2, there were 51 locations plotted in your domain.
>>>>>       - Of those 51...
>>>>>          - All 51 header entries are unique.
>>>>>          - There are only 36 unique combinations of lat/lon.
>>>>> For qm=9, there were 101 locations plotted in your domain.
>>>>>       - Of those 101...
>>>>>          - There are only 52 unique header entries.
>>>>>          - There are only 37 unique combinations of lat/lon.
>>>>>
>>>>> I think there are two issues occurring here:
>>>>>
>>>>> (1) When using qm=2, you'll often see two observing locations
that look the same except for the station ID.  For example:
>>>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>>      [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>>>>
>>>>> I looked at the observations that correspond to these and found
that they do actually differ slightly.
>>>>>
>>>>> (2) The second, larger issue here is when using qm=9.  It does
appear that we're really getting duplicate observations.  For example:
>>>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>>
>>>>> This will likely require further debugging of the PB2NC tool to
figure out what's going on.
>>>>>
>>>>> I just wanted to let you know what I've found so far.
>>>>>
>>>>> Thanks,
>>>>> John Halley Gotway
>>>>>
>>>>>
>>>>>
>>>>> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>>>
>>>>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>>>>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>>>>            Queue: met_help
>>>>>>          Subject: RE: Help with ds337.0
>>>>>>            Owner: Nobody
>>>>>>       Requestors: jonathan.case-1 at nasa.gov
>>>>>>           Status: new
>>>>>>      Ticket<URL:
>>>>>> https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>>>
>>>>>>
>>>>>> Hi Tom/MET help,
>>>>>>
>>>>>> Thanks for the fantastically quick reply, Tom!
>>>>>>
>>>>>> It turns out that I'm specifically referring to the netcdf
output from the pb2nc program.
>>>>>>
>>>>>> I already sent a help ticket to the MET team, asking if they
have a means for removing the duplicate obs from their PB2NC process.
At the time, they didn't refer to the history of data as it undergoes
QC, so this might help me track down the reason for the duplicate obs.
So, I have CC'd the met_help to this email.
>>>>>>
>>>>>> It turns out that when I initially ran pb2nc with the default
quality control flag set to "2" (i.e. quality_mark_thresh in the
PB2NCConfig_default file), I did not get ANY surface observations in
my final netcdf file over Central America.  Upon email exchanges with
the MET team, it was recommended that I set the quality control flag
to "9" to be able to accept more observations into the netcdf outfile.
>>>>>>
>>>>>>>       From what it sounds like, I need to better understand
what the "happy medium" should be in setting the quality_mark_thresh
flag in pb2nc.  2 is too restrictive, while 9 appears to be allowing
duplicate observations into the mix as a result of the QC process.
>>>>>>
>>>>>> Any recommendations are greatly welcome!
>>>>>>
>>>>>> Thanks much,
>>>>>> Jonathan
>>>>>>
>>>>>>
>>>>>> From: Thomas Cram [mailto:tcram at ucar.edu]
>>>>>> Sent: Friday, January 13, 2012 4:39 PM
>>>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>>>> Subject: Re: Help with ds337.0
>>>>>>
>>>>>> Hi Jonathan,
>>>>>>
>>>>>> the only experience I have working with the MET software is
using the pb2nc utility to convert PREPBUFR observations into a NetCDF
dataset, so my knowledge of MET is limited.  However, the one reason I
can think of for the duplicate observations is that you're seeing the
same observation after several stages of quality-control pre-
processing.  The PREPBUFR files contain a complete history of the data
as it's modified during QC, so each station will have multiple reports
at a single time.  There's a quality control flag appended to each
PREPBUFR message; you want to keep the observation with the lowest QC
number.
>>>>>>
>>>>>> Can you send me the date and time for the examples you list
below?  I'll take a look at the PREPBUFR messages and see if this is
the case.
>>>>>>
>>>>>> If this doesn't explain it, then I'll forward your question on
to MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>>>>>
>>>>>> - Tom
>>>>>>
>>>>>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO
INC] wrote:
>>>>>>
>>>>>>
>>>>>> Dear Thomas,
>>>>>>
>>>>>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>>>>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>>>>
>>>>>> Now here is the interesting part:  When I examined the textual
difference files generated by the MET software, I noticed that there
were several stations with "duplicate" observations that led to
duplicate forecast-observation difference pairs.  I put duplicate in
quotes because the observed values were not necessarily the same but
usually very close to one another.
>>>>>> The duplicate observations arose from the fact that at the same
observation location, there would be a 5-digit WMO identifier as well
as a 4-digit text station ID at a given hour.
>>>>>> I stumbled on these duplicate station data when I made a table
of stations and mapped them, revealing the duplicates.
>>>>>>
>>>>>> Some examples I stumbled on include:
>>>>>> *         78720/MHTG (both at 14.05N, -87.22E)
>>>>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>>>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>>>>> *         78708/MHLM (both at 15.45N, -87.93)
>>>>>>
>>>>>> There are others, but I thought I'd provide a few examples to
start.
>>>>>>
>>>>>> If the source of the duplicates is NCEP/EMC, I wonder if it
would be helpful to send them a note as well?
>>>>>>
>>>>>> Let me know how you would like to proceed.
>>>>>>
>>>>>> Most sincerely,
>>>>>> Jonathan
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------------------------------
>>>>>> Jonathan Case, ENSCO Inc.
>>>>>> NASA Short-term Prediction Research and Transition Center (aka
>>>>>> SPoRT
>>>>>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL 35805
>>>>>> Voice: 256.961.7504
>>>>>> Fax: 256.961.7788
>>>>>> Emails: Jonathan.Case-1 at nasa.gov / case.jonathan at ensco.com
>>>>>>
>>>>>> ---------------------------------------------------------------------------------------------
>>>>>>
>>>>>> "Whether the weather is cold, or whether the weather is hot,
we'll weather
>>>>>>       the weather whether we like it or not!"
>>>>>>
>>>>>>
>>>>>> Thomas Cram
>>>>>> NCAR / CISL / DSS
>>>>>> 303-497-1217
>>>>>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
> VERSION MODEL FCST_LEAD FCST_VALID_BEG  FCST_VALID_END  OBS_LEAD
OBS_VALID_BEG   OBS_VALID_END   FCST_VAR FCST_LEV OBS_VAR OBS_LEV
OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH
COV_THRESH ALPHA LINE_TYPE TOTAL INDEX OBS_SID OBS_LAT  OBS_LON
OBS_LVL    OBS_ELV FCST    OBS     CLIMO
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_113000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    1     PACF1   30.15000 -85.67000 1015.90002 NA
6.51961 6.70820 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113600 20120214_113600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    2     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.70820 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_114200 20120214_114200 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    3     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.70820 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_114800 20120214_114800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    4     PACF1   30.15000 -85.67000 1016.20001 NA
6.51961 6.19839 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_115400 20120214_115400 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    5     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 5.68858 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120000 20120214_120000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    6     PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 5.70000 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120600 20120214_120600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    7     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_121200 20120214_121200 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    8     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_121800 20120214_121800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    9     PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 6.19839 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_122400 20120214_122400 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    10    PACF1   30.15000 -85.67000 1016.20001 NA
6.51961 6.19839 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_123000 20120214_123000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       11    11    PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
> VERSION MODEL FCST_LEAD FCST_VALID_BEG  FCST_VALID_END  OBS_LEAD
OBS_VALID_BEG   OBS_VALID_END   FCST_VAR FCST_LEV OBS_VAR OBS_LEV
OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH
COV_THRESH ALPHA LINE_TYPE TOTAL INDEX OBS_SID OBS_LAT  OBS_LON
OBS_LVL    OBS_ELV FCST    OBS     CLIMO
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113000 20120214_113000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       7     1     PACF1   30.15000 -85.67000 1015.90002 NA
6.51961 6.70820 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_113600 20120214_113600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       7     2     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.70820 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_114800 20120214_114800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       7     3     PACF1   30.15000 -85.67000 1016.20001 NA
6.51961 6.19839 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_115400 20120214_115400 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       7     4     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 5.68858 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120000 20120214_120000 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       7     5     PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 5.70000 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_120600 20120214_120600 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       7     6     PACF1   30.15000 -85.67000 1016.09998 NA
6.51961 6.19839 NA
> V3.1    WRF   120000    20120214_120000 20120214_120000 000000
20120214_121800 20120214_121800 WIND     Z10      WIND    Z10
SFCSHP FULL    UW_MEAN     1           NA          NA         NA
NA    MPR       7     7     PACF1   30.15000 -85.67000 1016.29999 NA
6.51961 6.19839 NA

------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Case, Jonathan[ENSCO INC]
Time: Sat Mar 03 09:34:45 2012

Paul,

We now have METv3.1 installed with and without the patch (i.e.
separate binaries before and after the -unique patch).  I suppose that
we didn't need to install without the patch because the -unique option
is a run-time option in the new program.

Thankfully, there are virtually no changes between the config files in
v3.0 and v3.1 for PointStat.  So, my customized config files don't
need to be changed.

I'll test soon to see how it works with our duplicate obs.

Thanks once again,
Jonathan

-----Original Message-----
From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
Sent: Thursday, March 01, 2012 3:03 PM
To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
Cc: tcram at ucar.edu
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0

Jonathan,

Sorry, I should have noticed you were using METv3.0.  The patch will
only work for METv3.1.  You should strongly
consider upgrading to METv3.1 for reasons other than this patch,
including a major change in how gridded data is handled
internally by MET.  If you have more questions about this change, I'll
refer you to John.

Regarding the handling of duplicate observations, the patched code
will discard all observations that qualify as
duplicates, keeping only a single one.  I attached two point_stat
output files that include matched pairs showing the
effect of the unique flag in the patched code.  I hope this answers
your question.

Paul


On 03/01/2012 01:55 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> Paul,
>
> Thanks for the patch.  I believe that I'm running MET v3.0, not 3.1.
Hopefully the source files in your patch might work with our older
version.  If not, then I'll need to fully upgrade my config files and
such before I can test out the patch, so you may not hear from me
right away.
>
> One follow-on question I have is whether the patch will reject ALL
duplicate obs since I've seen 3 duplicate obs in some instances within
the 20-minute time window used in point_stat?
>
> Thanks again,
> Jonathan
>
> -----Original Message-----
> From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
> Sent: Thursday, March 01, 2012 2:50 PM
> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
> Cc: tcram at ucar.edu
> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
> Jonathan,
>
> I developed a patch for handling point observations in MET that will
allow the user to optionally throw out duplicate observations.  The
process that I implemented does not use the observation with the
timestamp closest to the forecast valid time as you suggested, because
of complications in how the code handles obs.  Then, when I thought
about this, it occurred to me that it doesn't matter because the
observation is a duplicate anyway.  (right?)
>
> A duplicate observation is defined as an observation with identical
message type, station id, grib code (forecast field), latitude,
longitude, level, height and observed value to an observation that has
already been processed.
>
> Please deploy the latest MET patches and then the attached patch:
>
> 1. Deploy latest METv3.1 patches from http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
> 2. Save attached tarball to base MET directory
> 3. Untar it, which should overwrite four source files
> 4. Run 'make clean' and then 'make'
>
> When this is complete, you should notice a new command-line option
for point_stat: -unique.  When you use this, point_stat should detect
and throw out duplicate observations.  If your verbosity level is set
to 3 or higher, it will report which observations are being thrown
out.  Please test this and let me know if you have any trouble or if
it does not work in the way that you expected.
>
> Thanks,
>
> Paul
>
>
> On 02/13/2012 03:10 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>> Hello John/Tim/Methelp,
>>
>> I finally got back into looking at this issue with duplicate obs
showing up in the PB2NC output, resulting in duplicate fcst-obs pairs
being processed by point_stat.
>>
>> I found a single obs site in Central America that is generating a
problem on 1 Oct 2011 at 12z (stid "MSSS").
>> I processed ONLY this obs through pb2nc to see what the result is
in the netcdf file.
>>
>> Here is what I see: (from an ncdump of the netcdf file) .
>> .
>> .
>> .
>> data:
>>
>>    obs_arr =
>>     0, 33, 942.5, -9999, 0,
>>     0, 34, 942.5, -9999, -1,
>>     0, 32, 942.5, -9999, 1,
>>     1, 51, 942.5, 619.9501, 0.016576,
>>     1, 11, 942.5, 619.9501, 297.15,
>>     1, 17, 942.5, 619.9501, 293.9545,
>>     1, 2, 942.5, 619.9501, 101349.7,
>>     2, 33, 942.5, -9999, 0,
>>     2, 34, 942.5, -9999, -1,
>>     2, 32, 942.5, -9999, 1,
>>     3, 51, 942.5, 619.9501, 0.016576,
>>     3, 11, 942.5, 619.9501, 297.15,
>>     3, 17, 942.5, 619.9501, 293.9545,
>>     3, 2, 942.5, 619.9501, 101349.7 ;
>>
>>    hdr_typ =
>>     "ADPSFC",
>>     "ADPSFC",
>>     "ADPSFC",
>>     "ADPSFC" ;
>>
>>    hdr_sid =
>>     "MSSS",
>>     "MSSS",
>>     "MSSS",
>>     "MSSS" ;
>>
>>    hdr_vld =
>>     "20111001_115000",
>>     "20111001_115000",
>>     "20111001_115501",
>>     "20111001_115501" ;
>>
>>    hdr_arr =
>>     13.7, -89.12, 621,
>>     13.7, -89.12, 621,
>>     13.7, -89.12, 621,
>>     13.7, -89.12, 621 ;
>> }
>>
>> So, from what I can tell, the station is reporting the same obs at
2
>> different times,
>> 1150(00) UTC and 1155(01) UTC.  Do you have any recommendation on
how I can retain only one of these obs, preferably the one closest to
the top of the hour?  I know I could dramatically narrow down the time
window (e.g. +/- 5 min), but I suspect this would likely miss out on
most observations that report about 10 minutes before the hour.
>>
>> I value your feedback on this matter.
>> Sincerely,
>> Jonathan
>>
>> -----Original Message-----
>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>> Sent: Thursday, January 19, 2012 1:40 PM
>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>> Jonathan,
>>
>> OK, I reran my analysis using the setting you suggested:
>>       in_report_type[] = [ 512, 522, 531, 540, 562 ];
>>
>> Here's what I see:
>>
>>      - For qm=2, there are 29 locations, all with unique header
information, and all with unique lat/lons.
>>        It looks like the station id's are all alphabetical.  So the
"in_report_type" setting has filtered out the numeric station id's.
>>
>>      - For qm=9, there are 57 locations - but only 29 of them have
unique header information!
>>
>> So I'll need to look more closely at what PB2NC is doing here.  It
looks like setting qm=9 really is causing duplicate observations to be
retained.
>>
>> When I get a chance, I'll run it through the debugger to investigate.
>>
>> Thanks,
>> John
>>
>>
>> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>> John,
>>>
>>> I noticed that even with specifying the input report types, there
are still a few duplicate observations in the final netcdf dataset.
>>> So, I'm seeing the same thing as in your analysis.
>>>
>>> -Jonathan
>>>
>>> -----Original Message-----
>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>> Sent: Thursday, January 19, 2012 1:01 PM
>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>> Cc: tcram at ucar.edu
>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>
>>> Jonathan,
>>>
>>> I apologize for the long delay in getting back to you on this.
We've been scrambling over the last couple of weeks to finish up
development on a new release.  Here's my recollection of what's going
on with this issue:
>>>
>>>       - You're using the GDAS PrepBUFR observation dataset, but
you're finding that PB2NC retains very few ADPSFC observations when
you use a quality marker of 2.
>>>       - We advised via MET-Help that the algorithm employed by
NCEP in the GDAS processing sets most ADPSFC observations' quality
marker to a value of 9.  NCEP does that to prevent those observations
from being used in the data assimilation.  So the use of quality
marker = 9 is more an artifact of the data assimilation process and
not really saying anything about the quality of those observations.
>>>       - When you switch to using a quality marker = 9 in PB2NC,
you got many matches, but ended up with more "duplicate" observations.
>>>
>>> So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>>
>>> I did some investigation on this issue this morning.  Here's what
I did:
>>>
>>> - Retrieved this file:
>>>
http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/
>>> gdas1.t12z.prepbufr.nr
>>> - Ran it through PB2NC from message type = ADPSFC, time window =
+/- 0 seconds, and quality markers of 2 and 9.
>>> - For both, I used the updated version of the plot_point_obs tool
to create a plot of the data and dump header information about the
points being plotted.
>>> - I also used the -dump option for PB2NC to dump all of the ADPSFC
observations to ASCII format.
>>>
>>> I've attached several things to this message:
>>> - The postscript output of plot_point_obs for qm = 2 and qm = 9,
after first converting to png format.
>>> - The output from the plot_point_obs tool for both runs.
>>>
>>> For qm=2, there were 51 locations plotted in your domain.
>>>       - Of those 51...
>>>          - All 51 header entries are unique.
>>>          - There are only 36 unique combinations of lat/lon.
>>> For qm=9, there were 101 locations plotted in your domain.
>>>       - Of those 101...
>>>          - There are only 52 unique header entries.
>>>          - There are only 37 unique combinations of lat/lon.
>>>
>>> I think there are two issues occurring here:
>>>
>>> (1) When using qm=2, you'll often see two observing locations that
look the same except for the station ID.  For example:
>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>      [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>>
>>> I looked at the observations that correspond to these and found
that they do actually differ slightly.
>>>
>>> (2) The second, larger issue here is when using qm=9.  It does
appear that we're really getting duplicate observations.  For example:
>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>      [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>
>>> This will likely require further debugging of the PB2NC tool to
figure out what's going on.
>>>
>>> I just wanted to let you know what I've found so far.
>>>
>>> Thanks,
>>> John Halley Gotway
>>>
>>>
>>>
>>> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>>
>>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>>            Queue: met_help
>>>>          Subject: RE: Help with ds337.0
>>>>            Owner: Nobody
>>>>       Requestors: jonathan.case-1 at nasa.gov
>>>>           Status: new
>>>>      Ticket<URL:
>>>> https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>
>>>>
>>>> Hi Tom/MET help,
>>>>
>>>> Thanks for the fantastically quick reply, Tom!
>>>>
>>>> It turns out that I'm specifically referring to the netcdf output
from the pb2nc program.
>>>>
>>>> I already sent a help ticket to the MET team, asking if they have
a means for removing the duplicate obs from their PB2NC process.  At
the time, they didn't refer to the history of data as it undergoes QC,
so this might help me track down the reason for the duplicate obs.
So, I have CC'd the met_help to this email.
>>>>
>>>> It turns out that when I initially ran pb2nc with the default
quality control flag set to "2" (i.e. quality_mark_thresh in the
PB2NCConfig_default file), I did not get ANY surface observations in
my final netcdf file over Central America.  Upon email exchanges with
the MET team, it was recommended that I set the quality control flag
to "9" to be able to accept more observations into the netcdf outfile.
>>>>
>>>>>      From what it sounds like, I need to better understand what
the "happy medium" should be in setting the quality_mark_thresh flag
in pb2nc.  2 is too restrictive, while 9 appears to be allowing
duplicate observations into the mix as a result of the QC process.
>>>>
>>>> Any recommendations are greatly welcome!
>>>>
>>>> Thanks much,
>>>> Jonathan
>>>>
>>>>
>>>> From: Thomas Cram [mailto:tcram at ucar.edu]
>>>> Sent: Friday, January 13, 2012 4:39 PM
>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>> Subject: Re: Help with ds337.0
>>>>
>>>> Hi Jonathan,
>>>>
>>>> the only experience I have working with the MET software is using
the pb2nc utility to convert PREPBUFR observations into a NetCDF
dataset, so my knowledge of MET is limited.  However, the one reason I
can think of for the duplicate observations is that you're seeing the
same observation after several stages of quality-control pre-
processing.  The PREPBUFR files contain a complete history of the data
as it's modified during QC, so each station will have multiple reports
at a single time.  There's a quality control flag appended to each
PREPBUFR message; you want to keep the observation with the lowest QC
number.
>>>>
>>>> Can you send me the date and time for the examples you list
below?  I'll take a look at the PREPBUFR messages and see if this is
the case.
>>>>
>>>> If this doesn't explain it, then I'll forward your question on to
MET support desk and see if they know the reason for duplicate
observations.  They are intimately familiar with the PREPBUFR obs, so
I'm sure they can help you out.
>>>>
>>>> - Tom
>>>>
>>>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-VP61)[ENSCO
INC] wrote:
>>>>
>>>>
>>>> Dear Thomas,
>>>>
>>>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>>> I am conducting some weather model verification using the MET
verification software (NCAR's Meteorological Evaluation Tools) and the
NCEP GDAS PREPBUFR point observation files for ground truth.  I have
accessed archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>>
>>>> Now here is the interesting part:  When I examined the textual
difference files generated by the MET software, I noticed that there
were several stations with "duplicate" observations that led to
duplicate forecast-observation difference pairs.  I put duplicate in
quotes because the observed values were not necessarily the same but
usually very close to one another.
>>>> The duplicate observations arose from the fact that at the same
observation location, there would be a 5-digit WMO identifier as well
as a 4-digit text station ID at a given hour.
>>>> I stumbled on these duplicate station data when I made a table of
stations and mapped them, revealing the duplicates.
>>>>
>>>> Some examples I stumbled on include:
>>>> *         78720/MHTG (both at 14.05N, -87.22E)
>>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>>> *         78708/MHLM (both at 15.45N, -87.93)
>>>>
>>>> There are others, but I thought I'd provide a few examples to
start.
>>>>
>>>> If the source of the duplicates is NCEP/EMC, I wonder if it would
be helpful to send them a note as well?
>>>>
>>>> Let me know how you would like to proceed.
>>>>
>>>> Most sincerely,
>>>> Jonathan
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------------
>>>> Jonathan Case, ENSCO Inc.
>>>> NASA Short-term Prediction Research and Transition Center (aka
SPoRT
>>>> Center) 320 Sparkman Drive, Room 3062 Huntsville, AL 35805
>>>> Voice: 256.961.7504
>>>> Fax: 256.961.7788
>>>> Emails: Jonathan.Case-1 at nasa.gov / case.jonathan at ensco.com
>>>>
>>>> ---------------------------------------------------------------------------------------------
>>>>
>>>> "Whether the weather is cold, or whether the weather is hot,
we'll weather
>>>>       the weather whether we like it or not!"
>>>>
>>>>
>>>> Thomas Cram
>>>> NCAR / CISL / DSS
>>>> 303-497-1217
>>>> tcram at ucar.edu<mailto:tcram at ucar.edu>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>



------------------------------------------------
Subject: RE: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
From: Case, Jonathan[ENSCO INC]
Time: Mon Mar 05 08:30:57 2012

Hi Paul,

I'm now trying to test out the MET patch for point_stat with the "-unique" option.  But first, I need to be able simply to run point_stat normally, which I can't seem to do with the same script as I had been using with v3.0.1.

I'm receiving the following errors when trying to run point_stat in v3.1:

POINTSTAT COMMAND LINE:
-----------------------
/raid1/models/nu-wrf_v3beta2-3.2.1/METv3.1/bin_orig/point_stat
/raid2/casejl/WRF/MESOAMERICA/grib/2011100106/1110010600_arw_d01.grb1f060000
/raid2/casejl/MET/PB2NC/OUTPUT/gdas_sfc_20111001_12_MESOAMERICA.nc
/raid2/casejl/MET/POINTSTAT/PointStatConfig_SFC_MESOAMERICA
-obs_valid_beg 20111001_115000 -obs_valid_end 20111001_120500
-outdir /raid2/casejl/MET/POINTSTAT/2011100106 -v 3
-----------------------
ERROR  :
ERROR  : timestring_to_unix(const char *) -> can't parse date/time string "20111001_1150"
ERROR  :

Any ideas what may be wrong with my command line options or config file?  I get the same error with the binaries before and after the patch you sent me.

Thanks for your help,
Jonathan
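Every timestamp this parser accepts elsewhere in the thread has the form YYYYMMDD_HHMMSS, while the string in the error, "20111001_1150", is missing its seconds; that suggests (only a guess, not a confirmed diagnosis) that a shortened time string is reaching point_stat from a script variable or a config entry.  A tiny sketch for normalizing such strings before they are passed in:

# Pad YYYYMMDD_HHMM (or YYYYMMDD_HH) to the YYYYMMDD_HHMMSS form used by
# every timestamp that parses successfully in this thread.
def pad_timestring(ts):
    date, _, hms = ts.partition("_")
    return date + "_" + hms.ljust(6, "0")

assert pad_timestring("20111001_1150") == "20111001_115000"
assert pad_timestring("20111001_115000") == "20111001_115000"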

-----Original Message-----
From: Paul Oldenburg via RT [mailto:met_help at ucar.edu]
Sent: Thursday, March 01, 2012 3:15 PM
To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
Cc: tcram at ucar.edu
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0

Jonathan,

See attached.  By the way, I included the relevant output from point_stat using the -unique flag below.

Paul

DEBUG 2: Searching 164 observations from 82 messages.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value 6.7082.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value 6.19839.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.20_-9999.00' with value 6.19839.
DEBUG 3: VxPairDataPoint::add_obs() -> found duplicate observation for key 'SFCSHP_PACF1_0032_30.150000_-85.669998_1016.10_-9999.00' with value 6.19839.


On 03/01/2012 02:05 PM, Case, Jonathan[ENSCO INC] via RT
wrote:
>
> <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>
> Paul,
>
> Can you send me examples of _mpr.txt files?  They are easier to
read than the .stat files.
>
> Thanks again!
> Jon
>
>
-----Original Message-----
> From: Paul Oldenburg via RT
[mailto:met_help at ucar.edu]
> Sent: Thursday, March 01, 2012 3:03 PM
> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
> Cc: tcram at ucar.edu
>
Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>
>
Jonathan,
>
> Sorry, I should have noticed you were using METv3.0.
The patch will only work for METv3.1.  You should strongly consider
upgrading to METv3.1 for reasons other than this patch, including a
major change in how gridded data is handled internally by MET.  If you
have more questions about this change, I'll refer you to John.
>
>
Regarding the handling of duplicate observations, the patched code
will discard all observations that qualify as duplicates, keeping only
a single one.  I attached two point_stat output files that include
matched pairs showing the effect of the unique flag in the patched
code.  I hope this answers your question.
>
> Paul
>
>
> On
03/01/2012 01:55 PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>
>>
<URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>
>>
Paul,
>>
>> Thanks for the patch.  I believe that I'm running MET
v3.0, not 3.1.  Hopefully the source files in your patch might work
with our older version.  If not, then I'll need to fully upgrade my
config files and such before I can test out the patch, so you may not
hear from me right away.
>>
>> One follow-on question I have is
whether the patch will reject ALL duplicate obs since I've seen 3
duplicate obs in some instances within the 20-minute time window used
in point_stat?
>>
>> Thanks again,
>> Jonathan
>>
>>
-----Original Message-----
>> From: Paul Oldenburg via RT
[mailto:met_help at ucar.edu]
>> Sent: Thursday, March 01, 2012 2:50 PM
>> To: Case, Jonathan (MSFC-ZP11)[ENSCO INC]
>> Cc: tcram at ucar.edu
>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>
>>
Jonathan,
>>
>> I developed a patch for handling point observations
in MET that will
>> allow the user to optionally throw out duplicate
observations.  The
>> process that I implemented does not use the
observation with the
>> timestamp closest to the forecast valid time
as you suggested,
>> because of complications in how the code handles
obs.  Then, when I
>> thought about this, it occurred to me that it
doesn't matter because
>> the observation is a duplicate anyway.
(right?)
>>
>> A duplicate observation is defined as an observation
with identical message type, station id, grib code (forecast field),
latitude, longitude, level, height and observed value to an
observation that has already been processed.
>>
>> Please deploy the latest MET patches and then the attached patch:
>>
>> 1. Deploy latest METv3.1 patches from
>>    http://www.dtcenter.org/met/users/support/known_issues/METv3.1/index.php
>> 2. Save attached tarball to base MET directory
>> 3. Untar it, which should overwrite four source files
>> 4. Run 'make clean' and then 'make'
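
Assuming the tarball is saved in the top-level MET directory, steps 2-4 amount to something like the following; the tarball name below is only a placeholder for the attachment referenced above.

    cd METv3.1
    tar -xf met_point_stat_unique_patch.tar   # placeholder name; overwrites four source files
    make clean
    make
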
>>
>> When this is complete, you should notice a new
command-line option for point_stat: -unique.  When you use this,
point_stat should detect and throw out duplicate observations.  If
your verbosity level is set to 3 or higher, it will report which
observations are being thrown out.  Please test this and let me know
if you have any trouble or if it does not work in the way that you
expected.
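
For reference, a usage sketch of the new flag, following the argument order of the point_stat command line shown earlier in this thread (paths abbreviated from that example):

    point_stat 1110010600_arw_d01.grb1f060000 \
        gdas_sfc_20111001_12_MESOAMERICA.nc \
        PointStatConfig_SFC_MESOAMERICA \
        -outdir /raid2/casejl/MET/POINTSTAT/2011100106 -unique -v 3
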
>>
>> Thanks,
>>
>> Paul
>>
>>
>> On 02/13/2012 03:10
PM, Case, Jonathan[ENSCO INC] via RT wrote:
>>>
>>> <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>
>>>
Hello John/Tim/Methelp,
>>>
>>> I finally got back into looking at
this issue with duplicate obs showing up in the PB2NC output,
resulting in duplicate fcst-obs pairs being processed by point_stat.
>>>
>>> I found a single obs site in Central America that is
generating a problem on 1 Oct 2011 at 12z (stid "MSSS").
>>> I
processed ONLY this obs through pb2nc to see what the result is in the
netcdf file.
>>>
>>> Here is what I see: (from an ncdump of the
netcdf file) .
>>> .
>>> .
>>> .
>>> data:
>>>
>>>     obs_arr =
>>>      0, 33, 942.5, -9999, 0,
>>>      0, 34, 942.5, -9999, -1,
>>>      0, 32, 942.5, -9999, 1,
>>>      1, 51, 942.5, 619.9501, 0.016576,
>>>      1, 11, 942.5, 619.9501, 297.15,
>>>      1, 17, 942.5, 619.9501, 293.9545,
>>>      1, 2, 942.5, 619.9501, 101349.7,
>>>      2, 33, 942.5, -9999, 0,
>>>      2, 34, 942.5, -9999, -1,
>>>      2, 32, 942.5, -9999, 1,
>>>      3, 51, 942.5, 619.9501, 0.016576,
>>>      3, 11, 942.5, 619.9501, 297.15,
>>>      3, 17, 942.5, 619.9501, 293.9545,
>>>      3, 2, 942.5, 619.9501, 101349.7 ;
>>>
>>>     hdr_typ =
>>>      "ADPSFC",
>>>      "ADPSFC",
>>>      "ADPSFC",
>>>      "ADPSFC" ;
>>>
>>>     hdr_sid =
>>>      "MSSS",
>>>      "MSSS",
>>>      "MSSS",
>>>      "MSSS" ;
>>>
>>>     hdr_vld =
>>>      "20111001_115000",
>>>      "20111001_115000",
>>>      "20111001_115501",
>>>      "20111001_115501" ;
>>>
>>>     hdr_arr =
>>>      13.7, -89.12, 621,
>>>      13.7, -89.12, 621,
>>>      13.7, -89.12, 621,
>>>      13.7, -89.12, 621 ;
>>> }
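
(A dump restricted to these variables can be reproduced with ncdump's -v option; the file name below is a placeholder for the single-station pb2nc output described above:)

    ncdump -v obs_arr,hdr_typ,hdr_sid,hdr_vld,hdr_arr msss_only_pb2nc.nc
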
>>>
>>> So, from what I can tell, the
station is reporting the same obs at 2
>>> different times,
>>>
1150(00) UTC and 1155(01) UTC.  Do you have any recommendation on how
I can retain only one of these obs, preferably the one closest to the
top of the hour?  I know I could dramatically narrow down the time
window (e.g. +/- 5 min), but I suspect this would likely miss out on
most observations that report about 10 minutes before the hour.
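
As a stopgap, one could post-process the pb2nc NetCDF output to keep, per station, only the report whose valid time is closest to the top of the hour. The sketch below is illustrative only and is not MET code; it assumes the Python netCDF4 module and that column 0 of obs_arr indexes the hdr_* arrays, as the dump above suggests.

    from datetime import datetime
    from netCDF4 import Dataset, chartostring

    def keep_obs_nearest_top_of_hour(nc_path):
        """Return obs_arr rows belonging to, per station id, the header whose
        valid time is nearest the top of the hour."""
        nc = Dataset(nc_path)
        sids = [s.strip() for s in chartostring(nc.variables["hdr_sid"][:])]
        vlds = [s.strip() for s in chartostring(nc.variables["hdr_vld"][:])]
        best = {}  # station id -> (seconds from nearest whole hour, header index)
        for i, (sid, vld) in enumerate(zip(sids, vlds)):
            t = datetime.strptime(vld, "%Y%m%d_%H%M%S")
            off = t.minute * 60 + t.second
            off = min(off, 3600 - off)  # distance to the nearest top of hour
            if sid not in best or off < best[sid][0]:
                best[sid] = (off, i)
        keep = {i for _, i in best.values()}
        # obs_arr rows reference a header via their first column
        return [row for row in nc.variables["obs_arr"][:] if int(row[0]) in keep]
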
>>>
>>> I value your feedback on this matter.
>>> Sincerely,
>>>
Jonathan
>>>
>>> -----Original Message-----
>>> From: John Halley
Gotway via RT [mailto:met_help at ucar.edu]
>>> Sent: Thursday, January
19, 2012 1:40 PM
>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>
Cc: tcram at ucar.edu
>>> Subject: Re: [rt.rap.ucar.edu #52626] RE: Help
with ds337.0
>>>
>>> Jonathan,
>>>
>>> OK, I reran my analysis
using the setting you suggested:
>>>        in_report_type[] = [ 512, 522, 531, 540, 562 ];
>>>
>>> Here's what I see:
>>>
>>>       - For qm=2, there are 29 locations, all with unique header information, and all with unique lat/lons.
>>>         It looks like the station id's are all alphabetical.  So the "in_report_type" setting has filtered out the numeric station id's.
>>>
>>>       - For qm=9, there are 57 locations - but only 29 of them have unique header information!
>>>
>>> So I'll need to look more closely at what PB2NC
is doing here.  It looks like setting qm=9 really is causing duplicate
observations to be retained.
>>>
>>> When I get a chance, I'll run it through the debugger to investigate.
>>>
>>> Thanks,
>>> John
>>>
>>>
>>> On 01/19/2012 12:33 PM, Case, Jonathan[ENSCO INC] via RT
wrote:
>>>>
>>>> <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>
>>>>
John,
>>>>
>>>> I noticed that even with specifying the input report
types, there are still a few duplicate observations in the final
netcdf dataset.
>>>> So, I'm seeing the same thing as in your
analysis.
>>>>
>>>> -Jonathan
>>>>
>>>> -----Original Message-----
>>>> From: John Halley Gotway via RT [mailto:met_help at ucar.edu]
>>>>
Sent: Thursday, January 19, 2012 1:01 PM
>>>> To: Case, Jonathan
(MSFC-VP61)[ENSCO INC]
>>>> Cc: tcram at ucar.edu
>>>> Subject: Re:
[rt.rap.ucar.edu #52626] RE: Help with ds337.0
>>>>
>>>> Jonathan,
>>>>
>>>> I apologize for the long delay in getting back to you on
this.  We've been scrambling over the last couple of weeks to finish
up development on a new release.  Here's my recollection of what's
going on with this issue:
>>>>
>>>>        - You're using the GDAS
PrepBUFR observation dataset, but you're finding that PB2NC retains
very few ADPSFC observations when you use a quality marker of 2.
>>>>
- We advised via MET-Help that the algorithm employed by NCEP in the
GDAS processing sets most ADPSFC observations' quality marker to a
value of 9.  NCEP does that to prevent those observations from being
used in the data assimilation.  So the use of quality marker = 9 is
more an artifact of the data assimilation process and not really
saying anything about the quality of those observations.
>>>>
- When you switch to using a quality marker = 9 in PB2NC, you got many
matches, but ended up with more "duplicate" observations.
>>>>
>>>>
So is using a quality marker = 9 in PB2NC causing "duplicate"
observations to be retained?
>>>>
>>>> I did some investigation on
this issue this morning.  Here's what I did:
>>>>
>>>> - Retrieved this file:
>>>>   http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gdas.20120112/gdas1.t12z.prepbufr.nr
>>>> - Ran it through
PB2NC from message type = ADPSFC, time window = +/- 0 seconds, and
quality markers of 2 and 9.
>>>> - For both, I used the updated
version of the plot_point_obs tool to create a plot of the data and
dump header information about the points being plotted.
>>>> - I also
used the -dump option for PB2NC to dump all of the ADPSFC observations
to ASCII format.
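
For reference, a sketch of settings matching the run described above, using the old-style config syntax seen elsewhere in this thread. The entry names beg_ds/end_ds and the output and dump-directory names are assumptions to be checked against the PB2NCConfig_default file shipped with your MET version.

    message_type[]      = [ "ADPSFC" ];
    beg_ds              = 0;    // time window = +/- 0 seconds
    end_ds              = 0;
    quality_mark_thresh = 2;    // repeat the run with 9 for the second test

    pb2nc gdas1.t12z.prepbufr.nr gdas_adpsfc_qm2.nc PB2NCConfig \
        -dump dump_qm2 -v 2
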
>>>>
>>>> I've attached several things to this
message:
>>>> - The postscript output of plot_point_obs for qm = 2
and qm = 9, after first converting to png format.
>>>> - The output
from the plot_point_obs tool for both runs.
>>>>
>>>> For qm=2, there were 51 locations plotted in your domain.
>>>>        - Of those 51...
>>>>           - All 51 header entries are unique.
>>>>           - There are only 36 unique combinations of lat/lon.
>>>> For qm=9, there were 101 locations plotted in your domain.
>>>>        - Of those 101...
>>>>           - There are only 52 unique header entries.
>>>>           - There are only 37 unique combinations of lat/lon.
>>>>
>>>> I think there are two issues occurring here:
>>>>
>>>> (1) When using qm=2, you'll often see two observing
locations that look the same except for the station ID.  For example:
>>>>       [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>       [ ADPSFC, MPTO,  20120112_120000, 9.05, -79.37, 11 ]
>>>>
>>>> I
looked at the observations that correspond to these and found that
they do actually differ slightly.
>>>>
>>>> (2) The second, larger
issue here is when using qm=9.  It does appear that we're really
getting duplicate observations.  For example:
>>>>       [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>       [ ADPSFC, 78792, 20120112_120000, 9.05, -79.37, 11 ]
>>>>
>>>> This will
likely require further debugging of the PB2NC tool to figure out
what's going on.
>>>>
>>>> I just wanted to let you know what I've
found so far.
>>>>
>>>> Thanks,
>>>> John Halley Gotway
>>>>
>>>>
>>>>
>>>> On 01/13/2012 03:50 PM, Case, Jonathan[ENSCO INC] via RT
wrote:
>>>>>
>>>>> Fri Jan 13 15:50:08 2012: Request 52626 was acted upon.
>>>>> Transaction: Ticket created by jonathan.case-1 at nasa.gov
>>>>>             Queue: met_help
>>>>>           Subject: RE: Help with ds337.0
>>>>>             Owner: Nobody
>>>>>        Requestors: jonathan.case-1 at nasa.gov
>>>>>            Status: new
>>>>>       Ticket<URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=52626>
>>>>>
>>>>>
>>>>> Hi Tom/MET help,
>>>>>
>>>>> Thanks for the fantastically
quick reply, Tom!
>>>>>
>>>>> It turns out that I'm specifically
referring to the netcdf output from the pb2nc program.
>>>>>
>>>>> I
already sent a help ticket to the MET team, asking if they have a
means for removing the duplicate obs from their PB2NC process.  At the
time, they didn't refer to the history of data as it undergoes QC, so
this might help me track down the reason for the duplicate obs.  So, I
have CC'd the met_help to this email.
>>>>>
>>>>> It turns out that
when I initially ran pb2nc with the default quality control flag set
to "2" (i.e. quality_mark_thresh in the PB2NCConfig_default file), I
did not get ANY surface observations in my final netcdf file over
Central America.  Upon email exchanges with the MET team, it was
recommended that I set the quality control flag to "9" to be able to
accept more observations into the netcdf outfile.
>>>>>
>>>>> From what it sounds like, I need to better understand what the "happy
medium" should be in setting the quality_mark_thresh flag in pb2nc.  2
is too restrictive, while 9 appears to be allowing duplicate
observations into the mix as a result of the QC process.
>>>>>
>>>>>
Any recommendations are greatly welcome!
>>>>>
>>>>> Thanks much,
>>>>> Jonathan
>>>>>
>>>>>
>>>>> From: Thomas Cram
[mailto:tcram at ucar.edu]
>>>>> Sent: Friday, January 13, 2012 4:39 PM
>>>>> To: Case, Jonathan (MSFC-VP61)[ENSCO INC]
>>>>> Subject: Re:
Help with ds337.0
>>>>>
>>>>> Hi Jonathan,
>>>>>
>>>>> the only
experience I have working with the MET software is using the pb2nc
utility to convert PREPBUFR observations into a NetCDF dataset, so my
knowledge of MET is limited.  However, the one reason I can think of
for the duplicate observations is that you're seeing the same
observation after several stages of quality-control pre-processing.
The PREPBUFR files contain a complete history of the data as it's
modified during QC, so each station will have multiple reports at a
single time.  There's a quality control flag appended to each PREPBUFR
message; you want to keep the observation with the lowest QC number.
>>>>>
>>>>> Can you send me the date and time for the examples you
list below?  I'll take a look at the PREPBUFR messages and see if this
is the case.
>>>>>
>>>>> If this doesn't explain it, then I'll
forward your question on to MET support desk and see if they know the
reason for duplicate observations.  They are intimately familiar with
the PREPBUFR obs, so I'm sure they can help you out.
>>>>>
>>>>> -
Tom
>>>>>
>>>>> On Jan 13, 2012, at 3:16 PM, Case, Jonathan (MSFC-
VP61)[ENSCO INC] wrote:
>>>>>
>>>>>
>>>>> Dear Thomas,
>>>>>
>>>>> This is Jonathan Case of the NASA SPoRT Center
(http://weather.msfc.nasa.gov/sport/) in Huntsville, AL.
>>>>> I am
conducting some weather model verification using the MET verification
software (NCAR's Meteorological Evaluation Tools) and the NCEP GDAS
PREPBUFR point observation files for ground truth.  I have accessed
archived GDAS PREPBUFR files from NCAR's repository at
http://dss.ucar.edu/datasets/ds337.0/ and began producing difference
stats over Central America between the model forecast and observations
obtained from the PREPBUFR files.
>>>>>
>>>>> Now here is the
interesting part:  When I examined the textual difference files
generated by the MET software, I noticed that there were several
stations with "duplicate" observations that led to duplicate forecast-
observation difference pairs.  I put duplicate in quotes because the
observed values were not necessarily the same but usually very close
to one another.
>>>>> The duplicate observations arose from the fact
that at the same observation location, there would be a 5-digit WMO
identifier as well as a 4-digit text station ID at a given hour.
>>>>> I stumbled on these duplicate station data when I made a table
of stations and mapped them, revealing the duplicates.
>>>>>
>>>>>
Some examples I stumbled on include:
>>>>> *         78720/MHTG (both at 14.05N, -87.22E)
>>>>> *         78641/MGGT (both at 14.58N, -90.52E)
>>>>> *         78711/MHPL (both at 15.22N, -83.80E)
>>>>> *         78708/MHLM (both at 15.45N, -87.93)
>>>>>
>>>>> There are
others, but I thought I'd provide a few examples to start.
>>>>>
>>>>> If the source of the duplicates is NCEP/EMC, I wonder if it
would be helpful to send them a note as well?
>>>>>
>>>>> Let me
know how you would like to proceed.
>>>>>
>>>>> Most sincerely,
>>>>> Jonathan
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------
>>>>>
>>>>> Jonathan Case, ENSCO Inc.
>>>>> NASA Short-term Prediction Research and Transition Center (aka SPoRT Center)
>>>>> 320 Sparkman Drive, Room 3062, Huntsville, AL 35805
>>>>> Voice: 256.961.7504
>>>>> Fax: 256.961.7788
>>>>> Emails: Jonathan.Case-1 at nasa.gov / case.jonathan at ensco.com
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------
>>>>>
>>>>> "Whether the weather is cold, or whether the weather is hot,
we'll weather
>>>>>        the weather whether we like it or not!"
>>>>>
>>>>>
>>>>> Thomas Cram
>>>>> NCAR / CISL / DSS
>>>>> 303-497-1217
>>>>> tcram at ucar.edu
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>

------------------------------------------------


More information about the Met_help mailing list