[Met_help] [rt.rap.ucar.edu #99675] History for Duplicate Observations

John Halley Gotway via RT met_help at ucar.edu
Mon Jul 12 11:24:01 MDT 2021


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

John, a question on duplicates.  I was looking through some of my MPR files for RAOBs at 500 mb and was surprised by the number of duplicates at this level.  I have the unique flag set to TRUE and the obs_summary flag set to NEAREST.  The observations matched on lat, lon, elev, and level but differed in obs time.  I thought the logic in MET was to keep the ob nearest to my valid time and drop the rest that had the same lat, lon, level, and elevation, but that doesn't seem to be happening here.  I attached two files that I filtered with grep down to one vx_mask and level for clarification.

Thanks
Bob


----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: Duplicate Observations
From: John Halley Gotway
Time: Wed Apr 21 20:40:01 2021

Bob,

Thanks for sending the sample MPR data. I agree, there are no *true
duplicates* in this output. By "duplicate" I mean the same lat, lon,
level, elevation, timestamp, and so on. We've found in the past that
the same observation can show up under both a WMO station id (with
numbers) and a non-WMO station id (with letters), but I don't see that
occurring here.

The total number of MPR lines is 564:
    > cat raob_15mar_00z.txt | wc -l
    > 564

And when I count the unique combinations of obs valid time, latitude,
longitude, level, and elevation, I also get 564:
   > cat raob_15mar_00z.txt | awk '{print $8, $28, $29, $30, $31}' | sort -u | wc -l
   > 564
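
For anyone who prefers to run the same check outside the shell, here is
a rough Python sketch of the column-based uniqueness count. It is only
an illustration, not part of MET: the function name unique_count is
made up here, and the two sample lines are the station 50527 MPR lines
quoted in this thread.

```python
def unique_count(lines, columns):
    """Count distinct combinations of the given 1-based, whitespace-separated columns."""
    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        seen.add(tuple(fields[c - 1] for c in columns))
    return len(seen)


# Two MPR lines for station 50527, copied from this thread.
SAMPLE_LINES = [
    "V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000 "
    "20210314_233000 20210314_233000 HGT gpm P500 HGT NA P500 ADPUPA FULL BILIN "
    "4 NA NA NA NA MPR 561 137 50527 49.25 119.7 500 5113.00537 5120.60004 5119 "
    "2 5288 NA NA",
    "V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000 "
    "20210315_000000 20210315_000000 HGT gpm P500 HGT NA P500 ADPUPA FULL BILIN "
    "4 NA NA NA NA MPR 561 328 50527 49.25 119.7 500 5104.01611 5120.60004 5110 "
    "2 5288 NA NA",
]

# Same columns as the awk command: obs valid time, lat, lon, level, elevation.
by_time_and_location = unique_count(SAMPLE_LINES, [8, 28, 29, 30, 31])

# Station id, lat, lon only.
by_station_and_location = unique_count(SAMPLE_LINES, [27, 28, 29])

print(by_time_and_location, by_station_and_location)
```

To run it on a full file, pass open("raob_15mar_00z.txt") as the first
argument in place of SAMPLE_LINES.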

And setting the duplicate flag to unique should handle that case if
any true duplicates were actually present:
   *duplicate_flag = UNIQUE;*

The next thing to consider is the same station reporting multiple
times within the observation time window. At first glance, I do find
the same station id showing up multiple times in the output. For
example, station id "56571" shows up twice, once at 20210314_233000
and a second time 30 minutes later at 20210315_000000.

But careful inspection reveals that the lat/lon location for this
station id changes!
   > cat raob_15mar_00z.txt | awk '{print $27, $28, $29}' | grep 56571
   > 56571 27.9 102.27
   > 56571 27.88 102.3

I assume that 56571 is the id of a balloon that's moving in time. That
would explain different lat/lon locations at different times.

But this doesn't explain it all. When I look at the unique station
names, latitudes, and longitudes, I find 538 unique entries in the 564
lines. So there really must be some stations reporting multiple times.
The id 50527 is one such example. It appears in 2 MPR lines:

V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000 20210314_233000 20210314_233000 HGT gpm P500 HGT NA P500 ADPUPA FULL BILIN 4 NA NA NA NA MPR 561 137 50527 49.25 119.7 500 5113.00537 5120.60004 5119 2 5288 NA NA
V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000 20210315_000000 20210315_000000 HGT gpm P500 HGT NA P500 ADPUPA FULL BILIN 4 NA NA NA NA MPR 561 328 50527 49.25 119.7 500 5104.01611 5120.60004 5110 2 5288 NA NA

The observation timestamps differ: 20210314_233000 vs 20210315_000000
The station id, lat, lon, and level are the same: 50527 49.25 119.7 500
But notice that the next column, for observation elevation (OBS_ELV),
does differ: 5113.00537 vs 5104.01611

Latitude, longitude, level, and elevation are all included in the
uniqueness key. Here's a code snippet from the file named pair_base.cc
in MET:

   //  build a uniqueness test key
   string obs_key = str_format("%.3f:%.3f:%.2f:%.2f",
                       lat,         //  lat
                       lon,         //  lon
                       lvl,         //  level
                       elv).text(); //  elevation

Because the elevation values differ, these 2 lines are not considered
to be the same station reporting multiple times in the time window. So
that's the explanation for this behavior.
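
To make that concrete, here is a small Python sketch that applies the
same format string to the two station 50527 lines. It is an
illustration only, not MET code: the function names are invented here,
and the variant that leaves elevation out of the key is purely
hypothetical, shown just to demonstrate why the two lines would then
collapse into one location.

```python
def obs_key(lat, lon, lvl, elv):
    """Mirror the "%.3f:%.3f:%.2f:%.2f" format from the pair_base.cc snippet."""
    return "%.3f:%.3f:%.2f:%.2f" % (lat, lon, lvl, elv)


def obs_key_without_elevation(lat, lon, lvl):
    """Hypothetical variant that ignores elevation (not an existing MET option)."""
    return "%.3f:%.3f:%.2f" % (lat, lon, lvl)


# Station 50527 values from the two MPR lines: lat, lon, level, elevation.
first = (49.25, 119.7, 500.0, 5113.00537)   # obs valid at 20210314_233000
second = (49.25, 119.7, 500.0, 5104.01611)  # obs valid at 20210315_000000

key_first = obs_key(*first)
key_second = obs_key(*second)

# With elevation in the key, the two obs look like different locations.
print(key_first)
print(key_second)
print(key_first == key_second)

# Without elevation, they share one key and could be summarized together.
print(obs_key_without_elevation(*first[:3]) == obs_key_without_elevation(*second[:3]))
```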

So what should we do about it? Generally, the elevation is meant as
the STATION ELEVATION. In this case it looks like it holds the current
height of the balloon instead of the station elevation. If you were to
modify the obs to just set the elevation value to NA, then the results
would be what you expect.

Hope that helps clarify.

John






On Wed, Apr 21, 2021 at 3:04 PM George McCabe via RT
<met_help at ucar.edu>
wrote:

>
> Wed Apr 21 15:03:36 2021: Request 99675 was acted upon.
> Transaction: Given to johnhg (John Halley Gotway) by mccabe
>        Queue: met_help
>      Subject: Duplicate Observations
>        Owner: johnhg
>   Requestors: robert.craig.2 at us.af.mil
>       Status: new
>  Ticket <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
>
>
> This transaction appears to have no content
>

------------------------------------------------
Subject: RE: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate Observations
From: robert.craig.2 at us.af.mil
Time: Thu Apr 22 08:01:32 2021

John, these data come from PrepBufr observations.  Is there an easy
way to change the station elevation to NA in PB2NC or Point Stat, or do
I have to go into the NetCDF files and change the elevation there
before verification?  Maybe this would be a useful PB2NC option.

Thanks
Bob




------------------------------------------------
Subject: Duplicate Observations
From: John Halley Gotway
Time: Thu Apr 22 08:46:32 2021

Bob,

Ah, OK, I wasn’t sure of the source of these obs. We certainly
wouldn’t expect users to change PrepBufr obs!

Rather than tweaking the format of the data, we could instead make the
construction of that uniqueness key a configurable option.

But I think I should refer this to the scientists in our group for
their input. I’ll let you know what I find out.

John


------------------------------------------------
Subject: RE: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate Observations
From: robert.craig.2 at us.af.mil
Time: Thu Apr 22 09:15:05 2021

John, I look forward to hearing what you find out.

Bob




------------------------------------------------
Subject: Duplicate Observations
From: John Halley Gotway
Time: Thu Apr 22 11:14:46 2021

Bob,

We discussed this situation at our weekly MET project meeting today.
As a result, I wrote up this feature request to add a configuration
option to better support this:
   https://github.com/dtcenter/MET/issues/1762

I also added Jonathan Vigh, one of the scientists in our group, to
this ticket. He'd ideally like to take a look at the full PrepBufr
file to get a better understanding of what's going on. Perhaps the
balloon pops and records observations on the way down. Are you able to
share that file with us? Or point us to it?

For now, I do notice that the obs are 30 minutes apart in the MPR
data. The only easy option with the existing code would be using a
smaller obs_window setting around the forecast valid time. If you have
+/- 30 minutes, you could try +/- 15 instead.
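
As a quick sanity check on why the narrower window helps, here is a
Python sketch that tests which of the two station 50527 obs times fall
inside a window around the 20210315_000000 valid time. It is only an
illustration: the helper name in_window is made up, and treating the
window bounds as inclusive is an assumption here rather than something
confirmed from the MET source.

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%d_%H%M%S"  # timestamp layout used in the MPR lines


def in_window(obs_time, valid_time, half_width_seconds):
    """Return True if obs_time is within +/- half_width_seconds of valid_time.

    Bounds are treated as inclusive (an assumption for this sketch).
    """
    obs = datetime.strptime(obs_time, TIME_FORMAT)
    valid = datetime.strptime(valid_time, TIME_FORMAT)
    offset = abs((obs - valid).total_seconds())
    return offset <= half_width_seconds


VALID_TIME = "20210315_000000"
OBS_TIMES = ["20210314_233000", "20210315_000000"]

# +/- 30 minutes: both reports are eligible for matching.
kept_30 = [t for t in OBS_TIMES if in_window(t, VALID_TIME, 30 * 60)]

# +/- 15 minutes: the report from 30 minutes earlier is dropped.
kept_15 = [t for t in OBS_TIMES if in_window(t, VALID_TIME, 15 * 60)]

print(kept_30)
print(kept_15)
```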

Tara Jensen has some contacts at the NOAA group who generate this
data, and can reach out to them with any questions Jonathan has after
examining the data more closely.

Thanks,
John


------------------------------------------------
Subject: RE: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate Observations
From: robert.craig.2 at us.af.mil
Time: Fri Apr 23 07:03:33 2021

John, I like your idea.  You might ask this as well: could we specify
a lat/lon difference threshold, so that if two stations fall within
it, they are considered duplicates and only the one meeting the
obs_summary criteria is kept?  That would solve most of the duplicate
problem.
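
A rough Python sketch of what such a tolerance check might look like
is below. This is not an existing MET feature: the function name, the
dictionary layout, and the tolerance values are all invented for
illustration, "nearest in time" stands in for the NEAREST summary
setting, and the pairing of each station 56571 location with a
specific obs time is assumed, since the thread lists the times and
locations separately.

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%d_%H%M%S"


def dedup_by_tolerance(obs_list, valid_time, tol_deg):
    """Keep one ob per location, where obs within tol_deg in both lat and lon
    at the same level count as the same location. The ob nearest in time to
    valid_time wins."""
    valid = datetime.strptime(valid_time, TIME_FORMAT)

    def time_offset(ob):
        ob_time = datetime.strptime(ob["time"], TIME_FORMAT)
        return abs((ob_time - valid).total_seconds())

    kept = []
    # Visit obs closest to the valid time first so they claim their location.
    for ob in sorted(obs_list, key=time_offset):
        is_duplicate = any(
            k["lvl"] == ob["lvl"]
            and abs(k["lat"] - ob["lat"]) <= tol_deg
            and abs(k["lon"] - ob["lon"]) <= tol_deg
            for k in kept
        )
        if not is_duplicate:
            kept.append(ob)
    return kept


# Station 56571 reports; which location goes with which time is assumed.
OBS = [
    {"sid": "56571", "time": "20210314_233000", "lat": 27.9, "lon": 102.27, "lvl": 500},
    {"sid": "56571", "time": "20210315_000000", "lat": 27.88, "lon": 102.3, "lvl": 500},
]

loose = dedup_by_tolerance(OBS, "20210315_000000", tol_deg=0.05)
tight = dedup_by_tolerance(OBS, "20210315_000000", tol_deg=0.01)

print(len(loose), loose[0]["time"])
print(len(tight))
```

With the looser tolerance the two reports collapse to the one at the
valid time; with the tighter one both survive, which is the behavior
seen today when lat/lon differ slightly.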

Bob

-----Original Message-----
From: John Halley Gotway via RT <met_help at ucar.edu>
Sent: Thursday, April 22, 2021 9:47 AM
To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
<robert.craig.2 at us.af.mil>
Subject: Re: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate
Observations

Bob,

Ah, OK, I wasn’t sure on the source of these obs. We certainly
wouldn’t expect users to change PrepBufr obs!

Rather than tweaking the format of the data, we could instead make the
construction of that uniqueness key a configurable option.

But I think I should refer this to the scientists in our group for
their input. I’ll let you know what I find out.

John

On Thu, Apr 22, 2021 at 8:01 AM robert.craig.2 at us.af.mil via RT <
met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
>
> John, these data come from PrepBufr observations.  Is there an easy
> way to change the station elevation to NA in PB2NC or Point Stat or
do
> I have to go into the NetCDF files and change the elevation there
> before verification.  Maybe this would be a useful PB2NC option.
>
> Thanks
> Bob
>
> -----Original Message-----
> From: John Halley Gotway via RT <met_help at ucar.edu>
> Sent: Wednesday, April 21, 2021 9:40 PM
> To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
> <robert.craig.2 at us.af.mil>
> Subject: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate
> Observations
>
> Bob,
>
> Thanks for sending the sample MPR data. I agree, there are no *true
> duplicates* in this output. And by "duplicate" I mean the same lat,
> lon, level, elevation, timestamp, and so on. We'd found in the past
> that the same observation can show up using both the WMO station id
> (with numbers) and non-WMO station id (with letters).  But I don't
see that occurring here.
>
> The total number of MPR lines is 564:
>     > cat raob_15mar_00z.txt | wc -l
>     > 564
>
> And when we look at the number of unique combinations of obs valid
> time, latitude, longitude, level, and elevation, I also get 564:
>    > cat raob_15mar_00z.txt | awk '{print $8, $28, $29, $30, $31}' |
> sort -u | wc -l
>    > 564
>
> And setting the duplicate flag to unique should handle that case if
> any true duplicates were actually present:
>    *duplicate_flag = UNIQUE;*
>
> The next thing to consider is the same station reporting multiple
> times within the time observation window. At first glance, I do find
> the same station id showing up multiple times in the output. For
> example, station id "56571" shows up twice, once at 20210314_233000
> and a second time 30 minutes later at 20210315_000000.
>
> But careful inspection reveals that the lat/lon location for this
> station ID change!
>    > cat raob_15mar_00z.txt | awk '{print $27, $28, $29}' | grep
56571
>    > 56571 27.9 102.27
>    > 56571 27.88 102.3
>
> I assume that 56671 is the id of a ballon that's moving in time.
That
> would explain different lat/lon locations at different times.
>
> But this doesn't explain it all. When I look at the unique station
> names, latitude, and longitude, I find 538 unique entries in the 564
> lines. So there really must be some stations reporting multiple
times.
> The id 50527 is one such example. It appears in 2 MPR lines:
>
> V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000
> 20210314_233000 20210314_233000 HGT gpm P500 HGT NA P500 ADPUPA FULL
> BILIN
> 4 NA NA NA NA MPR 561 137 50527 49.25 119.7 500 5113.00537
5120.60004
> 5119
> 2 5288 NA NA
> V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000
> 20210315_000000 20210315_000000 HGT gpm P500 HGT NA P500 ADPUPA FULL
> BILIN
> 4 NA NA NA NA MPR 561 328 50527 49.25 119.7 500 5104.01611
5120.60004
> 5110
> 2 5288 NA NA
>
> The observation timestamps differ: 20210314_233000 vs 20210315_000000
> The station id, lat, lon, and level are the same: 50527 49.25 119.7 500
> But notice that the next column, for observation elevation (OBS_ELV),
> does differ: 5113.00537 vs 5104.01611
>
> The uniqueness key is built from 4 values: lat, lon, level, and
> elevation. Here's a code snippet from the file named pair_base.cc in
> MET:
>
>    //  build a uniqueness test key
>    string obs_key = str_format("%.3f:%.3f:%.2f:%.2f",
>                        lat,         //  lat
>                        lon,         //  lon
>                        lvl,         //  level
>                        elv).text(); //  elevation
>
> Because the elevation values differ, these 2 lines are not considered
> to be the same station reporting multiple times in the time window.
> So that's the explanation for this behavior.
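
To make the effect concrete, here is a small Python sketch (not MET code) that applies the same printf-style format string to the two 50527 lines quoted above. The helper name obs_key is made up for this illustration; only the format string itself comes from the pair_base.cc snippet.

```python
def obs_key(lat, lon, lvl, elv):
    """Build a key with the format string shown in the pair_base.cc snippet.

    Illustrative stand-in only; this is not the MET implementation.
    """
    return "%.3f:%.3f:%.2f:%.2f" % (lat, lon, lvl, elv)

# Values taken from the two MPR lines for station 50527
key_2330 = obs_key(49.25, 119.7, 500, 5113.00537)  # obs at 20210314_233000
key_0000 = obs_key(49.25, 119.7, 500, 5104.01611)  # obs at 20210315_000000

print(key_2330)              # 49.250:119.700:500.00:5113.01
print(key_0000)              # 49.250:119.700:500.00:5104.02
print(key_2330 == key_0000)  # False
```

Only the last field differs, so the two reports end up under different keys and are never compared against each other.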
>
> So what should we do about it? Generally, the elevation is meant as
> the STATION ELEVATION. In this case it looks like it has the current
> height of the balloon instead of the station elevation. If you were to
> modify the obs to just set the elevation value to NA, then the results
> would be what you expect.
>
> Hope that helps clarify.
>
> John
>
>
>
>
>
>
> On Wed, Apr 21, 2021 at 3:04 PM George McCabe via RT
> <met_help at ucar.edu>
> wrote:
>
> >
> > Wed Apr 21 15:03:36 2021: Request 99675 was acted upon.
> > Transaction: Given to johnhg (John Halley Gotway) by mccabe
> >        Queue: met_help
> >      Subject: Duplicate Observations
> >        Owner: johnhg
> >   Requestors: robert.craig.2 at us.af.mil
> >       Status: new
> >  Ticket <URL:
> > https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675
> > >
> >
> >
> > This transaction appears to have no content
> >
>
>
>
>



------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #99675] Duplicate Observations
From: robert.craig.2 at us.af.mil
Time: Fri Apr 23 07:37:04 2021


John, I did set the obs time window to +/- 15 minutes and that reduced
the problem considerably.  I will send the files you requested by
dod_safe.  I sent it to jhg at ncar.edu.  If that isn't right, let me know.

Bob

-----Original Message-----
From: John Halley Gotway via RT <met_help at ucar.edu>
Sent: Thursday, April 22, 2021 12:15 PM
To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
<robert.craig.2 at us.af.mil>
Cc: jvigh at ucar.edu
Subject: Re: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate
Observations

Bob,

We discussed this situation at our weekly MET project meeting today.
As a result, I wrote up this feature request to add a configuration
option to better support this:
   https://github.com/dtcenter/MET/issues/1762

I also added Jonathan Vigh on this ticket, one of the scientists in
our group. He'd ideally like to take a look at the full PrepBufr file
to get a better understanding of what's going on. Perhaps the balloon
pops and records observations on the way down. Are you able to share
that file with us, or point us to it?

For now, I do notice that the obs are 30 minutes apart in the MPR
data. The only easy option with the existing code would be using a
smaller obs_window setting around the forecast valid time. If you have
+/- 30 minutes, you could try +/- 15 instead.
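
As a rough illustration of why the narrower window helps, here is a minimal Python sketch (not MET code). The function name in_obs_window is made up for this example, and it assumes the window is inclusive at both ends. In a Point-Stat config file the equivalent change would be along the lines of obs_window = { beg = -900; end = 900; }, with the offsets given in seconds.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d_%H%M%S"  # timestamp layout used in the MPR lines

def in_obs_window(obs_time, valid_time, half_width_min):
    """Return True if obs_time lies within +/- half_width_min of valid_time.

    Assumption: the window is treated as inclusive at both ends.
    """
    obs = datetime.strptime(obs_time, FMT)
    valid = datetime.strptime(valid_time, FMT)
    return abs(obs - valid) <= timedelta(minutes=half_width_min)

valid = "20210315_000000"
obs_times = ["20210314_233000", "20210315_000000"]  # the two 50527 reports

for half_width in (30, 15):
    kept = [t for t in obs_times if in_obs_window(t, valid, half_width)]
    print(half_width, kept)
```

With +/- 30 both reports fall inside the window. With +/- 15 only the 20210315_000000 report is left.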

Tara Jensen has some contacts at the NOAA group who generate this
data, and can reach out to them with any questions Jonathan has after
examining the data more closely.

Thanks,
John

On Thu, Apr 22, 2021 at 9:15 AM robert.craig.2 at us.af.mil via RT <
met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
>
> John, I look forward to hearing what you find out.
>
> Bob
>
> -----Original Message-----
> From: John Halley Gotway via RT <met_help at ucar.edu>
> Sent: Thursday, April 22, 2021 9:47 AM
> To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
> <robert.craig.2 at us.af.mil>
> Subject: Re: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate
> Observations
>
> Bob,
>
> Ah, OK, I wasn’t sure on the source of these obs. We certainly
> wouldn’t expect users to change PrepBufr obs!
>
> Rather than tweaking the format of the data, we could instead make
> the construction of that uniqueness key a configurable option.
>
> But I think I should refer this to the scientists in our group for
> their input. I’ll let you know what I find out.
>
> John
>
> On Thu, Apr 22, 2021 at 8:01 AM robert.craig.2 at us.af.mil via RT <
> met_help at ucar.edu> wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
> >
> > John, these data come from PrepBufr observations.  Is there an easy
> > way to change the station elevation to NA in PB2NC or Point-Stat, or
> > do I have to go into the NetCDF files and change the elevation there
> > before verification?  Maybe this would be a useful PB2NC option.
> >
> > Thanks
> > Bob



------------------------------------------------
Subject: Duplicate Observations
From: John Halley Gotway
Time: Fri Apr 23 13:40:44 2021

Bob,

My email address is "johnhg at ucar.edu" so I won't receive the one sent
to "jhg at ucar.edu".

Glad to hear that +/- 15 helped reduce the issue. That logic of
checking for duplicates by searching nearby lat/lon's could get messy
and slow down the processing a lot. I'm thinking that we could just use
the contents of the "OBS_SID" column to control this. From this issue:
https://github.com/dtcenter/MET/issues/1762

The default setting for the proposed new config file option would be:

obs_unique_key = [ "OBS_LAT", "OBS_LON", "OBS_LVL", "OBS_ELV" ];

And I think the logic you want could be achieved by changing that
setting to:

obs_unique_key = [ "OBS_SID", "OBS_LVL" ];

So any observations with the same station id and level would be
grouped together. And setting "obs_summary = NEAREST" would select only
the single observation from each sounding at the requested level whose
timestamp is closest to the valid time of the forecast.
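
Here is a minimal Python sketch of that grouping logic, applied to the two 50527 reports from the MPR file. It is not MET code: the function nearest_per_key and the "obs_time" field name are made up for this illustration, and the option itself is still only a proposal in the issue above.

```python
from datetime import datetime

FMT = "%Y%m%d_%H%M%S"

def nearest_per_key(rows, key_cols, valid_time):
    """Group rows by the columns in key_cols and keep, for each group,
    the row whose obs_time is closest to valid_time."""
    valid = datetime.strptime(valid_time, FMT)
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        dt = abs(datetime.strptime(row["obs_time"], FMT) - valid)
        if key not in best or dt < best[key][0]:
            best[key] = (dt, row)
    return [row for _, row in best.values()]

# The two station 50527 reports; "obs_time" is a hypothetical field name
rows = [
    {"OBS_SID": "50527", "OBS_LAT": 49.25, "OBS_LON": 119.7, "OBS_LVL": 500,
     "OBS_ELV": 5113.00537, "obs_time": "20210314_233000"},
    {"OBS_SID": "50527", "OBS_LAT": 49.25, "OBS_LON": 119.7, "OBS_LVL": 500,
     "OBS_ELV": 5104.01611, "obs_time": "20210315_000000"},
]

default_key = ["OBS_LAT", "OBS_LON", "OBS_LVL", "OBS_ELV"]
proposed_key = ["OBS_SID", "OBS_LVL"]

kept_default = nearest_per_key(rows, default_key, "20210315_000000")
kept_proposed = nearest_per_key(rows, proposed_key, "20210315_000000")

print(len(kept_default))   # 2, the elevations differ so each row is its own group
print(len(kept_proposed))  # 1, only the 20210315_000000 report is kept
```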

Does that logic make sense?

Thanks,
John


------------------------------------------------
Subject: RE: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate Observations
From: robert.craig.2 at us.af.mil
Time: Fri Apr 23 13:46:55 2021

John,
That sounds reasonable.

Bob




------------------------------------------------
Subject: Duplicate Observations
From: John Halley Gotway
Time: Fri Apr 23 13:54:57 2021

Bob,

Thanks for sending the data files. I was able to pull them down to our
local project machine.

Jonathan, the sample files can be found on kiowa in:
   /d1/projects/MET/MET_Help/craig_data_20210423

2021041800z.pb is a prepbufr file containing all the obs and
2021041800.nc is the NetCDF output from the pb2nc tool.

I did also run an Rscript to dump the NetCDF obs to ascii, which makes
them a bit easier to work with.
   Rscript /usr/local/met-9.1/share/met/Rscripts/pntnc2ascii.R 2021041800.nc > 2021041800.txt

You can see all the points for the sounding we were discussing by
grepping for its ID. Looks like there are 38 levels for TMP there:
   grep " 50527 " 2021041800.txt | grep " TMP " | wc -l
   > 38

Interestingly, the lat/lon remain constant for that one across all 38
levels.

For 56571, there are 18 levels and the lat/lon's change slightly:
   grep " 56571 " 2021041800.txt | grep " TMP " | wc -l
   > 18
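
The same check can be scripted so that the level count and the lat/lon comparison come out together. Below is a minimal Python sketch. The function name station_summary is made up, and the default column positions assume the dump follows the 11-column MET point observation layout (Message_Type, Station_ID, Valid_Time, Lat, Lon, Elevation, Variable, ...). That layout is an assumption here, so adjust the indices if the file differs. The sample rows are made up for illustration, apart from the two lat/lon pairs seen for 56571 in the MPR data.

```python
def station_summary(lines, sid, var, sid_col=1, lat_col=3, lon_col=4, var_col=6):
    """Count rows for one station/variable and collect its distinct
    (lat, lon) pairs. Column indices are 0-based and assumed, not verified."""
    count = 0
    locations = set()
    for line in lines:
        cols = line.split()
        if len(cols) <= max(sid_col, lat_col, lon_col, var_col):
            continue  # skip short or blank lines
        if cols[sid_col] == sid and cols[var_col] == var:
            count += 1
            locations.add((cols[lat_col], cols[lon_col]))
    return count, locations

# Made-up rows for illustration only
sample = [
    "ADPUPA 56571 20210418_000000 27.9 102.27 1599 TMP 850 1500 NA 285.1",
    "ADPUPA 56571 20210418_000000 27.88 102.3 1599 TMP 500 5700 NA 255.3",
    "ADPUPA 56571 20210418_000000 27.88 102.3 1599 HGT 500 5700 NA 5700",
]

n_levels, locs = station_summary(sample, "56571", "TMP")
print(n_levels)       # 2
print(len(locs) > 1)  # True, so the reported position drifts between levels
```

To run it on the real dump, pass the open file instead of the sample list, for example station_summary(open("2021041800.txt"), "50527", "TMP").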

Thanks,
John

>> > >
>> > > -----Original Message-----
>> > > From: John Halley Gotway via RT <met_help at ucar.edu>
>> > > Sent: Wednesday, April 21, 2021 9:40 PM
>> > > To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
>> > > <robert.craig.2 at us.af.mil>
>> > > Subject: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675]
Duplicate
>> > > Observations
>> > >
>> > > Bob,
>> > >
>> > > Thanks for sending the sample MPR data. I agree, there are no
*true
>> > > duplicates* in this output. And by "duplicate" I mean the same
lat,
>> > > lon, level, elevation, timestamp, and so on. We'd found in the
past
>> > > that the same observation can show up using both the WMO
station id
>> > > (with numbers) and non-WMO station id (with letters).  But I
don't
>> > > see
>> > that occurring here.
>> > >
>> > > The total number of MPR lines is 564:
>> > >     > cat raob_15mar_00z.txt | wc -l
>> > >     > 564
>> > >
>> > > And when we look at the number of unique combinations of obs
valid
>> > > time, latitude, longitude, level, and elevation, I also get
564:
>> > >    > cat raob_15mar_00z.txt | awk '{print $8, $28, $29, $30,
$31}' |
>> > > sort -u | wc -l
>> > >    > 564
>> > >
>> > > And setting the duplicate flag to unique should handle that
case if
>> > > any true duplicates were actually present:
>> > >    *duplicate_flag = UNIQUE;*
>> > >
>> > > The next thing to consider is the same station reporting
multiple
>> > > times within the time observation window. At first glance, I do
find
>> > > the same station id showing up multiple times in the output.
For
>> > > example, station id "56571" shows up twice, once at
20210314_233000
>> > > and a second time 30 minutes later at 20210315_000000.
>> > >
>> > > But careful inspection reveals that the lat/lon location for
this
>> > > station ID change!
>> > >    > cat raob_15mar_00z.txt | awk '{print $27, $28, $29}' |
grep 56571
>> > >    > 56571 27.9 102.27
>> > >    > 56571 27.88 102.3
>> > >
>> > > I assume that 56671 is the id of a ballon that's moving in
time.
>> > > That would explain different lat/lon locations at different
times.
>> > >
>> > > But this doesn't explain it all. When I look at the unique
station
>> > > names, latitude, and longitude, I find 538 unique entries in
the 564
>> > > lines. So there really must be some stations reporting multiple
times.
>> > > The id 50527 is one such example. It appears in 2 MPR lines:
>> > >
>> > > V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000
>> > > 20210314_233000 20210314_233000 HGT gpm P500 HGT NA P500 ADPUPA
FULL
>> > > BILIN
>> > > 4 NA NA NA NA MPR 561 137 50527 49.25 119.7 500 5113.00537
>> > > 5120.60004
>> > > 5119
>> > > 2 5288 NA NA
>> > > V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000
>> > > 20210315_000000 20210315_000000 HGT gpm P500 HGT NA P500 ADPUPA
FULL
>> > > BILIN
>> > > 4 NA NA NA NA MPR 561 328 50527 49.25 119.7 500 5104.01611
>> > > 5120.60004
>> > > 5110
>> > > 2 5288 NA NA
>> > >
>> > > The observations timestamps change: 20210314_233000 vs
>> > > 20210315_000000 The station id, lat, lon, and level are the
same:
>> > > 50527 49.25 119.7
>> > > 500 But notice that the next column, for observation elevation
>> > > (OBS_ELV), does
>> > > differ: 5113.00537 vs 5104.01611
>> > >
>> > > All 4 of those values are included in the uniqueness key.
Here's a
>> > > code snippet from the file named pair_base.cc in MET:
>> > >
>> > >    //  build a uniqueness test key
>> > >    string obs_key = str_format("%.3f:%.3f:%.2f:%.2f",
>> > >                        lat,         //  lat
>> > >                        lon,         //  lon
>> > >                        lvl,         //  level
>> > >                        elv).text(); //  elevation
>> > >
>> > > Because the elevation values differ, these 2 lines are not
>> > > considered to be the same station reporting multiple times in
the
>> > > time window. So that's the explanation for this behavior.
>> > >
>> > > So what should we do about it? Generally, the elevation is
meant as
>> > > the STATION ELEVATION. In this case it looks like it has the
current
>> > > height of the ballon instead of the elevation. If you were to
modify
>> > > the obs to just set the elevation value to NA, then the results
>> > > would be
>> > what you expect.
>> > >
>> > > Hope that helps clarify.
>> > >
>> > > John
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Apr 21, 2021 at 3:04 PM George McCabe via RT
>> > > <met_help at ucar.edu>
>> > > wrote:
>> > >
>> > > >
>> > > > Wed Apr 21 15:03:36 2021: Request 99675 was acted upon.
>> > > > Transaction: Given to johnhg (John Halley Gotway) by mccabe
>> > > >        Queue: met_help
>> > > >      Subject: Duplicate Observations
>> > > >        Owner: johnhg
>> > > >   Requestors: robert.craig.2 at us.af.mil
>> > > >       Status: new
>> > > >  Ticket <URL:
>> > > > https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675
>> > > > >
>> > > >
>> > > >
>> > > > This transaction appears to have no content
>> > > >
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> >
>>
>>
>>
>>

------------------------------------------------
Subject: RE: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate Observations
From: robert.craig.2 at us.af.mil
Time: Fri Apr 23 14:14:12 2021

This could be caused by the new Radiosondes recording position at
every level instead of the older radiosondes coding the position of
the launch site.

-----Original Message-----
From: John Halley Gotway via RT <met_help at ucar.edu>
Sent: Friday, April 23, 2021 2:55 PM
To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
<robert.craig.2 at us.af.mil>
Cc: jvigh at ucar.edu
Subject: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate
Observations

Bob,

Thanks for sending the data files. I was able to pull them down to our
local project machine.

Jonathan, the sample files can be found on kiowa
in: /d1/projects/MET/MET_Help/craig_data_20210423.

2021041800z.pb is a prepbufr file containing all the obs and
2021041800.nc is the NetCDF output from the pb2nc tool.

I also ran an Rscript to dump the NetCDF obs to ASCII, which makes
them a bit easier to work with:
   Rscript /usr/local/met-9.1/share/met/Rscripts/pntnc2ascii.R 2021041800.nc > 2021041800.txt

You can see all the points for the sounding we were discussing by
grepping for its ID. Looks like there are 38 levels for TMP there:
   grep " 50527 " 2021041800.txt | grep " TMP " | wc -l
   > 38

Interestingly, the lat/lon remain constant for that one across all 38
levels.

For 56571, there are 18 levels and the lat/lons change slightly:
   grep " 56571 " 2021041800.txt | grep " TMP " | wc -l
   > 18

Thanks,
John
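
The grep checks above can also be done in one pass with a short Python sketch that counts rows for a station and variable and collects the distinct lat/lon pairs. The `summarize` helper is hypothetical, and the column order (message type, station id, valid time, lat, lon, elevation, variable, level, height, QC, value) is an assumption inferred from the 56571 rows that appear later in this ticket; the three sample rows are copied from that listing.

```python
def summarize(lines, sid, var):
    """Count rows for one station/variable and collect distinct (lat, lon)."""
    count = 0
    positions = set()
    for line in lines:
        fields = line.split()
        # Assumed layout: 11 whitespace-separated columns per record.
        if len(fields) != 11 or fields[1] != sid or fields[6] != var:
            continue
        count += 1
        positions.add((float(fields[3]), float(fields[4])))
    return count, positions


rows = [
    "ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 70 18664.728516 2    194.850000",
    "ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 500 5802.221680 2    263.750000",
    "ADPSFC 56571 20210418_000000  27.90000  102.27000 1599 TMP 839.799987792969 1602.098511 9    288.950000",
]

count, positions = summarize(rows, "56571", "TMP")
print(count, sorted(positions))
```

To run it against the full dump, the same function could be fed `open("2021041800.txt")` instead of the inline sample. A station whose position never changes would yield a single pair.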


------------------------------------------------
Subject: Duplicate Observations
From: John Halley Gotway
Time: Fri Apr 23 14:20:04 2021

Bob,

Perhaps, but when you look at the 18 TMP levels for 56571, they look
pretty odd. There are 2 timestamps which correspond to 2 different
lat/lon locations. Hard to make complete sense of this.

John

grep " 56571 " 2021041800.txt | grep " TMP "
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 70 18664.728516 2    194.850000
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 50 20668.457031 2    205.450000
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 30 23884.441406 2    222.850000
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 20 26529.361328 2    221.650000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 839.200012207031 1602.098511 2    288.050000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 700 3114.022705 2    278.150000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 500 5802.221680 2    263.750000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 400 7490.486816 2    251.750000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 300 9556.482422 2    239.950000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 250 10817.922852 2    233.250000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 200 12324.836914 2    225.150000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 150 14166.398438 2    211.650000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 100 16596.097656 2    202.250000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 70 18673.115234 2    194.750000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 50 20672.982422 2    205.350000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 30 23895.214844 2    222.850000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 20 26539.328125 2    221.650000
ADPSFC 56571 20210418_000000  27.90000  102.27000 1599 TMP 839.799987792969 1602.098511 9    288.950000
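
One way to probe the listing above is to check which levels are reported under both timestamps. In the ADPUPA rows, levels 70, 50, 30, and 20 appear at both 20210417_233000 and 20210418_000000 with slightly different heights. The Python sketch below reproduces that comparison on a subset of the rows; `levels_by_time` is a hypothetical helper, and the column order (message type, station id, valid time, lat, lon, elevation, variable, level, height, QC, value) is an assumption inferred from the rows themselves.

```python
from collections import defaultdict


def levels_by_time(lines, sid, var, msg_type="ADPUPA"):
    """Map each valid time to a {level: height} dict for one station/variable."""
    by_time = defaultdict(dict)
    for line in lines:
        fields = line.split()
        # Assumed layout: 11 whitespace-separated columns per record.
        if len(fields) != 11:
            continue
        if fields[0] != msg_type or fields[1] != sid or fields[6] != var:
            continue
        by_time[fields[2]][float(fields[7])] = float(fields[8])
    return by_time


rows = [
    "ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 70 18664.728516 2    194.850000",
    "ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 50 20668.457031 2    205.450000",
    "ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 30 23884.441406 2    222.850000",
    "ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 20 26529.361328 2    221.650000",
    "ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 500 5802.221680 2    263.750000",
    "ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 70 18673.115234 2    194.750000",
    "ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 50 20672.982422 2    205.350000",
    "ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 30 23895.214844 2    222.850000",
    "ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 20 26539.328125 2    221.650000",
]

by_time = levels_by_time(rows, "56571", "TMP")
early, late = sorted(by_time)  # timestamps sort chronologically as strings
shared = sorted(set(by_time[early]) & set(by_time[late]), reverse=True)

for lvl in shared:
    print(f"level {lvl:g}: height {by_time[early][lvl]} at {early}, "
          f"{by_time[late][lvl]} at {late}")
```

Levels that show up under both timestamps are exactly the ones that would produce repeated matched pairs when both reports fall inside the observation time window.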

On Fri, Apr 23, 2021 at 2:14 PM robert.craig.2 at us.af.mil via RT <
met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
>
> This could be caused by the new Radiosondes recording position at
> every level instead of the older radiosondes coding the position of
> the launch site.
>

------------------------------------------------
Subject: RE: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate Observations
From: robert.craig.2 at us.af.mil
Time: Fri Apr 23 14:23:08 2021


I will ask around here to see if there is something wrong with the
RAOB.

-----Original Message-----
From: John Halley Gotway via RT <met_help at ucar.edu>
Sent: Friday, April 23, 2021 3:20 PM
To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
<robert.craig.2 at us.af.mil>
Cc: jvigh at ucar.edu
Subject: Re: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate
Observations

Bob,

Perhaps, but when you look at the 18 TMP levels for 56571, they look
pretty odd. There are 2 timestamps which correspond to 2 different
lat/lon locations. Hard to make complete sense of this.

John

grep " 56571 " 2021041800.txt | grep " TMP "
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 70 18664.728516 2    194.850000
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 50 20668.457031 2    205.450000
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 30 23884.441406 2    222.850000
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 20 26529.361328 2    221.650000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 839.200012207031 1602.098511 2    288.050000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 700 3114.022705 2    278.150000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 500 5802.221680 2    263.750000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 400 7490.486816 2    251.750000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 300 9556.482422 2    239.950000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 250 10817.922852 2    233.250000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 200 12324.836914 2    225.150000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 150 14166.398438 2    211.650000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 100 16596.097656 2    202.250000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 70 18673.115234 2    194.750000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 50 20672.982422 2    205.350000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 30 23895.214844 2    222.850000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 20 26539.328125 2    221.650000
ADPSFC 56571 20210418_000000  27.90000  102.27000 1599 TMP 839.799987792969 1602.098511 9    288.950000
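
The "2 timestamps, 2 locations" pattern can be checked with a short
Python sketch. This is only an illustration under assumptions: the
column order (message type, station id, valid time, lat, lon,
elevation, variable, level, height, QC flag, value) is inferred from
the rows in this listing, the function name distinct_reports is
hypothetical, and only four of the 18 rows are reproduced as sample
input.

```python
# Sample rows copied from the 56571 listing; the column meanings are an
# assumption inferred from that listing, not taken from MET documentation.
ROWS = """\
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 70 18664.728516 2 194.850000
ADPUPA 56571 20210418_000000  27.88000  102.30000 1592 TMP 50 20668.457031 2 205.450000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 700 3114.022705 2 278.150000
ADPUPA 56571 20210417_233000  27.90000  102.27000 1599 TMP 500 5802.221680 2 263.750000
"""


def distinct_reports(text, sid):
    """Count rows per (valid time, lat, lon) combination for one station id."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip short or unrelated lines; field 1 is assumed to be the station id.
        if len(fields) < 11 or fields[1] != sid:
            continue
        key = (fields[2], fields[3], fields[4])
        counts[key] = counts.get(key, 0) + 1
    return counts


result = distinct_reports(ROWS, "56571")
for key, count in sorted(result.items()):
    print(key, count)
```

Each distinct (time, lat, lon) combination printed corresponds to one
reported position for the sounding.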

On Fri, Apr 23, 2021 at 2:14 PM robert.craig.2 at us.af.mil via RT <
met_help at ucar.edu> wrote:

>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
>
> This could be caused by the new Radiosondes recording position at
> every level instead of the older radiosondes coding the position of
> the launch site.
>
> -----Original Message-----
> From: John Halley Gotway via RT <met_help at ucar.edu>
> Sent: Friday, April 23, 2021 2:55 PM
> To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
> <robert.craig.2 at us.af.mil>
> Cc: jvigh at ucar.edu
> Subject: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675] Duplicate
> Observations
>
> Bob,
>
> Thanks for sending the data files. I was able to pull them down to
> our local project machine.
>
> Jonathan, the sample files can be found on kiowa
> in: /d1/projects/MET/MET_Help/craig_data_20210423.
>
> 2021041800z.pb is a prepbufr file containing all the obs and
> 2021041800.nc is the NetCDF output from the pb2nc tool.
>
> I did also run an Rscript to dump the NetCDF obs to ascii, which
> makes them a bit easier to work with.
>    Rscript /usr/local/met-9.1/share/met/Rscripts/pntnc2ascii.R 2021041800.nc > 2021041800.txt
>
> You can see all the points for the sounding we were discussing by
> grepping for its ID. Looks like there are 38 levels for TMP there:
>    grep " 50527 " 2021041800.txt | grep " TMP " | wc -l
>    > 38
>
> Interestingly, the lat/lon remain constant for that one across all
> 38 levels.
>
> For 56571, there are 18 levels and the lat/lon's change slightly:
>    grep " 56571 " 2021041800.txt | grep " TMP " | wc -l
>    > 18
>
> Thanks,
> John
>
> On Fri, Apr 23, 2021 at 1:40 PM John Halley Gotway <johnhg at ucar.edu>
> wrote:
>
> > Bob,
> >
> > My email address is "johnhg at ucar.edu" so I won't receive the one
> > sent to "jhg at ucar.edu".
> >
> > Glad to hear that +/- 15 helped reduce the issue. That logic of
> > checking for duplicates by searching nearby lat/lon's could get
> > messy and slow down the processing a lot. I'm thinking that we
> > could just use the contents of the "OBS_SID" column to control
> > this. From this issue:
> > https://github.com/dtcenter/MET/issues/1762
> >
> > The default setting for the proposed new config file option would
> > be:
> >
> > obs_unique_key = [ "OBS_LAT", "OBS_LON", "OBS_LVL", "OBS_ELV" ];
> >
> > And I think the logic you want could be achieved by changing that
> > setting to:
> >
> > obs_unique_key = [ "OBS_SID", "OBS_LVL" ];
> >
> > So any observations with the same station id and level would be
> > grouped together. And setting "obs_summary = NEAREST" would select
> > only the single observation from each sounding at the requested
> > level whose timestamp is closest to the valid time of the
> > forecast.
> >
> > Does that logic make sense?
> >
> > Thanks,
> > John
> >
> > On Fri, Apr 23, 2021 at 7:37 AM robert.craig.2 at us.af.mil via RT <
> > met_help at ucar.edu> wrote:
> >
> >>
> >> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
> >>
> >>
> >> John, I did set the time to +/- 15 minutes and that reduced the
> >> problem considerably.  I will send the files you requested by
> >> dod_safe.  I sent it to jhg at ncar.edu.  If that isn't right, let
> >> me know.
> >>
> >> Bob
> >>
> >> -----Original Message-----
> >> From: John Halley Gotway via RT <met_help at ucar.edu>
> >> Sent: Thursday, April 22, 2021 12:15 PM
> >> To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
> >> <robert.craig.2 at us.af.mil>
> >> Cc: jvigh at ucar.edu
> >> Subject: Re: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675]
> >> Duplicate Observations
> >>
> >> Bob,
> >>
> >> We discussed this situation at our weekly MET project meeting
> >> today. As a result, I wrote up this feature request to add a
> >> configuration option to better support this:
> >>    https://github.com/dtcenter/MET/issues/1762
> >>
> >> I also added Jonathan Vigh on this ticket, one of the scientists
> >> in our group. He'd ideally like to take a look at the full
> >> PrepBufr file to get a better understanding of what's going on.
> >> Perhaps the balloon pops and records observations on the way
> >> down. Are you able to share that with us? Or point us to it?
> >>
> >> For now, I do notice that the obs are 30 minutes apart in the MPR
> >> data. The only easy option with the existing code would be using
> >> a smaller obs_window setting around the forecast valid time. If
> >> you have +/- 30 minutes, you could try +/- 15 instead.
> >>
> >> Tara Jensen has some contacts at the NOAA group who generate this
> >> data, and can reach out to them with any questions Jonathan has
> >> after examining the data more closely.
> >>
> >> Thanks,
> >> John
> >>
> >> On Thu, Apr 22, 2021 at 9:15 AM robert.craig.2 at us.af.mil via RT <
> >> met_help at ucar.edu> wrote:
> >>
> >> >
> >> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
> >> >
> >> > John, I look forward to hearing what you find out.
> >> >
> >> > Bob
> >> >
> >> > -----Original Message-----
> >> > From: John Halley Gotway via RT <met_help at ucar.edu>
> >> > Sent: Thursday, April 22, 2021 9:47 AM
> >> > To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
> >> > <robert.craig.2 at us.af.mil>
> >> > Subject: Re: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675]
> >> > Duplicate Observations
> >> >
> >> > Bob,
> >> >
> >> > Ah, OK, I wasn’t sure on the source of these obs. We certainly
> >> > wouldn’t expect users to change PrepBufr obs!
> >> >
> >> > Rather than tweaking the format of the data, we could instead
> >> > make the construction of that uniqueness key a configurable
> >> > option.
> >> >
> >> > But I think I should refer this to the scientists in our group
> >> > for their input. I’ll let you know what I find out.
> >> >
> >> > John
> >> >
> >> > On Thu, Apr 22, 2021 at 8:01 AM robert.craig.2 at us.af.mil via RT <
> >> > met_help at ucar.edu> wrote:
> >> >
> >> > >
> >> > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=99675 >
> >> > >
> >> > > John, these data come from PrepBufr observations.  Is there an
> >> > > easy way to change the station elevation to NA in PB2NC or
> >> > > Point Stat, or do I have to go into the NetCDF files and
> >> > > change the elevation there before verification?  Maybe this
> >> > > would be a useful PB2NC option.
> >> > >
> >> > > Thanks
> >> > > Bob
> >> > >
> >> > > -----Original Message-----
> >> > > From: John Halley Gotway via RT <met_help at ucar.edu>
> >> > > Sent: Wednesday, April 21, 2021 9:40 PM
> >> > > To: CRAIG, ROBERT J GS-12 USAF ACC 16 WS/WXD
> >> > > <robert.craig.2 at us.af.mil>
> >> > > Subject: [Non-DoD Source] Re: [rt.rap.ucar.edu #99675]
> >> > > Duplicate Observations
> >> > >
> >> > > Bob,
> >> > >
> >> > > Thanks for sending the sample MPR data. I agree, there are no
> >> > > *true duplicates* in this output. And by "duplicate" I mean
> >> > > the same lat, lon, level, elevation, timestamp, and so on.
> >> > > We'd found in the past that the same observation can show up
> >> > > using both the WMO station id (with numbers) and non-WMO
> >> > > station id (with letters). But I don't see that occurring
> >> > > here.
> >> > >
> >> > > The total number of MPR lines is 564:
> >> > >     > cat raob_15mar_00z.txt | wc -l
> >> > >     > 564
> >> > >
> >> > > And when we look at the number of unique combinations of obs
> >> > > valid time, latitude, longitude, level, and elevation, I also
> >> > > get 564:
> >> > >    > cat raob_15mar_00z.txt | awk '{print $8, $28, $29, $30, $31}' | sort -u | wc -l
> >> > >    > 564
> >> > >
> >> > > And setting the duplicate flag to unique should handle that
> >> > > case if any true duplicates were actually present:
> >> > >    *duplicate_flag = UNIQUE;*
> >> > >
> >> > > The next thing to consider is the same station reporting
> >> > > multiple times within the observation time window. At first
> >> > > glance, I do find the same station id showing up multiple
> >> > > times in the output. For example, station id "56571" shows up
> >> > > twice, once at 20210314_233000 and a second time 30 minutes
> >> > > later at 20210315_000000.
> >> > >
> >> > > But careful inspection reveals that the lat/lon location for
> >> > > this station ID changes!
> >> > >    > cat raob_15mar_00z.txt | awk '{print $27, $28, $29}' | grep 56571
> >> > >    > 56571 27.9 102.27
> >> > >    > 56571 27.88 102.3
> >> > >
> >> > > I assume that 56571 is the id of a balloon that's moving in
> >> > > time. That would explain different lat/lon locations at
> >> > > different times.
> >> > >
> >> > > But this doesn't explain it all. When I look at the unique
> >> > > station names, latitude, and longitude, I find 538 unique
> >> > > entries in the 564 lines. So there really must be some
> >> > > stations reporting multiple times. The id 50527 is one such
> >> > > example. It appears in 2 MPR lines:
> >> > >
> >> > > V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000 20210314_233000 20210314_233000 HGT gpm P500 HGT NA P500 ADPUPA FULL BILIN 4 NA NA NA NA MPR 561 137 50527 49.25 119.7 500 5113.00537 5120.60004 5119 2 5288 NA NA
> >> > > V8.1 galwem NA 000000 20210315_000000 20210315_000000 000000 20210315_000000 20210315_000000 HGT gpm P500 HGT NA P500 ADPUPA FULL BILIN 4 NA NA NA NA MPR 561 328 50527 49.25 119.7 500 5104.01611 5120.60004 5110 2 5288 NA NA
> >> > >
> >> > > The observation timestamps change: 20210314_233000 vs 20210315_000000
> >> > > The station id, lat, lon, and level are the same: 50527 49.25 119.7 500
> >> > > But notice that the next column, for observation elevation (OBS_ELV),
> >> > > does differ: 5113.00537 vs 5104.01611
> >> > >
> >> > > All 4 of those values are included in the uniqueness key.
> >> > > Here's a code snippet from the file named pair_base.cc in
> >> > > MET:
> >> > >
> >> > >    //  build a uniqueness test key
> >> > >    string obs_key = str_format("%.3f:%.3f:%.2f:%.2f",
> >> > >                        lat,         //  lat
> >> > >                        lon,         //  lon
> >> > >                        lvl,         //  level
> >> > >                        elv).text(); //  elevation
> >> > >
> >> > > Because the elevation values differ, these 2 lines are not
> >> > > considered to be the same station reporting multiple times in
> >> > > the time window. So that's the explanation for this behavior.
> >> > >
> >> > > So what should we do about it? Generally, the elevation is
> >> > > meant as the STATION ELEVATION. In this case it looks like it
> >> > > has the current height of the balloon instead of the station
> >> > > elevation. If you were to modify the obs to just set the
> >> > > elevation value to NA, then the results would be what you
> >> > > expect.
> >> > >
> >> > > Hope that helps clarify.
> >> > >
> >> > > John
>
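
The grouping behavior discussed in this thread can be sketched in
Python. This is an illustration only, not MET's implementation. The
tuple layout and the helper names (nearest_per_key, location_key,
station_key) are hypothetical. The location key mirrors the
lat:lon:level:elevation format string quoted from pair_base.cc, and
the station key corresponds to the proposed setting
obs_unique_key = [ "OBS_SID", "OBS_LVL" ]. The two sample reports are
the 70 mb TMP rows for station 56571 from the listing earlier in this
message; treating the 1592/1599 column as the elevation is an
assumption.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y%m%d_%H%M%S"

# (sid, valid time, lat, lon, elevation, level, value)
# Two 70 mb TMP reports for station 56571, copied from the listing.
OBS = [
    ("56571", "20210418_000000", 27.88, 102.30, 1592.0, 70.0, 194.85),
    ("56571", "20210417_233000", 27.90, 102.27, 1599.0, 70.0, 194.75),
]


def nearest_per_key(obs, valid_time, key_fn):
    """Group obs by key_fn, then keep the one nearest in time per group."""
    target = datetime.strptime(valid_time, FMT)
    groups = defaultdict(list)
    for ob in obs:
        groups[key_fn(ob)].append(ob)
    return [
        min(group, key=lambda ob: abs(datetime.strptime(ob[1], FMT) - target))
        for group in groups.values()
    ]


def location_key(ob):
    # Same field order and precision as the quoted format string:
    # lat, lon, level, elevation.
    return "%.3f:%.3f:%.2f:%.2f" % (ob[2], ob[3], ob[5], ob[4])


def station_key(ob):
    # Station id and level only, as in the proposed alternative key.
    return (ob[0], ob[5])


by_location = nearest_per_key(OBS, "20210418_000000", location_key)
by_station = nearest_per_key(OBS, "20210418_000000", station_key)
print(len(by_location), len(by_station))  # prints: 2 1
```

With the location-based key the two reports land in separate groups,
so both survive; with the station-based key they share one group and
only the 20210418_000000 report is kept.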



------------------------------------------------


More information about the Met_help mailing list