[Go-essp-tech] Extending the DRS syntax to observations

Wed Feb 2 14:26:04 MST 2011

Hi all - I'm one of the observational data guys, specifically computing 
gridded multi-satellite precipitation estimates.  I've been lurking on 
this list and trying to understand the context before I post, but it 
might be helpful for me to say what my implicit data hierarchy looks 
like for precipitation data in general (with reference to the discussion 
so far):

Institution > Project > Instrument > Level > Algorithm > Variable

So, the "final" fine-scale precipitation from the TRMM Multi-satellite 
Precipitation Analysis (TMPA) would be

NASA > TRMM > Multi-satellite > 3 > TMPA_version_6 > [3-hourly 0.25-deg 
precipitation estimate in mm/hr]
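
To make the ordering concrete, here is a minimal sketch of that hierarchy as an ordered record. The class name, the "/" separator and the short variable label "precipitation" are assumptions for illustration only, not an agreed DRS vocabulary.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ObsFacets:
    """Hypothetical facet record; field order follows the hierarchy above."""
    institution: str
    project: str
    instrument: str
    level: str
    algorithm: str
    variable: str

    def path(self, sep: str = "/") -> str:
        # Join the facets in hierarchy order, most general first.
        return sep.join(astuple(self))


# The TMPA example, with an assumed short label for the variable.
tmpa = ObsFacets("NASA", "TRMM", "Multi-satellite", "3",
                 "TMPA_version_6", "precipitation")
print(tmpa.path())
```

Swapping Level and Instrument, as the Goddard DISC does, would only mean reordering the fields in this record.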

There are other variables that could go in the last position since the 
original datasets contain multiple variables as "fields".  I should say 
that the Goddard DISC puts Level before Instrument, and you might want 
to consider why they did that.  [This is mostly an issue if you're 
trying to build a syntax that is generally useful, not just focused on 
gridded data.]  Also, there are multi-instrument algorithms on the same 
satellite (or nearby satellites in the future).  So, you'd need to 
decide whether you want a separate "Multi-instrument" entry for 
Instrument, or whether "Multi-instrument" includes both multiple 
instruments and satellites, with the details resolved at the algorithm 
level.  One argument for the first is that at least part of the time 
multiple instruments on the same satellite are used to create a 
multi-instrument Level 2 product which is then gridded to Level 3.

It is usually important (in the case of precipitation) to denote whether 
the data field is on the original grid or not.  I'm not sure how this 
fits the AR5 needs, but a general syntax needs to account for the 
possibility that the archive might have more than one representation of 
the same data.  A similar issue arises with different versions of the 
algorithm.  I tend to think of it as part of the Algorithm (TMPA V.6), 
but it could be a separate layer in the classification hierarchy.
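
If the version were made its own layer, a combined tag could be split mechanically. This is only a sketch: the function name and the "_version_" marker are assumptions taken from the way I wrote the TMPA example above, not a proposed convention.

```python
from typing import Optional, Tuple


def split_algorithm(tag: str, marker: str = "_version_") -> Tuple[str, Optional[str]]:
    """Split a combined algorithm tag into (algorithm, version).

    Returns (tag, None) when the marker is absent, so tags without a
    version pass through unchanged.
    """
    name, found, version = tag.partition(marker)
    return (name, version) if found else (tag, None)


print(split_algorithm("TMPA_version_6"))
```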

I hope this helps -
George

On 1/31/11 4:51 PM, Karl Taylor wrote:
> Hi all,
>
> I've been reading all the emails on this topic, but I haven't had a
> moment to respond.  I am not at all sure it is necessary for the DRS
> for observations to correspond with that for models; after all, we're
> trying to describe stuff with different "attributes".  I think that ESG
> should be able to make it easy for users to find observations that are
> useful for CMIP5 without having to fit the observations into exactly the
> same structure.  So my general advice is to devise something that makes
> sense (and will not be too limiting in the future), rather than
> attempting to perfectly parallel CMIP5 DRS.
>
> I've also been in contact with the CMIP modeling panel, and the opinion
> received so far is to avoid direct mention of CMIP5 anywhere in the
> observation metadata.  CMIP does not want to appear to endorse any
> particular data.   On the other hand, the panel has expressed strong
> support for putting observations in a form that makes it easy to compare
> the models against.  They are very enthusiastic about the efforts of
> this (and any other similar) groups.    We plan to include on the CMIP5
> website a list of observational products that have been prepared using
> CMOR (or with software that produces a similar result), and indicate
> that this makes it easier to use these products.  At the same time we
> will have to make it clear that CMIP is not necessarily endorsing these
> products as superior to alternative products (which might be more
> difficult to use but could conceivably be based on more reliable
> observations).
>
> As for users searching for data that can be compared to CMIP5, I think
> we'll be o.k. as long as the following are included in the DRS (and are
> consistent with the vocabulary used to describe the models):
>
> variable name
> modeling realm
> frequency
>
> Other search categories shouldn't have to be consistent with CMIP5 and
> could be omitted if inappropriate.  Similarly others could be added that
> are not used for models, and others could be modified somewhat, if
> necessary.
>
> Hope this makes designing a rational DRS for observational products easier.
>
> Best regards,
> Karl
>
> On 1/31/11 1:25 PM, Lynnes, Christopher S. (GSFC-6102) wrote:
>> On Jan 31, 2011, at 1:20 PM, Steve Hankin wrote:
>>
>>> Hi Luca,
>>>
>>> Just below is a version of your email of this morning which has minor editorial changes (in red) to reflect a broader outlook on the term "observations", and an initial proposal for gridded in situ observations.    I think if we adopt language like this we will be glad of it in the long term:
>>>      - Steve
>>> Open Questions
>>>
>>> 	• Agree on basic DRS-like structure for observations and meaning of each field
>>> 		• Use activity="cmip5" or activity="cmip54obs", "obs4models" ?
>>> 		• DRS hierarchy for models (partial): "institute">  "model">  "experiment">  "ensemble"
>>> 		• Current proposed hierarchy for gridded remote sensed observations (partial): "institute" (="agency")>  "mission">  "instrument">  ("processing level" + "other specifiers")
>>> 		• Alternative proposal for gridded remote sensed observations (partial): "institute" (="agency")>  "instrument">  "processing level">  ?
>> Gridded remote sensed observations are by definition Level 3 (in NASA terminology at least), so processing level is mostly redundant in this context.
>>
>> Processing level is also not especially analogous to "ensemble", perhaps making it a bit non-intuitive. (But does that matter to this user community?)
>>
>> So, would it be completely out of place to put an identifier for the processing algorithm and version in at this point?  For instance, there are some variables (e.g. Ozone) that are computed from the same instrument but using two different algorithms. And of course, the processing code version (or data collection version) can also have multiple instances for the same variable.
>>
>>> 		• Need a proposal for gridded in situ observations, say:  "institute" (="agency")>  "program">  "resolution">  "variant" ?
>>> 			• e.g. NOAA/NCAR / ICOADS / 1degree / equatorial /
>>> 		• Should we even attempt to describe in situ (point and trajectory) observations through DRS?  (can we succeed at this?)
>>> 	• Decide on whether to have one single CMOR table for observations (currently "obsSites"), or more than one depending on types of observational data:
>>> 		• remote sensed (grids and swaths)
>>> 		• in-situ stations (time series and profiles)
>>> 		• trajectory-based observations
>>> 		• in-situ gridded products
>>> 	• Decide on whether to encode global attributes for data source in netcdf files ("source", "source_datastream", "source_url", "source_reference")
>>> Action Items
>>>
>>> 	• Populate CV for observations (decide on upper/lower case)
>>> 	• Produce some reference datasets
>>> 	• Develop snippet of "esg.ini" (ESG publisher) configuration for processing observations
>>>
>>>
>>> On 1/31/2011 6:58 AM, Cinquini, Luca (3880) wrote:
>>>> Hi all,
>>>>   thanks to everybody for the lively discussion... I just wanted to summarize what I think is the status so far - I posted this also on the wiki. Please keep the discussion going...
>>>> thanks, Luca
>>>>
>>>> Open Questions
>>>>
>>>> 	• Agree on basic DRS-like structure for observations and meaning of each field
>>>> 		• Use activity="cmip5" or activity="cmip54obs", "obs4models" ?
>>>> 		• DRS hierarchy for models (partial): "institute">  "model">  "experiment">  "ensemble"
>>>> 		• Current proposed hierarchy for observations (partial): "institute" (="agency")>  "mission">  "instrument">  ("processing level" + "other specifiers")
>>>> 		• Alternative proposal for observations (partial): "institute" (="agency")>  "instrument">  "processing level">  ?
>>>> 	• Decide on whether to have one single CMOR table for observations (currently "obsSites"), or more than one depending on types of observational data:
>>>> 		• remote sensed (grids and swaths)
>>>> 		• in-situ stations (time series and profiles)
>>>> 		• trajectory-based observations
>>>> 		• in-situ gridded products
>>>> 	• Decide on whether to encode global attributes for data source in netcdf files ("source", "source_datastream", "source_url", "source_reference")
>>>> Action Items
>>>>
>>>> 	• Populate CV for observations (decide on upper/lower case)
>>>> 	• Produce some reference datasets
>>>> 	• Develop snippet of "esg.ini" (ESG publisher) configuration for processing observations
>>>>
>>>>
>>>> On Jan 31, 2011, at 2:42 AM, Bryan Lawrence wrote:
>>>>
>>>>> Hi Folks
>>>>>
>>>>> Sorry I've come late to this discussion.
>>>>>
>>>>> I've got just two simple points to make (for now).
>>>>>
>>>>> Firstly, I think this activity (organising observational data to support
>>>>> CMIP5) is not the same as cmip5 itself.  Sooner or later things break in
>>>>> the DRS hierarchy doing this, and so I think it would be helpful to all
>>>>> consumers if this was dealt with at the top level of the DRS.
>>>>> Additionally, remember obs are timeless, models are not. This data will
>>>>> be useful beyond cmip5 (e.g. cordex). So I recommend *not*  shoehorning
>>>>> all the obs data under cmip5 in the DRS.
>>>>>
>>>>> The DRS allows you to define new activities, and I'd do so, something
>>>>> like obs4models or (if you must) cmip5obs ....
>>>>>
>>>>> Otherwise I'm fine with the approach.
>>>>>
>>>>> Secondly, wrt Steve's list below: there is of course swath data as
>>>>> well ... but otherwise I rather agree that 3 and 1 can be handled the
>>>>> same.
>>>>>
>>>>> Cheers
>>>>> Bryan
>>>>>
>>>>>
>> --
>> Dr. Christopher Lynnes     NASA/GSFC, Code 610.2    phone: 301-614-5185
>>
>>
>>

-- 
George J. Huffman, Ph.D.  (Voice)  +1 301-614-6308
Sci. Sys. & Appl., Inc.   (FAX)    +1 301-614-5492
NASA/GSFC Code 613.1      (Email)  george.j.huffman at nasa.gov
Greenbelt, MD 20771 USA   (Office) Bld. 33 Room C417