[Go-essp-tech] Extending the DRS syntax to observations

Thu Feb 10 08:28:24 MST 2011

Hi all,
in preparation for our upcoming conference call, I compiled a list of questions that could guide our discussion. This is just one possible way to tackle the problem; if anybody thinks we should follow a different course, we can certainly do that.

thanks, Luca

AGENDA

KEY QUESTIONS

1) What are the fields that unequivocally identify an observational dataset, for the purpose of comparing to CMIP5 models ? Are the fields the same for different types of observations ?

- Fields that have the same values as for models: variable, sampling frequency, realm, time period, version

- Fields that have a model counterpart, but not necessarily the same values: activity, product

- Fields that are specific to observations: instrument, mission ?, agency ?, resolution ?, level and/or processing algorithms, others ?

2) Should observations be organized according to the same directory structure as for models ? If not, should the hierarchy be the same for all observations, or be different for different kinds of observations ?

3) What is the convention for naming observational files ? (informed by the convention for the directory hierarchy)

4) Should CMOR be mandated for processing observations for CMIP5, or should we rely on CF and CMOR checkers ?

- If CMOR is mandated, how much work (if any) is involved, and where should the funding come from ?

5) Should the Controlled Vocabulary for observations be encoded in one CMOR table, or more than one ?

6) How can the Controlled Vocabulary be developed as a community ?
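
The field grouping in question 1 could be sketched as a small record type, purely as an illustration. The class name ObsDatasetId, the attribute names, and all example values below are hypothetical; which observation-specific fields are actually needed is exactly what is still open.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ObsDatasetId:
    """Hypothetical identifier for one observational dataset."""

    # Fields expected to take the same values as for models
    variable: str
    frequency: str
    realm: str
    time_period: str
    version: str
    # Fields with a model counterpart, but not necessarily the same values
    activity: str
    product: str
    # Fields specific to observations (the final set is undecided)
    instrument: str
    mission: Optional[str] = None
    agency: Optional[str] = None
    processing_level: Optional[str] = None


# Example values are made up for illustration only.
airs = ObsDatasetId(
    variable="ta",
    frequency="mon",
    realm="atmos",
    time_period="200209-201005",
    version="v1",
    activity="obs4CMIP5",
    product="observations",
    instrument="AIRS",
)

# Two datasets that differ only in instrument get different identifiers.
mls = replace(airs, instrument="MLS")
print(airs == mls)
```

Because the record is frozen, it compares by value and can be used as a dictionary key, which is one way to test whether a proposed set of fields really identifies a dataset uniquely.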

OTHER DETAILS

Should data be organized on disk as in the CMOR output, or as in the DRS specification ?

Which global attributes should be included in netCDF files ?

Which characters can be included in fields that are not controlled vocabularies, for example the <processing level and product version> ?

How to encode ascending versus descending satellite measurements ?

Should the names in the controlled vocabulary be case sensitive ?
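
To make the two questions about allowed characters and case sensitivity concrete, here is a minimal sketch under stated assumptions. It assumes free-form fields are limited to letters, digits and hyphens (so that "_" and "/" stay available as filename and directory separators), and that controlled-vocabulary terms are matched case-insensitively but stored in one canonical spelling. Neither rule has been agreed; the function names and the sample vocabulary are hypothetical.

```python
import re

# Assumed rule: letters, digits and hyphens only, so that "_" and "/"
# remain unambiguous separators in filenames and directory paths.
FREE_FIELD = re.compile(r"[A-Za-z0-9-]+")


def is_valid_free_field(value: str) -> bool:
    """True if the whole value uses only the assumed allowed characters."""
    return FREE_FIELD.fullmatch(value) is not None


def canonical_term(value: str, vocabulary: list) -> "str | None":
    """Case-insensitive lookup that returns the canonical spelling, or None."""
    lookup = {term.lower(): term for term in vocabulary}
    return lookup.get(value.lower())


# Hypothetical vocabulary, for illustration only.
instruments = ["AIRS", "MLS", "TES"]

print(is_valid_free_field("L3-v5"))
print(is_valid_free_field("L3_v5"))
print(canonical_term("airs", instruments))
```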

On Feb 9, 2011, at 10:07 PM, Karl Taylor wrote:

Hi all,

I'm at a meeting on the east coast and will be tied up from 11-12 this morning.  Here are some brief comments concerning the draft proposal that I hope will be of some use in my absence:

1.  The directory structure and filenames (and the underlying DRS) are all meant to make it easier for users to navigate to the data they want and to unambiguously identify the data, so that is much more important than making it look like CMIP5.  My sense is that folks looking for observational data will want to be able to easily see (through the DRS categories)
a) the variable
b) the sampling frequency
c) perhaps the "realm"
d) the time-period over which the variable was measured

Users will want to be able to distinguish among the various observational products available for their purposes.  So,
a) something about how the measurement was made and processed  (maybe instrument is sufficient; perhaps institute, mission, agency are not all needed)
b) version of observational product

2.  Note that the default directory structure generated by CMOR2 differs from the ESGF directory structure, as described in sections 3.1 and 3.3 of

http://cmip-pcmdi.llnl.gov/cmip5/output_req.html?submenuheader=2#req_format

In that document "It is recommended that ESGF data nodes should layout datasets on disk mapping DRS components to directories as:

<activity>/<product>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<MIP table>/<ensemble member>/<version number>/<variable name>/<CMOR filename>.nc

Example:

/CMIP5/output1/UKMO/HadCM3/decadal1990/day/atmos/day/r3i2p1/v20100105/tas/tas_day_HADCM3_decadal1990_r3i2p1_199001-199012.nc

The observations don't need to follow this template (and probably shouldn't), but the current observations draft document incorrectly describes the CMIP5 structure.
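
As a minimal sketch of the mapping described in the quoted recommendation, the directory part of the path can be built by joining the DRS components in template order. The function name drs_directory and the dictionary keys are hypothetical; the component order and the example values are taken from the template and example above.

```python
from pathlib import PurePosixPath

# Component order taken from the recommended ESGF layout quoted above.
DRS_ORDER = [
    "activity", "product", "institute", "model", "experiment",
    "frequency", "modeling_realm", "mip_table", "ensemble_member",
    "version", "variable",
]


def drs_directory(components: dict) -> PurePosixPath:
    """Join DRS components, in template order, into a directory path."""
    missing = [name for name in DRS_ORDER if name not in components]
    if missing:
        raise KeyError(f"missing DRS components: {missing}")
    return PurePosixPath("/", *(components[name] for name in DRS_ORDER))


# Values from the example in the CMIP5 document.
example = {
    "activity": "CMIP5",
    "product": "output1",
    "institute": "UKMO",
    "model": "HadCM3",
    "experiment": "decadal1990",
    "frequency": "day",
    "modeling_realm": "atmos",
    "mip_table": "day",
    "ensemble_member": "r3i2p1",
    "version": "v20100105",
    "variable": "tas",
}

print(drs_directory(example))
```

A template for observations would presumably swap in a different component list, which is why the order is kept as data rather than hard-coded in the function.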

3.  I would recommend that observational products be written using CMOR2.  I do not think it is a good use of resources to generalize and "harden" the CMOR checker to enforce anything.  It wasn't meant for this purpose and this would be a big job.

4.  I would advise that all variables that appear in a single CMOR table at least
a) share the same sampling frequency
b) share the same realm (although you might want to include 2 closely-related realms)

5.  Recall Charles Doutriaux's note that we do not yet have program support for some of what will be needed.

6.  Also a reminder:  observations should not be under the "CMIP5" activity.  I can ask the WGCM if something like "obs4CMIP5" would be o.k. (I rather like this.)
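
The advice above that all variables in a single CMOR table share the same sampling frequency and realm could be checked along the following lines. This is only a sketch: TableEntry and table_problems are hypothetical names, the entries are made-up examples, and it does not read real CMOR table files. The max_realms parameter reflects the allowance for two closely-related realms.

```python
from collections import namedtuple

# Hypothetical stand-in for one variable entry in a CMOR table.
TableEntry = namedtuple("TableEntry", ["variable", "frequency", "realm"])


def table_problems(entries, max_realms=1):
    """Return a list of consistency problems; an empty list means none found."""
    problems = []
    frequencies = sorted({entry.frequency for entry in entries})
    realms = sorted({entry.realm for entry in entries})
    if len(frequencies) > 1:
        problems.append(f"mixed sampling frequencies: {frequencies}")
    if len(realms) > max_realms:
        problems.append(f"too many realms: {realms}")
    return problems


# Made-up entries for illustration only.
entries = [
    TableEntry("ta", "mon", "atmos"),
    TableEntry("hus", "mon", "atmos"),
    TableEntry("tos", "day", "ocean"),
]

for problem in table_problems(entries):
    print(problem)
```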

In preparing the above comments, I've mostly thought about gridded global datasets.

Best regards,
Karl

On 2/9/11 9:25 AM, Cinquini, Luca (3880) wrote:

Hi all,
        Dean Williams has kindly made the following number available for tomorrow's conference call:

 (925) 424-8105
 access code 305757#

The call is scheduled for 8am PST / 9am MST / 10am CST / 11am EST / 4pm GMT / 5pm France/Germany. We will discuss the adoption of community-wide metadata conventions for observational datasets that are going to be made part of the CMIP5 archive.

Thanks in advance to all for participating,

Luca

On Feb 9, 2011, at 8:35 AM, Christensen, Sigurd W. wrote:

Luca,
  Several of us think that a call Thursday would be good, but we probably won't finish then.

  We think that not only a different CMOR table, but also a different directory structure/filename structure may be appropriate for three or more of the categories mentioned in the link:

-Decide on whether to have one single CMOR table for observations (currently "obsSites"), or more than one depending on types of observational data:
  *remote sensed (grids and swaths)
  *in-situ stations (time series and profiles)
  *trajectory-based observations
  *in-situ gridded products

  The discussion thus far emphasizes fields and order for naming conventions for satellite-based data.  Perhaps those can be finalized Thursday.  But point-oriented surface and/or profile time-series data (such as ARM, AmeriFlux, etc.), and trajectory-based observations, will likely need more consideration.  Karl, on January 31, indicated that variable name, modeling realm, and frequency should be carried to the DRS (Data Reference System), but the rest could in essence be tailored to the needs of observational data.

  Thanks,
  Giri and Sig

-----Original Message-----
From: Cinquini, Luca (3880) [mailto:Luca.Cinquini at jpl.nasa.gov]
Sent: Tuesday, February 08, 2011 15:58
To: Lynnes, Christopher S. (GSFC-6102)
Cc: Huffman, George J. (GSFC-613.1)[SCIENCE SYSTEMS APPLICATIONS]; Karl Taylor; Steve Hankin; Bryan Lawrence; go-essp-tech at ucar.edu; Sébastien Denvil; climate-obs; McCoy, Renata
Subject: Re: [Go-essp-tech] Extending the DRS syntax to observations

Hi all,
        I would like to propose a conference call to discuss and hopefully resolve any remaining issues concerning metadata conventions for CMIP5 observations. Would anybody object if we had this call in only two days, next Thursday February 10, at 8am PST/11am EST - which I think is 4pm in the UK and 5pm in France and Germany ? If this is too soon, we could postpone till next week.

As a reminder, this is the URL of the current proposal:

http://oodt.jpl.nasa.gov/wiki/display/CLIMATE/Data+and+Metadata+Requirements+for+CMIP5+Observational+Datasets

which at the very beginning contains a summary of the issues still open. Please reply if you can't make the meeting and you really would like to attend, or if you think there are other issues to discuss.

Best regards,
thanks, Luca

P.S.: if the conference is a go, we'll set up a phone line....

On Feb 2, 2011, at 3:17 PM, Lynnes, Christopher S. (GSFC-6102) wrote:

On Feb 2, 2011, at 5:08 PM, Cinquini, Luca (3880) wrote:

Hi Chris and George,
        thanks for your input... I guess the question is whether you would be opposed to re-arranging the fields according to an order that is commonly agreed upon (and that possibly resembles the DRS structure for models), provided that all the relevant information is included ?

Since my philosophy is to tailor for the expected user community, I defer to you and your colleagues regarding the order, since you know the community.  My main interest is just ensuring the inclusion of the relevant information.

I think at this point we might be able to make faster progress by organizing a conference call to discuss these issues...

thanks, Luca

On Feb 2, 2011, at 2:42 PM, Lynnes, Christopher S. (GSFC-6102) wrote:

On Feb 2, 2011, at 4:26 PM, George J. Huffman wrote:

There are other variables that could go in the last position since the
original datasets contain multiple variables as "fields".  I should say
that the Goddard DISC puts Level before Instrument, and you might want
to consider why they did that.  [This is mostly an issue if you're
trying to build a syntax that is generally useful, not just focused on
gridded data.]

We (at Goddard DISC) put Level before Instrument because we anticipate that the user community for Level 3 gridded data is somewhat distinct from that for Level 2 or Level 1 swath data, which require considerably more sophisticated and customized tools to work with than Level 3.  I don't know if that is as relevant in the CMIP5 context as in our more generalized search interface (as George implies.)
--
Dr. Christopher Lynnes     NASA/GSFC, Code 610.2    phone: 301-614-5185

