[Go-essp-tech] comparison of GDS2.0 with climate modellers format CMIP5

Kenneth Casey Kenneth.Casey at noaa.gov
Fri Mar 18 17:25:57 MDT 2011


Hi Bryan,

I am definitely on board with your approach!  And we definitely all want to ensure GHRSST L3 products (or some subset of them) are available to CMIP5.  A couple more comments below...

On Mar 18, 2011, at 3:31 PM, Bryan Lawrence wrote:

> Hi Folks
> 
> I've seen some of your correspondence on the above subject.
> 
> Suffice to say, I think it'd be helpful if this discussion were conducted on a 
> slightly wider stage. To that end, I've copied in the go-essp-tech 
> list, where you'll reach the folks who have devised the CMIP5 
> data standards - and data distribution system. 
> 
> There are some general points that might help in the discussion to 
> follow:
> 
> CMOR is absolutely a tool devised for climate model data output, and 
> some of the things you find strange (demanding double precision etc.) are 
> absolutely necessary in that context ...
> 
> The decision to use NetCDF3 rather than NetCDF4 was taken some time ago 
> after much discussion and heartache (and we had just about settled on 
> NetCDF4 before we did an "about turn"). In practice, many of the reasons 
> we relied on are probably no longer relevant, but we are where we are ... wrt 
> the CMIP5 model data! (Which is to say I think there might be room to do 
> things differently with the EO data.)
> (Incidentally, however, the 2 GB limit is helpful in chunking data over 
> low-bandwidth links ... and we need to deal with between-file aggregation 
> for many other reasons, so it's not a big deal if we break things up.)
> 

It is not a huge deal to stick with netCDF-3, especially given some of the other choices you've made, like limiting to single-variable files.  We've worked extensively with netCDF-3, but at US NODC we have lately been focused on netCDF-4, whose performance aspects are especially useful for large, multi-variable files.  Did I read somewhere that CMIP5 prefers monthly, one-degree resolution EO data?  I do see that your directory and file name structures handle other frequencies, so maybe I am wrong about the monthly, one-degree part.
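Just to make the single-variable, classic-format idea concrete, here is a minimal sketch in Python with the netCDF4 library of writing one CF-style SST field into a netCDF-3 "classic" file.  The variable name "tos" and the attribute values are illustrative assumptions on my part, not the agreed CMIP5 or GDS2 requirements:

    import numpy as np
    from netCDF4 import Dataset

    nlat, nlon = 180, 360
    sst_kelvin = 273.15 + 10.0 * np.random.rand(nlat, nlon)   # placeholder data

    # "NETCDF3_CLASSIC" keeps the file within the netCDF-3 constraints discussed above
    nc = Dataset("tos_example.nc", "w", format="NETCDF3_CLASSIC")
    nc.createDimension("time", None)        # unlimited, so records can be split across files
    nc.createDimension("lat", nlat)
    nc.createDimension("lon", nlon)

    lat = nc.createVariable("lat", "f4", ("lat",))
    lat.units = "degrees_north"
    lon = nc.createVariable("lon", "f4", ("lon",))
    lon.units = "degrees_east"
    time = nc.createVariable("time", "f8", ("time",))
    time.units = "days since 1981-01-01 00:00:00"

    # one data variable per file, stored as native floats in physical units
    tos = nc.createVariable("tos", "f4", ("time", "lat", "lon"))
    tos.standard_name = "sea_surface_temperature"
    tos.units = "K"

    lat[:] = np.linspace(-89.5, 89.5, nlat)
    lon[:] = np.linspace(-179.5, 179.5, nlon)
    time[0] = 0.0
    tos[0, :, :] = sst_kelvin

    nc.Conventions = "CF-1.4"
    nc.close()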

> Which brings me to this: the CMIP5 community is working to 
> accommodate EO data, but indeed there are significant differences, and 
> many of those are obviously evident in the relative importance folks 
> apply to the various CF headers. My personal opinion is that the way 
> forward is to itemise the specific differences, and then have a discussion 
> as to why one might do things in a specific way.
> 

That is exactly the approach I'd like to take too!

> For example, if you want level 3 EO data to be easily useful by the CMIP 
> community (and I believe you do), then I suggest you conform as closely 
> as you can to the CMIP5 paradigm.  Remember that the CMIP5 protocol is 
> already the joint agreement of hundreds of climate modellers ...
> 

Yes, we understand that concept in GHRSST.  The GHRSST Data Specification v2.0 (GDS2) is the result of a multi-year effort by a great many SST data providers and users (at least a hundred, maybe hundreds).  It has been officially published, and while it does have a routine update cycle, it can't be changed in any massive way at this point.  But that is OK, since GDS2 and CMIP5 share the same CF-compliant netCDF "backbone", if you will, which already ensures a lot of compatibility and should make our efforts to convert to your forms relatively straightforward.  However, I think we all understand intuitively that when it comes to real data interoperability, the devil is definitely in the details.  

> (In general my rule of thumb is organise the data for the consumers, not 
> the providers!)
> 

We do this in GHRSST as well.  Our consumers are many and varied since SST is so broadly used, but their needs are always put first to the extent that we know and understand them.

> Clearly, however, most level 2 data is going to be consumed by folks who 
> are far more "satellite-aware" ... there you could be proposing 
> accommodations within the CMIP5 frame (i.e. getting the CMIP5 
> community to extend their protocols, not change them; there is no 
> chance of the latter now, given the amount of effort being expended worldwide 
> to try and conform with what we have ... the last thing anyone in the 
> modelling community wants is a moving target for *their* output formats 
> etc.).

I didn't think the focus on CMIP5-GHRSST compatibility was really on Level 2 data.   Am I wrong about that?  I thought we were mainly talking about Level 3 (in GHRSST, that means gridded) or Level 4 (that means gridded and gap-filled via some process).

> 
> Why do I think you should still do this in the CMIP5 frame, rather than 
> just do your own thing and expose it somehow to the climate community?
> Because it's not just about the applications at the user end, it's also 
> about the metadata and data distribution systems. If we get it right, 
> we can use ESGF to replicate your data globally, making it easier to 
> consume (even as we provide adequate logging etc so data downloads are 
> attributed to the data provider, no matter where the data is downloaded 
> from). We can also exploit the tools that are being built in the ESG 
> community to manipulate the data ....

We also understand this thinking in GHRSST, where data management - including data format standards (what the containers look like), data content standards (what goes into those containers), metadata standards (how those containers are described), and data transport standards (how the containers are shipped around the world) - has always sat at the heart of the project.  GHRSST does not use the Earth System Grid, but relies on a Regional/Global Task Sharing Framework consisting of "regional" data providers called RDACs (Regional Data Assembly Centers; "regional" means they are situated in a particular region such as France or Australia, though their datasets can be global in scope).  The RDACs submit their data to a Global Data Assembly Center (GDAC), hosted at NASA PO.DAAC and responsible for serving the data for the first 30 days after observation, which then sends the data to the US NODC (my office), which operates the GHRSST Long Term Stewardship and Reanalysis Facility (LTSRF), the long-term archive and distribution center for the entire GHRSST collection.  Data access is enabled at all points along that framework, though ultimately it is consolidated into one location at the US NODC (and of course most RDACs maintain their individual collections as well).  

> 
> Ok, so taking some more specific points from the emails I have seen:
> 
> scale factors etc. For model data, precision matters because of the 
> necessity to do post-processing budget studies. No such argument applies 
> to EO data (especially after being munged to level 3 in some 
> unphysically interesting way ... it might be important if it were done 
> using a physical reanalysis). But in truth, the volume of EO data at 
> level 3 is going to be trivial compared to the amount of model data, 
> and most (climate) folks won't have the code all set up to do the scaling 
> and offset stuff. Yes, it's trivial to do, but using my rubric about consumers 
> above, I'd suggest you just put the data in using the correct physical 
> value with respect to the CF units. Likewise native floats etc. Don't 
> make it harder for the consumer ... (including the tools mentioned 
> above).
> 

Agreed, it is probably not a huge deal for just the L3 products.  I would argue that most netCDF clients I have used understand scale and offset and apply them seamlessly for the user, but I definitely agree with making things easier for the users in any way you can.  (I gotta say, though, if volume is a big concern for the model data, which I think you are saying, then scale and offset can be terribly useful and can, I believe, be applied in a way that preserves your desired precision... I could be wrong about that, but it doesn't jump out at me as a big problem.)
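To illustrate what I mean about clients handling this seamlessly, here is a rough sketch (again Python with the netCDF4 library, with illustrative names and values of my own rather than anything from GDS2 or CMIP5) of packing an SST field into 16-bit integers with scale_factor/add_offset, and getting physical values back on read with no extra user code:

    import numpy as np
    from netCDF4 import Dataset

    nc = Dataset("sst_packed_example.nc", "w", format="NETCDF3_CLASSIC")
    nc.createDimension("lat", 180)
    nc.createDimension("lon", 360)

    sst = nc.createVariable("sea_surface_temperature", "i2", ("lat", "lon"),
                            fill_value=np.int16(-32768))
    sst.units = "kelvin"
    sst.scale_factor = 0.01   # roughly 0.01 K resolution in a 16-bit integer
    sst.add_offset = 273.15
    # With the library's automatic mask-and-scale behavior (the default),
    # assigning physical values packs them into int16 on write ...
    sst[:, :] = 273.15 + 10.0 * np.random.rand(180, 360)
    nc.close()

    # ... and reading returns unpacked floating-point values automatically.
    unpacked = Dataset("sst_packed_example.nc").variables["sea_surface_temperature"][:]
    print(unpacked.min(), unpacked.max())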


> CMIP has a number of levels of metadata requirements, including both CF 
> header requirements and directory layout. Some thoughts on dealing with 
> this for EO (and other observational data) can be found at 
> https://oodt.jpl.nasa.gov/wiki/display/CLIMATE/Data+and+Metadata+Requirements+for+CMIP5+Observational+Datasets 
> as you have found.

I've read that page closely now a couple of times, and it seems to lack much detail.  Is there something more specific you can point me to?  I'd like to take a closer look at the GHRSST Data Specification for L3 data and do a closer comparison with the CMIP5 spec for EO data.
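For what it's worth, my current understanding of the CMIP5-style directory and file-name layout is something like the sketch below (Python, purely illustrative).  The component order mirrors the CMIP5 model-data DRS; the values I've plugged in for a hypothetical EO/GHRSST product are placeholders, since agreeing that vocabulary is exactly what the wiki page still needs to pin down:

    import os

    # Hypothetical helper: assemble a CMIP5-DRS-style path. The component order
    # mirrors the model-data DRS; the EO-specific values below are placeholders.
    def drs_path(root, activity, product, institute, model, experiment,
                 frequency, realm, table, ensemble, version, variable, time_range):
        filename = "_".join([variable, table, model, experiment,
                             ensemble, time_range]) + ".nc"
        return os.path.join(root, activity, product, institute, model, experiment,
                            frequency, realm, table, ensemble, version, variable,
                            filename)

    print(drs_path("/archive", "obs", "observations", "NODC", "GHRSST-L3-example",
                   "obs", "mon", "ocean", "Omon", "r1i1p1", "v20110318",
                   "tos", "200301-200312"))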

> It'd be good if you engaged directly with the authors 
> of that page, to make constructive suggestions about the way forward ... 
> but some of the decisions you don't like (one variable per file etc.) are 
> pretty much non-negotiable ... there are good reasons spanning back over 
> years as to why this is done.
> 

In GHRSST we also understand that issue of "historical reasons" very well.  Who are the authors? I see Luca's name on the page, and on this email... anyone else?

> ... but most of the EO side of things are far from cast in stone, so get 
> involved now ... but quickly.
> 
> Hope this is helpful. 
> 

Yes, very!  Thanks,
Ken


> Regards,
> Bryan
> 
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848; 
> Web: home.badc.rl.ac.uk/lawrence

[NOTE: The opinions expressed in this email are those of the author alone and do not necessarily reflect official NOAA, Department of Commerce, or US government policy.]

Kenneth S. Casey, Ph.D.
Technical Director
NOAA National Oceanographic Data Center
1315 East-West Highway
Silver Spring MD 20910 USA
+1 301-713-3272 ext 133
http://www.nodc.noaa.gov/
