[Go-essp-tech] comparison of GDS2.0 with climate modellers format CMIP5

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Fri Mar 18 13:31:15 MDT 2011


Hi Folks

I've seen some of your correspondence on the above subject.

Suffice to say, I think it'd be helpful if this discussion were conducted on a
slightly wider stage. To that end, I've copied in the go-essp-tech
list, where you'll find the folk who devised the CMIP5
data standards and the data distribution system.

There are some general points that might help in the discussion to
follow:

CMOR is absolutely a tool devised for climate model data output, and
some of the things you find strange (demanding double precision etc) are
essential in that context ...

The decision to use NetCDF3 rather than NetCDF4 was taken some time ago 
after much discussion and heartache (and we had just about settled on 
NetCDF4 before we did an "about turn"). In practice, many of the reasons 
we used are probably no longer relevant, but we are where we are with respect to
the CMIP5 model data! (Which is to say I think there might be room to do
things differently with the EO data.)
(Incidentally, however, the 2 GB limit is helpful in chunking data over
low-bandwidth links ... and we need to deal with between-file aggregation
for many other reasons, so it's not a big deal if we break things up.)
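To make the chunking point concrete, here is a minimal sketch of splitting a time series of one 2-D field into files that each stay under the limit. It assumes the 2 GB figure is treated as 2**31 bytes and ignores NetCDF header overhead; the function names and grid sizes are hypothetical, not part of CMOR or any CMIP5 tool.

```python
# Assumed cap: 2 GiB per file, with header overhead ignored (illustrative only).
LIMIT_BYTES = 2**31


def steps_per_file(nlat, nlon, bytes_per_value=4, limit=LIMIT_BYTES):
    """Number of whole time steps of one 2-D field that fit under the limit."""
    step_bytes = nlat * nlon * bytes_per_value
    return limit // step_bytes


def split_time_axis(ntime, nlat, nlon, bytes_per_value=4, limit=LIMIT_BYTES):
    """Return (start, stop) time-index ranges, one range per output file."""
    per_file = steps_per_file(nlat, nlon, bytes_per_value, limit)
    if per_file == 0:
        raise ValueError("a single time step exceeds the size limit")
    return [(start, min(start + per_file, ntime))
            for start in range(0, ntime, per_file)]


# Hypothetical example: 10000 steps of a 1-degree global grid in double precision.
print(split_time_axis(10000, 180, 360, bytes_per_value=8))
```

Each range becomes one file, and consumers (or ESGF-side tools) re-aggregate along the time axis, which is the between-file aggregation that has to be handled anyway.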

Which brings me to this: the CMIP5 community is working to
accommodate EO data, but there are indeed significant differences, and
many of those are evident in the relative importance folks
apply to the various CF headers. My personal opinion is that the way
forward is to itemise the specific differences, and then have a discussion
as to why one might do things in a specific way.

For example, if you want level 3 EO data to be easily usable by the CMIP
community (and I believe you do), then I suggest you conform as closely
as you can to the CMIP5 paradigm. Remember that the CMIP5 protocol is
already the joint agreement of hundreds of climate modellers ...

(In general my rule of thumb is: organise the data for the consumers, not
the providers!)

Clearly, however, most level 2 data is going to be consumed by folks who
are far more "satellite-aware" ... there you could propose
accommodations within the CMIP5 frame (i.e. getting the CMIP5
community to extend their protocols, not change them; there is no
chance of changing them now, given the amount of effort being expended worldwide to
try to conform with what we have ... the last thing anyone in the
modelling community wants is a moving target for *their* output formats
etc).

Why do I think you should still do this in the CMIP5 frame, rather than 
just do your own thing and expose it somehow to the climate community?
Because it's not just about the applications at the user end, it's also 
about the metadata and data distribution systems. If we get it right, 
we can use ESGF to replicate your data globally, making it easier to 
consume (even as we provide adequate logging etc so data downloads are 
attributed to the data provider, no matter where the data is downloaded 
from). We can also exploit the tools that are being built in the ESG 
community to manipulate the data ....

Ok, so taking some more specific points from the emails I have seen:

Scale factors etc. For model data, precision matters because of the
need to do post-processing budget studies. No such argument applies
to EO data (especially after it has been munged to level 3 in some
physically uninteresting way ... it might matter if that were done
using a physical reanalysis). But in truth, the volume of EO data in
level 3 is going to be trivial compared to the amount of model data,
and most (climate) folks won't have the code all set up to handle the
scale-factor/offset stuff. Yes, it's trivial to do, but using my rubric about consumers
above, I'd suggest you just put the data in using the correct physical
value with respect to the CF units. Likewise native floats etc. Don't
make it harder for the consumer .... (including the tools mentioned
above).
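For readers less familiar with packed data, this is the step every consumer would otherwise need to implement. Under the CF conventions, a packed variable is unpacked as value = packed * scale_factor + add_offset, with _FillValue matched against the packed values. The sketch below uses plain Python lists; the function name and the int16 brightness-temperature example values are hypothetical.

```python
import math


def unpack(packed, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Apply CF packing attributes: value = packed * scale_factor + add_offset.

    Packed values equal to fill_value are returned as NaN (treated as missing).
    """
    out = []
    for p in packed:
        if fill_value is not None and p == fill_value:
            out.append(math.nan)
        else:
            out.append(p * scale_factor + add_offset)
    return out


# Hypothetical int16 temperatures packed with scale 0.01 K and offset 273.15 K.
print(unpack([0, 1000, -32767], scale_factor=0.01, add_offset=273.15,
             fill_value=-32767))
```

Storing the physical values directly, as suggested above, removes this step (and the chance of getting it wrong) from every downstream tool.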

CMIP has a number of levels of metadata requirements, including both CF
header requirements and directory layout. Some thoughts on dealing with
this for EO (and other observational data) can be found at
https://oodt.jpl.nasa.gov/wiki/display/CLIMATE/Data+and+Metadata+Requirements+for+CMIP5+Observational+Datasets
as you have found. It'd be good if you engaged directly with the authors
of that page to make constructive suggestions about the way forward ...
but some of the decisions you don't like (one variable per file etc) are
pretty much non-negotiable ... there are good reasons, going back many
years, why this is done.

... but most of the EO side of things is far from cast in stone, so get
involved now ... and quickly.

Hope this is helpful. 

Regards,
Bryan

--
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

