[Go-essp-tech] comparison of GDS2.0 with climate modellers format CMIP5

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Tue Mar 22 09:02:04 MDT 2011


Hi Ken

Just to note that I'm waiting to see if anyone else on the go-essp-tech 
list picks up on this. If they don't, I will, but my attention span is 
somewhat intermittent (too many balls in the air), so I'd feel happier 
if you got someone paying attention who would stay the course :-)

However, if you're waiting on answers, feel free to hassle me!

Cheers
Bryan

> Hi Bryan,
> 
> I am definitely on board with your approach!  And we definitely all
> want to ensure GHRSST L3 products (or some subset of them) are
> available to CMIP5.  A couple more comments below...
> 
> On Mar 18, 2011, at 3:31 PM, Bryan Lawrence wrote:
> > Hi Folks
> > 
> > I've seen some of your correspondence on the above subject.
> > 
> > Suffice to say, I think it'd be helpful if this discussion was
> > conducted on a slightly wider stage. To that end, I've copied in
> > the go-essp-tech list, where you'll get the folk who have devised
> > the CMIP5 data standards - and data distribution system.
> > 
> > There are some general points that might help in the discussion to
> > follow:
> > 
> > CMOR is absolutely a tool devised for climate model data output,
> > and some of the things you find strange (demanding double
> > precision etc) are absolutely necessary in that context ...
> > 
> > The decision to use NetCDF3 rather than NetCDF4 was taken some time
> > ago after much discussion and heartache (and we had just about
> > settled on NetCDF4 before we did an "about turn"). In practice,
> > many of the reasons we relied on are probably no longer relevant,
> > but we are where we are ... wrt the CMIP5 model data! (Which is to
> > say I think there might be room to do things differently with the
> > EO data.)
> > (Incidentally, however, the 2 GB limit is helpful in chunking data
> > over low-bandwidth links ... and we need to deal with between-file
> > aggregation for many other reasons, so it's not a big deal if we
> > break things up.)
> 
> It is not a huge deal to stick with netCDF-3, especially given some
> of the other choices you've made, like limiting to single-variable
> files.  We've worked extensively with netCDF-3, but at US NODC we
> have been focussed a lot lately on netCDF-4, and the performance
> aspects are especially useful for large, multi-variable files.  Did
> I read somewhere that CMIP5 prefers monthly, one-degree resolution
> EO data?  I do see that your directory and file name structures
> handle other frequencies, so maybe I am wrong about that monthly,
> one-degree part.
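To make the format trade-off concrete, here is a minimal sketch, assuming
the netCDF4-python library, of the same single-variable field written as
netCDF-3 classic (the CMIP5 choice) and as netCDF-4 with chunking and
compression. The file names, dimensions and attribute values are
illustrative only, not taken from the GDS2 or CMIP5 specifications.

    import numpy as np
    from netCDF4 import Dataset

    sst = np.random.uniform(271.0, 305.0, size=(1, 180, 360)).astype("f4")

    # netCDF-3 classic: no compression or chunking; the 2 GB size limits
    # mentioned above apply (the 64-bit offset variant relaxes them).
    with Dataset("tos_classic.nc", "w", format="NETCDF3_CLASSIC") as nc:
        nc.createDimension("time", None)
        nc.createDimension("lat", 180)
        nc.createDimension("lon", 360)
        v = nc.createVariable("tos", "f4", ("time", "lat", "lon"))
        v.standard_name = "sea_surface_temperature"
        v.units = "K"
        v[:] = sst

    # netCDF-4: same content, but with per-variable chunking and zlib
    # compression, which is where the performance gains for large,
    # multi-variable files come from.
    with Dataset("tos_nc4.nc", "w", format="NETCDF4") as nc:
        nc.createDimension("time", None)
        nc.createDimension("lat", 180)
        nc.createDimension("lon", 360)
        v = nc.createVariable("tos", "f4", ("time", "lat", "lon"),
                              zlib=True, complevel=4,
                              chunksizes=(1, 180, 360))
        v.standard_name = "sea_surface_temperature"
        v.units = "K"
        v[:] = sst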
> 
> > Which brings me to this:  The CMIP5 community is working to
> > accommodate EO data, but indeed there are significant differences,
> > and many of those are obviously evident in the relative
> > importance folks apply to the various CF headers. My personal
> > opinion is that the way forward is to itemise the specific
> > differences, and then have a discussion as to why one might do
> > things in a specific way.
> 
> That is exactly the approach I'd like to take too!
> 
> > For example, if you want level 3 EO data to be easily useful by the
> > CMIP community (and I believe you do), then I suggest you conform
> > as closely as you can to the CMIP5 paradigm.  Remember that the
> > CMIP5 protocol is already the joint agreement of hundreds of
> > climate modellers ...
> 
> Yes, we understand that concept in GHRSST.  The GHRSST Data
> Specification v2.0 (GDS2) is the result of a multi-year effort by
> lots (maybe hundreds, or at least one hundred) of SST data providers
> and users.  It's been officially published, and while it does have a
> routine update cycle, it can't be changed in any massive way at this
> point. But that is ok since the GDS2 and CMIP5 share the same
> CF-compliant netCDF "backbone" if you will, which already ensures a
> lot of compatibility and should make our efforts to convert to your
> forms relatively straightforward.  However, I think we all
> understand intuitively that when it comes to real data
> interoperability the devil is definitely in the details.
> 
> > (In general my rule of thumb is organise the data for the
> > consumers, not the providers!)
> 
> We do this in GHRSST as well.  Our consumers are many and varied
> since SST is so broadly used, but their needs are always put first
> to the extent that we know and understand them.
> 
> > Clearly however, most level 2 data is going to be consumed by folks
> > who are far more "satellite-aware" ... there you could be
> > proposing suggested accommodations within the CMIP5 frame (i.e.
> > getting the CMIP5 community to extend their protocols, not change
> > them - there is no chance of that now, given the amount of effort
> > being expended worldwide to try and conform with what we have ...
> > the last thing anyone in the modelling community wants is a
> > moving target for *their* output formats etc).
> 
> I didn't think the focus on CMIP5-GHRSST compatibility was really on
> Level 2 data.   Am I wrong about that?  I thought we were mainly
> talking about Level 3 (in GHRSST, that means gridded) or Level 4
> (that means gridded and gap-filled via some process).
> 
> > Why do I think you should still do this in the CMIP5 frame, rather
> > than just do your own thing and expose it somehow to the climate
> > community? Because it's not just about the applications at the
> > user end, it's also about the metadata and data distribution
> > systems. If we get it right, we can use ESGF to replicate your
> > data globally, making it easier to consume (even as we provide
> > adequate logging etc so data downloads are attributed to the data
> > provider, no matter where the data is downloaded from). We can
> > also exploit the tools that are being built in the ESG community
> > to manipulate the data ....
> 
> We also understand this thinking in GHRSST, where data management -
> including data format standards (what the containers look like),
> data content standards (what goes into those containers), metadata
> standards (how those containers are described), and data transport
> standards (how the containers are shipped around the world) - sits
> at the heart of GHRSST and always has.  GHRSST does not use the
> Earth System Grid, but relies on a Regional/Global Task Sharing
> Framework consisting of "regional" data providers called RDACs
> (Regional Data Assembly Centers; "regional" means they are situated
> in a region like France or Australia or wherever, but their datasets
> can be global in scope).  The RDACs submit their data to a Global
> Data Assembly Center (GDAC), situated at NASA PO.DAAC and
> responsible for serving the data for 30 days from observation, which
> then sends the data to the US NODC (my office), which operates the
> GHRSST Long Term Stewardship and Reanalysis Facility (LTSRF), the
> long-term archive and distribution center for the entire GHRSST
> collection.  Data access is enabled at all points along that
> framework, though ultimately it is consolidated into one location at
> the US NODC (though of course most RDACs maintain their individual
> collections).
> 
> > Ok, so taking some more specific points from the emails I have
> > seen:
> > 
> > scale factors etc. For model data, precision matters because of the
> > necessity to do post-processing budget studies. No such argument
> > applies to EO data (especially after being munged to level 3 in
> > some unphysically interesting way ... it might be important if it
> > were done using a physical reanalysis). But in truth, the volume
> > of EO data at level 3 is going to be trivial compared to the
> > amount of model data, and most (climate) folks won't have the code
> > all set up to do the scaling/offset stuff. Yes, it's trivial to do,
> > but using my rubric about consumers above, I'd suggest you just
> > put the data in using the correct physical value with respect to
> > the CF units. Likewise native floats etc. Don't make it harder
> > for the consumer .... (including the tools mentioned above).
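As a minimal sketch of the two conventions being weighed here, assuming
the netCDF4-python library (variable names and values are illustrative):
a GDS2-style packed integer variable with scale_factor/add_offset
attributes, alongside the alternative Bryan suggests of storing the
physical value directly as a float in CF units.

    import numpy as np
    from netCDF4 import Dataset

    kelvin = np.array([[271.35, 293.12], [301.48, 288.90]])

    with Dataset("sst_example.nc", "w", format="NETCDF3_CLASSIC") as nc:
        nc.createDimension("lat", 2)
        nc.createDimension("lon", 2)

        # Packed form: int16 counts plus scale_factor/add_offset attributes.
        packed = nc.createVariable("sst_packed", "i2", ("lat", "lon"))
        packed.units = "K"
        packed.add_offset = 273.15
        packed.scale_factor = 0.01
        # With netCDF4-python's default auto-scaling, assigning physical
        # values here packs them to int16 counts using the attributes above.
        packed[:] = kelvin

        # Consumer-friendly form: the physical value, in the correct CF
        # units, stored as a native float with no unpacking required.
        plain = nc.createVariable("sst", "f4", ("lat", "lon"))
        plain.units = "K"
        plain.standard_name = "sea_surface_temperature"
        plain[:] = kelvin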
> 
> Agreed, it is probably not a huge deal for just L3 products.  I would
> argue that most netCDF clients I have used understand scale and
> offset and apply it seamlessly for the user, but I definitely agree
> to make things easier for the users in any way you can.  (I gotta
> say, though, if volume is a big concern for the model data, which I
> think you are saying, then the use of scale and offset can be
> terribly useful and can be applied, I believe, in a way that
> preserves your desired precision... could be wrong about that, but
> it doesn't jump out at me as being a big problem.)
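The precision point is easy to check with back-of-envelope arithmetic;
the SST range and step below are illustrative, not taken from any GHRSST
product definition.

    # Can int16 packing hold SST to 0.01 K over a realistic range?
    vmin, vmax, step = 265.0, 320.0, 0.01  # physical range and target precision (K)
    counts_needed = (vmax - vmin) / step   # 5500 distinct values
    assert counts_needed < 2**16 - 2       # comfortably within a 16-bit integer
    scale_factor = step                    # 0.01
    add_offset = (vmax + vmin) / 2.0       # 292.5, centring the range on zero counts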
> 
> > CMIP has a number of levels of metadata requirements, including
> > both CF header requirements and directory layout. Some thoughts
> > on dealing with this for EO (and other observational data) can
> > be found at
> > https://oodt.jpl.nasa.gov/wiki/display/CLIMATE/Data+and+Metadata+Requirements+for+CMIP5+Observational+Datasets
> > as you have found.
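For orientation, the directory layout being referred to is the CMIP5 Data
Reference Syntax (DRS); roughly, and with a purely illustrative example
path (the version and dates below are made up):

    <activity>/<product>/<institute>/<model>/<experiment>/<frequency>/
      <modeling realm>/<MIP table>/<ensemble member>/<version>/<variable>/
        <variable>_<MIP table>_<model>_<experiment>_<ensemble>[_<temporal subset>].nc

    e.g.  cmip5/output1/MOHC/HadGEM2-ES/historical/mon/ocean/Omon/r1i1p1/
            v20110322/tos/tos_Omon_HadGEM2-ES_historical_r1i1p1_185912-200512.nc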
> 
> I've read that page closely now a couple of times and it seems to be
> lacking in detail.  Is there something more specific you can point
> me to?  I'd like to take a closer look at the GHRSST Data
> Specification for L3 data and do a closer comparison with the CMIP5
> spec for EO data.
> 
> > It'd be good if you engaged directly with the authors
> > of that page, to make constructive suggestions about the way
> > forward ... but some of the decisions you don't like (one variable
> > per file etc) are pretty much non-negotiable ... there are good
> > reasons spanning back over years as to why this is done.
> 
> In GHRSST we also understand that issue of "historical reasons" very
> well.  Who are the authors? I see Luca's name on the page, and on
> this email... anyone else?
> 
> > ... but most of the EO side of things are far from cast in stone,
> > so get involved now ... but quickly.
> > 
> > Hope this is helpful.
> 
> Yes, very!  Thanks,
> Ken
> 
> > Regards,
> > Bryan
> > 
> > --
> > Bryan Lawrence
> > Director of Environmental Archival and Associated Research
> > (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> > STFC, Rutherford Appleton Laboratory
> > Phone +44 1235 445012; Fax ... 5848;
> > Web: home.badc.rl.ac.uk/lawrence
> 
> [NOTE: The opinions expressed in this email are those of the author
> alone and do not necessarily reflect official NOAA, Department of
> Commerce, or US government policy.]
> 
> Kenneth S. Casey, Ph.D.
> Technical Director
> NOAA National Oceanographic Data Center
> 1315 East-West Highway
> Silver Spring MD 20910 USA
> +1 301-713-3272 ext 133
> http://www.nodc.noaa.gov/

--
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

