[Go-essp-tech] comparison of GDS2.0 with climate modellers format CMIP5

Steve Hankin Steven.C.Hankin at noaa.gov
Tue Mar 22 17:24:38 MDT 2011


Greetings Luca!

AOK.  What you describe above is all that I had seen previously, too, 
but I thought there might be other conventions for contents inside of 
the files that I hadn't seen.  Conversely, there may be conventions 
defined in the GHRSST GDS that would be valuable for consideration for 
satellite data prepared for CMIP, too.  (Likely you'd already considered 
those.)

The cross-posting was just to help the process of convergence -- no 
specific agenda.

     - Steve


On 3/22/2011 2:32 PM, Cinquini, Luca (3880) wrote:
> Hi Steve,
>          I may be wrong, but I think this discussion concerns more how to structure observations for inclusion into ESG/CMIP5, rather than CF conventions for satellite data. So far, we have used existing CF conventions and not introduced anything new. I think there are ideas that could lead to an extension of the CF spec for satellites, but these ideas have not been refined yet... they might be after the first round of observational datasets is produced.
>
> thanks, Luca
>
> On Mar 22, 2011, at 11:53 AM, Steve Hankin wrote:
>
>> Hello,
>>
>> I have cross-posted this important discussion to the cf_satellite email
>> list.  Has a critical mass been reached, sufficient to pull together a
>> formal proposal for satellite extensions to CF at
>> http://cf-pcmdi.llnl.gov/discussion?
>>
>>      - Steve
>>
>> =========================================================
>>
>> On 3/22/2011 8:25 AM, Cinquini, Luca (3880) wrote:
>>> Hi Ken,
>>>       there's actually quite an active group of people from different institutions (including people from NOAA) that have been working for several months to agree on conventions for data and metadata formats to use when preparing observational datasets for CMIP5. We have a mailing list set up (climate-obs<climate-obs at jpl.nasa.gov>) which we have been using to reach consensus (please let me know if you would like me to subscribe you). Right now, we are very close to having the first few datasets ready: the ARM dataset from ORNL, which PCMDI is helping to prepare, and the AIRS and MLS datasets from NASA/JPL.
>>>
>>> We are documenting our activities on this wiki:
>>>
>>> https://oodt.jpl.nasa.gov/wiki/display/CLIMATE/Sharing+Observations+for+Climate+Research
>>>
>>> where we intend to post examples and templates as soon as they are ready.
>>>
>>> In particular, you might be interested in the metadata conventions page:
>>>
>>> https://oodt.jpl.nasa.gov/wiki/display/CLIMATE/Data+and+Metadata+Requirements+for+CMIP5+Observational+Datasets
>>>
>>> and in the (just started) controlled vocabulary page:
>>>
>>> https://oodt.jpl.nasa.gov/wiki/display/CLIMATE/CMIP5+Controlled+Vocabulary+for+Observations
>>>
>>> Please send any questions you might have... the idea is to reach consensus across agencies so that all observations are structured similarly, and as close as possible to models, to maximize their usefulness...
>>>
>>> thanks, Luca
>>>
>>> On Mar 22, 2011, at 9:02 AM, Bryan Lawrence wrote:
>>>
>>>> Hi Ken
>>>>
>>>> Just to note that I'm waiting to see if anyone else on the go-essp-tech
>>>> list picks up on this. If they don't, I will, but my attention span is
>>>> somewhat intermittent (too many balls in the air), so I'd feel happier
>>>> if you got someone paying attention who would stay the course :-)
>>>>
>>>> However, if you're waiting on answers, feel free to hassle me!
>>>>
>>>> Cheers
>>>> Bryan
>>>>
>>>>> Hi Bryan,
>>>>>
>>>>> I am definitely on board with your approach!  And we definitely all
>>>>> want to ensure GHRSST L3 products (or some subset of them) are
>>>>> available to CMIP5.  A couple more comments below...
>>>>>
>>>>> On Mar 18, 2011, at 3:31 PM, Bryan Lawrence wrote:
>>>>>> Hi Folks
>>>>>>
>>>>>> I've seen some of your correspondence on the above subject.
>>>>>>
>>>>>> Suffice to say, I think it'd be helpful if this discussion was
>>>>>> conducted on a slightly wider stage. To that end, I've copied in
>>>>>> the go-essp-tech list, where you'll get the folk who have devised
>>>>>> the CMIP5 data standards - and data distribution system.
>>>>>>
>>>>>> There are some general points that might help in the discussion to
>>>>>> follow:
>>>>>>
>>>>>> CMOR is absolutely a tool devised for climate model data output,
>>>>>> and some of the things you find strange (demanding double
>>>>>> precision etc) are absolutely necessary in that context ...
>>>>>>
>>>>>> The decision to use NetCDF3 rather than NetCDF4 was taken some time
>>>>>> ago after much discussion and heartache (and we had just about
>>>>>> settled on NetCDF4 before we did an "about turn"). In practice,
>>>>>> many of the reasons we used are probably not now relevant, but we
>>>>>> are where we are ... wrt the CMIP5 model data! (Which is to say I
>>>>>> think there might be room to do things differently with the EO
>>>>>> data.)
>>>>>> (Incidentally however, the 2 GB limit is helpful in chunking data
>>>>>> over low bandwidth links ... and we need to deal with between file
>>>>>> aggregation for many other reasons, so it's not a big deal if we
>>>>>> break things up).
>>>>> It is not a huge deal to stick with netCDF-3, especially given some
>>>>> of the other choices you've made like limiting to single variable
>>>>> files.  We've worked extensively with netCDF-3, but at US NODC have
>>>>> been focussed a lot lately on netCDF-4 and the performance aspects
>>>>> are especially useful for large, multi-variable files.  Did I read
>>>>> somewhere that CMIP5 prefers monthly, one-degree resolution EO data?
>>>>> I do see that your directory and file name structures handle other
>>>>> frequencies so maybe I am wrong about that monthly, one-degree part.
>>>>>
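The point above about the 2 GB limit and splitting data across files can be illustrated with a minimal sketch. It assumes a ceiling of 2 GiB per file and a fixed-size grid per time step. The function name, the header allowance, and the grid dimensions are hypothetical, chosen only for illustration; none of them come from CMOR or the CMIP5 specification.

```python
# Sketch only: split a time series into per-file ranges that stay under
# an assumed 2 GiB ceiling. All names and sizes here are illustrative.
LIMIT_BYTES = 2 * 1024**3  # assumed per-file ceiling (2 GiB)


def split_time_steps(n_steps, ny, nx, bytes_per_value=4, header_bytes=64 * 1024):
    """Return (start, stop) index ranges, one per output file.

    header_bytes is a hypothetical allowance for metadata and coordinates.
    """
    bytes_per_step = ny * nx * bytes_per_value
    steps_per_file = (LIMIT_BYTES - header_bytes) // bytes_per_step
    if steps_per_file < 1:
        raise ValueError("a single time step does not fit under the limit")
    return [
        (start, min(start + steps_per_file, n_steps))
        for start in range(0, n_steps, steps_per_file)
    ]


# Hypothetical case: ten years of daily 0.25-degree grids stored as 4-byte floats.
chunks = split_time_steps(n_steps=3650, ny=720, nx=1440)
```

Each range would map to one file, and a client would aggregate across files along the time axis, which is the between-file aggregation referred to above.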
>>>>>> Which brings me to this:  The CMIP5 community is working to
>>>>>> accommodate EO data, but indeed there are significant differences,
>>>>>> and many of those are obviously evident in the relative
>>>>>> importance folks apply to the various CF headers. My personal
>>>>>> opinion is that the way forward is to itemise the specific
>>>>>> differences, and then have a discussion as to why one might do
>>>>>> things in a specific way.
>>>>> That is exactly the approach I'd like to take too!
>>>>>
>>>>>> For example, if you want level 3 EO data to be easily usable by the
>>>>>> CMIP community (and I believe you do), then I suggest you conform
>>>>>> as closely as you can to the CMIP5 paradigm.  Remember that the
>>>>>> CMIP5 protocol is already the joint agreement of hundreds of
>>>>>> climate modellers ...
>>>>> Yes, we understand that concept in GHRSST.  The GHRSST Data
>>>>> Specification v2.0 (GDS2) is the result of a multi-year effort of
>>>>> lots (maybe hundreds, or at least one hundred) SST data providers
>>>>> and users.  It's been officially published and while it does have a
>>>>> routine update cycle, it can't be changed in any massive way at this
>>>>> point. But that is ok since the GDS2 and CMIP5 share the same
>>>>> CF-compliant netCDF "backbone" if you will, which already ensures a
>>>>> lot of compatibility and should make our efforts to convert to your
>>>>> forms relatively straightforward.  However, I think we all
>>>>> understand intuitively that when it comes to real data
>>>>> interoperability the devil is definitely in the details.
>>>>>
>>>>>> (In general my rule of thumb is organise the data for the
>>>>>> consumers, not the providers!)
>>>>> We do this in GHRSST as well.  Our consumers are many and varied
>>>>> since SST is so broadly used, but their needs are always put first
>>>>> to the extent that we know and understand them.
>>>>>
>>>>>> Clearly however, most level 2 data is going to be consumed by folks
>>>>>> who are far more "satellite-aware" ... there you could be
>>>>>> proposing suggested accommodations within the CMIP5 frame (i.e.
>>>>>> getting the CMIP5 community to extend their protocols,  not change
>>>>>> them, there is no chance of that now given the amount of effort
>>>>>> being expended worldwide to try and conform with what we have ...
>>>>>> the last thing anyone in the modelling community wants is a
>>>>>> moving target for *their* output formats etc).
>>>>> I didn't think the focus on CMIP5-GHRSST compatibility was really on
>>>>> Level 2 data.   Am I wrong about that?  I thought we were mainly
>>>>> talking about Level 3 (in GHRSST, that means gridded) or Level 4
>>>>> (that means gridded and gap-filled via some process).
>>>>>
>>>>>> Why do I think you should still do this in the CMIP5 frame, rather
>>>>>> than just do your own thing and expose it somehow to the climate
>>>>>> community? Because it's not just about the applications at the
>>>>>> user end, it's also about the metadata and data distribution
>>>>>> systems. If we get it right, we can use ESGF to replicate your
>>>>>> data globally, making it easier to consume (even as we provide
>>>>>> adequate logging etc so data downloads are attributed to the data
>>>>>> provider, no matter where the data is downloaded from). We can
>>>>>> also exploit the tools that are being built in the ESG community
>>>>>> to manipulate the data ....
>>>>> We also understand this thinking in GHRSST, where data management -
>>>>> including data format standards (what the containers look like),
>>>>> data content standards (what goes into those containers), metadata
>>>>> standards (how those containers are described), and the data
>>>>> transport standards (how the containers are shipped around the
>>>>> world) - sits at the heart of GHRSST and always has.  GHRSST does
>>>>> not use the Earth System Grid, but relies on a Regional/Global Task
>>>>> Sharing Framework consisting of "regional" data providers called
>>>>> RDACs (Regional Data Assembly Centers... "regional" means that they are
>>>>> situated in a region like France or Australia or wherever, but their
>>>>> datasets can be global in scope), who submit their data to a Global
>>>>> Data Assembly Center (GDAC), situated at NASA PO.DAAC and
>>>>> responsible for serving the data for 30 days from observation, which
>>>>> then sends the data to the US NODC (my office), which operates the
>>>>> GHRSST Long Term Stewardship and Reanalysis Facility (LTSRF, the
>>>>> long term archive and distribution center for the entire GHRSST
>>>>> collection).  Data access is enabled at all points along that
>>>>> framework, though ultimately it is consolidated into one location at
>>>>> the US NODC (though of course most RDACs maintain their individual
>>>>> collections).
>>>>>
>>>>>> Ok, so taking some more specific points from the emails I have
>>>>>> seen:
>>>>>>
>>>>>> scale factors etc. For model data, precision matters because of the
>>>>>> necessity to do post processing budget studies. No such argument
>>>>>> applies to EO data (especially after being munged to level 3 in
>>>>>> some unphysically interesting way ... it might be important if it
>>>>>> were done using a physical reanalysis). But in truth, the volume
>>>>>> of EO data in level 3 is going to be trivial compared to the
>>>>>> amount of model data, and most (climate) folks won't have the code
>>>>>> all set up to do the scaling offset stuff. Yes it's trivial to do,
>>>>>> but using my rubric about consumers above, I'd suggest you just
>>>>>> put the data in using the correct physical value with respect to
>>>>>> the CF units. Likewise native floats etc. Don't make it harder
>>>>>> for the consumer .... (including the tools mentioned above).
>>>>> Agreed it is probably not a huge deal for just L3 products.  I would
>>>>> argue that most netCDF clients I have used understand scale and
>>>>> offset and apply it seamlessly for the user, but I definitely agree
>>>>> to make things easier for the users in any way you can.  (I gotta
>>>>> say though if volume is a big concern for the model data, which I
>>>>> think you are saying, then the use of scale and offset can be
>>>>> terribly useful and can be applied I believe in a way that preserves
>>>>> your desired precision... could be wrong about that but it doesn't
>>>>> jump out at me as being a big problem).
>>>>>
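The scale and offset exchange above can be made concrete with a small sketch. It relies on the CF packing relation, unpacked = packed * scale_factor + add_offset, and assumes 16-bit signed integer storage. The scale factor, offset, and sample temperatures are illustrative assumptions, not values quoted from the GDS2 or CMIP5 documents.

```python
# Sketch only: CF-style packing of physical values into 16-bit integers.
# SCALE, OFFSET and the sample values are assumptions for illustration.
INT16_MIN, INT16_MAX = -32768, 32767


def pack(values, scale_factor, add_offset):
    """Pack physical values: packed = round((value - add_offset) / scale_factor)."""
    packed = []
    for value in values:
        p = round((value - add_offset) / scale_factor)
        if not INT16_MIN <= p <= INT16_MAX:
            raise ValueError(f"{value} does not fit in int16 with this scale/offset")
        packed.append(p)
    return packed


def unpack(packed, scale_factor, add_offset):
    """Recover physical values: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


SCALE, OFFSET = 0.01, 273.15           # assumed: 0.01 K steps around 273.15 K
sst_kelvin = [271.35, 285.127, 302.994]  # made-up sea surface temperatures

packed = pack(sst_kelvin, SCALE, OFFSET)
restored = unpack(packed, SCALE, OFFSET)
max_error = max(abs(a - b) for a, b in zip(sst_kelvin, restored))
```

Under these assumptions the round-trip error is bounded by half the scale factor, so packing preserves precision only down to that step size. That may be acceptable for gridded observations but is a different matter from the double precision asked of model output for budget studies.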
>>>>>> CMIP has a number of levels of metadata requirements, including
>>>>>> both CF header requirements, and directory layout. Some thoughts
>>>>>> on dealing with this for EO (and other observational data) can
>>>>>> be found at
>>>>>> https://oodt.jpl.nasa.gov/wiki/display/CLIMATE/Data+and+Metadata+Requirements+for+CMIP5+Observational+Datasets
>>>>>> as you have found.
>>>>> I've read that page closely now a couple of times and it seems to be
>>>>> lacking much detail.  Is there something more specific you can point
>>>>> me to? I'd like to take a closer look at the GHRSST Data
>>>>> Specification for L3 data and compare it in more detail with the
>>>>> CMIP5 spec for EO data.
>>>>>
>>>>>> It'd be good if you engaged directly with the authors
>>>>>> of that page, to make constructive suggestions about the way
>>>>>> forward ... but some of the decisions you don't like (one variable
>>>>>> per file etc) are pretty much non-negotiable ... there are good
>>>>>> reasons spanning back over years as to why this is done.
>>>>> In GHRSST we also understand that issue of "historical reasons" very
>>>>> well.  Who are the authors? I see Luca's name on the page, and on
>>>>> this email... anyone else?
>>>>>
>>>>>> ... but most of the EO side of things are far from cast in stone,
>>>>>> so get involved now ... but quickly.
>>>>>>
>>>>>> Hope this is helpful.
>>>>> Yes, very!  Thanks,
>>>>> Ken
>>>>>
>>>>>> Regards,
>>>>>> Bryan
>>>>>>
>>>>>> --
>>>>>> Bryan Lawrence
>>>>>> Director of Environmental Archival and Associated Research
>>>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>>>>>> STFC, Rutherford Appleton Laboratory
>>>>>> Phone +44 1235 445012; Fax ... 5848;
>>>>>> Web: home.badc.rl.ac.uk/lawrence
>>>>> [NOTE: The opinions expressed in this email are those of the author
>>>>> alone and do not necessarily reflect official NOAA, Department of
>>>>> Commerce, or US government policy.]
>>>>>
>>>>> Kenneth S. Casey, Ph.D.
>>>>> Technical Director
>>>>> NOAA National Oceanographic Data Center
>>>>> 1315 East-West Highway
>>>>> Silver Spring MD 20910 USA
>>>>> +1 301-713-3272 ext 133
>>>>> http://www.nodc.noaa.gov/
>>>> --
>>>> Bryan Lawrence
>>>> Director of Environmental Archival and Associated Research
>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>>>> STFC, Rutherford Appleton Laboratory
>>>> Phone +44 1235 445012; Fax ... 5848;
>>>> Web: home.badc.rl.ac.uk/lawrence
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech

