[Go-essp-tech] drs, cmor, realms, and atomic datasets and components

Bob Drach drach at llnl.gov
Tue Sep 22 13:37:11 MDT 2009


Hi Bryan,

I'll answer from the perspective of publication:

The publisher has only three hardwired components (using the DRS  
terminology):

- project (== activity)
- experiment
- model

Each of these must be defined for a given dataset, and there are  
tables for describing the permitted values. The remainder of the  
components are configured on a per-project basis. So for the CMIP5/AR5
project, <institute>, <frequency>, <modelling_realm>, etc. are  
defined specifically for CMIP5. In addition to the static project  
configuration, there will be a CMIP5 handler - implemented as a  
Python class - that specifies how to look inside a data file,  
validate it as a CMIP5 file, and discover any additional metadata not  
otherwise determined from directory names and command line arguments.
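
As a rough illustration of the handler idea, here is a minimal sketch.
It is not the publisher's actual code: the class names, method names,
and table entries are all assumptions made for the example, and a plain
dict stands in for the global attributes that would be read from a data
file.

```python
# Hypothetical sketch: class, method, and attribute names are illustrative
# assumptions, not the actual ESG publisher API.

# Stand-in for the tables of permitted values (placeholder entries only).
PERMITTED_VALUES = {
    "project": {"cmip5"},
    "experiment": {"historical", "piControl"},
    "model": {"example-model"},
}


class ProjectHandler:
    """Checks the three hardwired components common to every project."""

    hardwired = ("project", "experiment", "model")

    def check_hardwired(self, components):
        for name in self.hardwired:
            value = components.get(name)
            if value is None:
                raise ValueError(f"missing required component: {name}")
            if value not in PERMITTED_VALUES[name]:
                raise ValueError(f"{name}={value!r} is not a permitted value")


class CMIP5Handler(ProjectHandler):
    """Adds the per-project behaviour for CMIP5."""

    # Components configured per project rather than hardwired.
    project_components = ("institute", "frequency", "modelling_realm")

    def validate_file(self, file_attrs):
        # Treat a file as CMIP5 only if it carries a matching project attribute.
        return file_attrs.get("project") == "cmip5"

    def discover_metadata(self, file_attrs, known):
        # Start from what directory names and command line arguments supplied,
        # then fill any remaining gaps from inside the file.
        found = dict(known)
        for name in self.project_components:
            if name not in found and name in file_attrs:
                found[name] = file_attrs[name]
        return found
```

In this sketch, values already known from directory names or the
command line take precedence, and the file is consulted only for
components that are still missing.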

Within the per-project configuration there is a field 'dataset_id', a  
template for construction of dataset identifiers. If for CMIP5 this  
is defined similarly to the DRS spec then datasets will correspond to  
the DRS definition.
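
To make the template idea concrete, here is a small sketch. The
%(name)s template syntax, the component names, and the sample values
are assumptions for illustration only; the real configuration format
may differ. It shows one template that yields DRS-style atomic
identifiers and another that drops <variable>, so that all variables in
a realm (for a given ensemble member and version) share one dataset.

```python
# Hypothetical templates; the syntax and component names are assumptions.
ATOMIC_TEMPLATE = (
    "%(activity)s.%(institute)s.%(model)s.%(experiment)s.%(frequency)s."
    "%(modelling_realm)s.%(variable)s.%(ensemble_member)s.%(version)s"
)

# Same template without the variable: one dataset per modelling realm.
REALM_TEMPLATE = (
    "%(activity)s.%(institute)s.%(model)s.%(experiment)s.%(frequency)s."
    "%(modelling_realm)s.%(ensemble_member)s.%(version)s"
)


def make_dataset_id(template, components):
    """Fill a template from a dict of components.

    Raises KeyError if the template needs a component that is missing.
    """
    return template % components


# Placeholder component values for one file.
example = {
    "activity": "cmip5",
    "institute": "EXAMPLE",
    "model": "example-model",
    "experiment": "historical",
    "frequency": "mon",
    "modelling_realm": "atmos",
    "variable": "tas",
    "ensemble_member": "r1i1p1",
    "version": "v1",
}

atomic_id = make_dataset_id(ATOMIC_TEMPLATE, example)
realm_id = make_dataset_id(REALM_TEMPLATE, example)
```

With the second template, two files that differ only in <variable> get
the same identifier, which is what groups them into one aggregated
dataset.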

On Sep 22, 2009, at 9:47 AM, Bryan Lawrence wrote:

>
> Hi Folks (probably Luca, Bob primarily)
>
> I'm about to ask some questions, but in order to be very accurate,  
> I need some definitions:
>
> An atomic dataset defines a variable from a single model run. The
> breakdown of components in a CMIP5 DRS-compliant dataset looks like:
> <activity>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/[<endpoint>]
>
> I believe CMOR is writing directory hierarchies that look like  
> that. For now I'm interested in <modelling_realm> which is a tag  
> that comes from the *primary* realm associated with a variable in  
> the CMOR tables.
>
> In terms of cataloguing, from Luca's comments, I *think* ESG was
> planning on aggregating these up so that a dataset in their
> catalogue looks like the aggregation of all variables in a given
> modelling realm (for a given ensemble member and version), and the
> idea was that one could browse between datasets and their modelling
> realms.

This is certainly do-able, given the comments above.

>
> This is because metafor (and curator) also have the concept of  
> modelling realms, and these are the "top level" components within  
> the model.
>
> I think there was an assumption that these two uses of  
> modelling_realm were the same. As of today they're not quite. I'll  
> get back to that.
>
> CMOR also has the concept of secondary realms, that is, one can tag  
> a variable with more than one realm.
>
> So the first of my questions:
> 1) Is ESG using those secondary realms at all in the catalogue (or  
> planning to do so)?

The plan is to publish all CMOR-generated metadata.

> 2) Do they make it to the catalogue via ESG publisher?

They will make it into THREDDS catalogs as properties. They can then  
be harvested into the gateway database, although I don't believe this  
is being done at the moment.
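
For illustration, THREDDS catalogs can attach name/value <property>
elements to a dataset. The sketch below builds such an element and
reads the properties back, roughly as a harvester might. The property
name "secondary_realms", its comma-separated value, and the function
names are assumptions for the example, and the THREDDS XML namespace is
left out for brevity.

```python
import xml.etree.ElementTree as ET


def dataset_with_properties(name, properties):
    """Build a <dataset> element with one <property> child per item."""
    dataset = ET.Element("dataset", {"name": name})
    for key, value in properties.items():
        ET.SubElement(dataset, "property", {"name": key, "value": value})
    return dataset


def harvest_properties(dataset):
    """Collect the name/value pairs of a dataset's direct <property> children."""
    return {p.get("name"): p.get("value") for p in dataset.findall("property")}


# Hypothetical property name and value, for illustration only.
element = dataset_with_properties(
    "example-dataset", {"secondary_realms": "atmosChem,aerosol"}
)
xml_text = ET.tostring(element, encoding="unicode")
harvested = harvest_properties(ET.fromstring(xml_text))
```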

> 3) Is ESG providing wget scripts to get all the data in one of  
> their aggregated datasets?
> 4) Is ESG providing a way of getting wget scripts for the  
> individual atomic datasets within an aggregated dataset?
>
> Getting back to the difference between modelling realms as seen by  
> CMOR and metafor/curator.
>
> 5) Does it matter if there is a slight difference between them? (At
> the moment curator/metafor has aerosols within atmospheric
> chemistry, while CMOR has them as distinct primary realms.) Either
> CMOR could change, or metafor could change, or neither could change,
> but the balance between these options depends on the answers to the
> questions above, since both CMOR and metafor/curator have valid
> reasons for the way they have done things.

From the publisher's perspective it doesn't matter: the CMOR-generated  
metadata will be published to the gateway. It's not clear to me whether  
it matters from a gateway search perspective.

Best regards,

Bob

>
> thanks
> Bryan
>
> -- 
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence


