[Go-essp-tech] drs, cmor, realms, and atomic datasets and components
Bob Drach
drach at llnl.gov
Tue Sep 22 13:37:11 MDT 2009
Hi Bryan,
I'll answer from the perspective of publication:
The publisher has only three hardwired components (using the DRS
terminology):
- project (== activity)
- experiment
- model
Each of these must be defined for a given dataset, and there are
tables for describing the permitted values. The remainder of the
components are configured on a per-project basis. So for the
CMIP5/AR5 project, <institute>, <frequency>, <modelling_realm> etc. are
defined specifically for CMIP5. In addition to the static project
configuration, there will be a CMIP5 handler - implemented as a
Python class - that specifies how to look inside a data file,
validate it as a CMIP5 file, and discover any additional metadata not
otherwise determined from directory names and command line arguments.
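To make the handler's role concrete, here is a minimal sketch of what such a per-project class might look like. The class name, the method names `validate` and `extract_metadata`, and the list of required attributes are all hypothetical, not the publisher's actual API. To avoid depending on a netCDF library, the file is represented here by a plain dict of its global attributes.

```python
from typing import Dict, Mapping


class CMIP5Handler:
    """Hypothetical per-project handler: validate a file, then gather metadata.

    The file is represented by a mapping of its global attributes, so no
    netCDF library is needed for this sketch.
    """

    # Assumed (for illustration) set of global attributes a CMIP5 file must carry.
    REQUIRED = ("project_id", "institute_id", "model_id", "experiment_id",
                "frequency", "modeling_realm")

    def validate(self, attrs: Mapping[str, str]) -> bool:
        # Accept the file only if every required attribute is present
        # and it declares itself as CMIP5 (all() short-circuits first,
        # so the lookup of project_id cannot raise KeyError).
        return (all(name in attrs for name in self.REQUIRED)
                and attrs["project_id"] == "CMIP5")

    def extract_metadata(self, attrs: Mapping[str, str],
                         known: Mapping[str, str]) -> Dict[str, str]:
        # Start from what the file says, then let values already determined
        # from directory names or command-line arguments take precedence.
        meta = {name: attrs[name] for name in self.REQUIRED if name in attrs}
        meta.update(known)
        return meta


if __name__ == "__main__":
    handler = CMIP5Handler()
    attrs = {"project_id": "CMIP5", "institute_id": "MOHC",
             "model_id": "HadGEM2-ES", "experiment_id": "historical",
             "frequency": "mon", "modeling_realm": "atmos"}
    print(handler.validate(attrs))  # True
    print(handler.extract_metadata(attrs, {"frequency": "day"})["frequency"])  # day
```

The design choice mirrors the description above: the file only supplies metadata that directory names and command-line arguments have not already determined.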
Within the per-project configuration there is a field 'dataset_id', a
template for construction of dataset identifiers. If for CMIP5 this
is defined similarly to the DRS spec then datasets will correspond to
the DRS definition.
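As an illustration of how the choice of template fixes dataset granularity, the sketch below fills two example templates from the same DRS components. Python `str.format` placeholders stand in for whatever template syntax the publisher configuration actually uses. Both template strings and all component values are illustrative assumptions, and the version is omitted for brevity. Including `<variable>` gives one dataset per atomic dataset. Dropping it gives one dataset per modelling realm.

```python
# Two illustrative templates: which components appear decides the granularity.
ATOMIC_TEMPLATE = ("{activity}.{institute}.{model}.{experiment}.{frequency}."
                   "{modeling_realm}.{variable}.{ensemble_member}")
REALM_TEMPLATE = ("{activity}.{institute}.{model}.{experiment}.{frequency}."
                  "{modeling_realm}.{ensemble_member}")


def make_dataset_id(template: str, components: dict) -> str:
    # str.format ignores extra keys but raises KeyError for a missing one,
    # so incomplete metadata is caught when the identifier is built.
    return template.format(**components)


if __name__ == "__main__":
    comps = dict(activity="cmip5", institute="MOHC", model="HadGEM2-ES",
                 experiment="historical", frequency="mon",
                 modeling_realm="atmos", variable="tas",
                 ensemble_member="r1i1p1")
    print(make_dataset_id(ATOMIC_TEMPLATE, comps))
    # cmip5.MOHC.HadGEM2-ES.historical.mon.atmos.tas.r1i1p1
    print(make_dataset_id(REALM_TEMPLATE, comps))
    # cmip5.MOHC.HadGEM2-ES.historical.mon.atmos.r1i1p1
```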
On Sep 22, 2009, at 9:47 AM, Bryan Lawrence wrote:
>
> Hi Folks (probably Luca, Bob primarily)
>
> I'm about to ask some questions, but in order to be very accurate,
> I need some definitions:
>
> An atomic dataset defines a variable from a single model run. The
> breakdown of components in a CMIP5 DRS-compliant dataset looks like:
> <activity>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/[<endpoint>]
>
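Because the DRS layout is positional, the components of an atomic dataset can be recovered directly from its directory path. The sketch below assumes the path is given relative to the top of the DRS tree, spells the component names with underscores so they can serve as dictionary keys, and treats anything after <version> as the optional [<endpoint>]. The example values (including "v1" as a version string) are illustrative only.

```python
from typing import Dict, Optional

# Component order taken from the DRS template quoted above
# (spaces replaced by underscores so they can serve as keys).
DRS_COMPONENTS = ("activity", "institute", "model", "experiment", "frequency",
                  "modeling_realm", "variable", "ensemble_member", "version")


def parse_drs_path(path: str) -> Dict[str, Optional[str]]:
    """Split a DRS-style relative path into named components."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) < len(DRS_COMPONENTS):
        raise ValueError(
            f"expected at least {len(DRS_COMPONENTS)} components, got {len(parts)}")
    meta: Dict[str, Optional[str]] = dict(zip(DRS_COMPONENTS, parts))
    # Whatever follows <version> is the optional [<endpoint>] (e.g. a file name).
    rest = parts[len(DRS_COMPONENTS):]
    meta["endpoint"] = "/".join(rest) if rest else None
    return meta


if __name__ == "__main__":
    info = parse_drs_path("cmip5/MOHC/HadGEM2-ES/historical/mon/atmos/tas/r1i1p1/v1")
    print(info["modeling_realm"], info["variable"], info["endpoint"])  # atmos tas None
```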
> I believe CMOR is writing directory hierarchies that look like
> that. For now I'm interested in <modelling_realm> which is a tag
> that comes from the *primary* realm associated with a variable in
> the CMOR tables.
>
> In terms of cataloguing, from Luca's comments, I *think* ESG was
> planning on aggregating these up so that a dataset in their
> catalogue looks like the aggregation of all variables in a given
> modelling realm (for a given ensemble member and version), and the
> idea was that one could browse between datasets and their modelling realms.
This is certainly do-able, given the comments above.
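One way to picture the realm-level aggregation is as a group-by over atomic datasets on every DRS component except <variable>. The sketch below is an illustration of that idea, not the publisher's implementation. Each atomic dataset is assumed to be a dict keyed by the component names, with underscores in place of spaces.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Every DRS component except <variable> identifies the aggregated dataset.
AGGREGATION_KEYS = ("activity", "institute", "model", "experiment", "frequency",
                    "modeling_realm", "ensemble_member", "version")


def aggregate_by_realm(atomic: List[Dict[str, str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group atomic datasets into realm-level datasets, listing their variables."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for ds in atomic:
        key = tuple(ds[k] for k in AGGREGATION_KEYS)
        groups[key].append(ds["variable"])
    # Return a plain dict with variables in a stable (sorted) order.
    return {key: sorted(variables) for key, variables in groups.items()}
```

Keeping the atomic variable list inside each aggregated entry is also what would let a catalogue offer access to the individual atomic datasets within an aggregation.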
>
> This is because metafor (and curator) also have the concept of
> modelling realms, and these are the "top level" components within
> the model.
>
> I think there was an assumption that these two uses of
> modelling_realm were the same. As of today they're not quite. I'll
> get back to that.
>
> CMOR also has the concept of secondary realms, that is, one can tag
> a variable with more than one realm.
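A small sketch of how primary and secondary realms might be separated. It assumes a convention in which a variable's realm entry is a whitespace-separated list with the primary realm first; that convention, and the example entry "aerosol atmosChem", are assumptions for illustration rather than something stated in this thread. The DRS directory would use only the primary realm, while a catalogue could optionally list the variable under its secondary realms too.

```python
from typing import List, Tuple


def split_realms(realm_entry: str) -> Tuple[str, List[str]]:
    """Return (primary, secondaries) from a whitespace-separated realm entry.

    Assumes the convention that the first realm listed is the primary one.
    """
    realms = realm_entry.split()
    if not realms:
        raise ValueError("variable has no realm")
    return realms[0], realms[1:]


def catalogue_realms(realm_entry: str, include_secondary: bool) -> List[str]:
    # The DRS directory uses only the primary realm; a catalogue could
    # optionally list a variable under its secondary realms as well.
    primary, secondary = split_realms(realm_entry)
    return [primary] + secondary if include_secondary else [primary]


if __name__ == "__main__":
    print(split_realms("aerosol atmosChem"))  # ('aerosol', ['atmosChem'])
```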
>
> So the first of my questions:
> 1) Is ESG using those secondary realms at all in the catalogue (or
> planning to do so)?
The plan is to publish all CMOR-generated metadata.
> 2) Do they make it to the catalogue via ESG publisher?
They will make it into THREDDS catalogs as properties. They can then
be harvested into the gateway database, although I don't believe this
is being done at the moment.
> 3) Is ESG providing wget scripts to get all the data in one of
> their aggregated datasets?
> 4) Is ESG providing a way of getting wget scripts for the
> individual atomic datasets within an aggregated dataset?
>
> Getting back to the difference between modelling realms as seen by
> CMOR and metafor/curator.
>
> 5) Does it matter if there is a slight difference between them? (At
> the moment curator/metafor has aerosols within atmospheric
> chemistry; CMOR has them as distinct primary realms.) Either CMOR
> could change, or Metafor could change, or neither could change, but
> the balance of choosing between these options depends on the
> answers to the questions above, since both CMOR and metafor/curator
> have valid reasons for the way they have done things.
From the publisher perspective it doesn't matter - the CMOR-generated
metadata will be published to the gateway. It's not clear to me whether it
matters from a gateway search perspective.
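If the difference does matter for search, one option that requires neither CMOR nor metafor to change is a lookup table applied at search time, mapping CMOR primary realms onto metafor/curator top-level components. This is a minimal sketch under that assumption. Only the placement of aerosols inside atmospheric chemistry comes from Bryan's note. The component names on the metafor side are hypothetical, and the table is deliberately incomplete.

```python
from typing import Dict, List

# Hypothetical crosswalk from CMOR primary realms to metafor/curator top-level
# components. Identifiers are illustrative; only the aerosol case is from the thread.
CMOR_TO_METAFOR: Dict[str, str] = {
    "atmos": "Atmosphere",
    "atmosChem": "AtmosphericChemistry",
    "aerosol": "AtmosphericChemistry",  # metafor nests aerosols in atmospheric chemistry
    "ocean": "Ocean",
}


def metafor_component(cmor_realm: str) -> str:
    """Map a CMOR realm to the (hypothetical) metafor component containing it."""
    try:
        return CMOR_TO_METAFOR[cmor_realm]
    except KeyError:
        raise KeyError(f"no metafor component mapped for CMOR realm {cmor_realm!r}") from None


def cmor_realms_for(component: str) -> List[str]:
    """Reverse lookup: every CMOR realm a search on this component should cover."""
    return sorted(realm for realm, comp in CMOR_TO_METAFOR.items() if comp == component)
```

With this table, a search on the atmospheric-chemistry component would expand to both the "aerosol" and "atmosChem" CMOR realms.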
Best regards,
Bob
>
> thanks
> Bryan
>
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence