[Go-essp-tech] DRS syntax and TDS identifiers

Thu Sep 2 04:18:40 MDT 2010

Hi folks,

Just for the record we (here at MOHC) have this week started producing
our first batch of CMOR-netCDF files for the CMIP5 archive using the
latest (and, we understand, final) version of the CMOR-2 library.
(Charles Doutriaux at PCMDI has indicated recently that, with the
exception of essential bug fixes, the CMOR-2 library is in effect now
frozen.)

It's not entirely clear to me whether the changes to the DRS
syntax/structure being suggested at this *very* late stage in the game
will i) affect the validity of the CMIP5 netCDF data files that we have
just produced, of which there are many GB, or ii) require corresponding
updates to the CMOR-2 library.

I *think* the answer is 'no' on both counts, i.e. our data will still be
valid and the CMOR-2 library will not need to change. But it would
definitely be reassuring to us if someone on the mailing list could
corroborate this view!

Thanks,
Phil

> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu 
> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Martina Stockhause
> Sent: 02 September 2010 09:54
> To: Bob Drach; GO-ESSP
> Subject: [Go-essp-tech] DRS syntax and TDS identifiers
> 
> Hi Bob, dear all,
> 
> for the QC a defined and stable DRS syntax and a clear 
> mapping of other IDs (TDS and metafor) to the DRS are required.
> 
> The granularity of the quality checks is the atomic dataset. 
> QC level 3 (STD-DOI) is assigned on the experiment level but includes 
> references to every netcdf file (or chunk) of this 
> experiment. To correctly identify the files that belong to the 
> DOI experiment, the DRS definition has to include the DRS 
> levels 'experiment, atomic dataset, netcdf 
> file'. We extract the metadata of the data from the TDS XML. 
> So we have to identify the names of experiment, atomic 
> dataset (aggregation of netcdf files) and netcdf file by the 
> names or identifiers used by the TDS (e.g. with field 
> dataset_ID). Presently, there are differences between them.
> 
> I attached a figure visualizing the differences in DRS and 
> TDS field 'dataset_ID'.
> 
> ----------
> DRS-Syntax: We agreed on moving the <version> from the 
> position behind the atomic dataset to the position behind the 
> realm, i.e.
> 
> cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<version>.<ensemble>.<variable>.<netcdf>
> 
> Karl, could you please update the DRS document?
> 
> Bob, why does the TDS realm version include the ensemble member?
> 
> cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<ensemble>.<version>
> 
> So, in the DRS a realm like atmos of an experiment performed 
> with a specific Earth System model has a version, but in the 
> TDS every realization of this experiment (ensemble member) 
> has its own version.
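
A minimal sketch of the difference described above, assuming only the
two field orders quoted in this message. The field name "activity" (for
the leading 'cmip5'), the function names and the placeholder ID values
are made up for illustration; none of this is taken from the TDS or the
DRS document.

```python
# Field orders copied from the two templates quoted above.
# "activity" is a hypothetical name for the leading 'cmip5' component.
DRS_FIELDS = ["activity", "product", "institute", "model", "experiment",
              "frequency", "realm", "version", "ensemble", "variable",
              "netcdf"]
TDS_REALM_FIELDS = ["activity", "product", "institute", "model",
                    "experiment", "frequency", "realm", "ensemble",
                    "version"]


def parse_id(dataset_id, fields):
    """Split a dot-separated ID into named fields.

    The last field keeps any remaining dots (e.g. a '.nc' suffix).
    """
    parts = dataset_id.split(".", len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields, got {len(parts)}: {dataset_id!r}"
        )
    return dict(zip(fields, parts))


def tds_realm_to_drs_prefix(tds_id):
    """Reorder a TDS realm-level ID so <version> precedes <ensemble>."""
    fields = parse_id(tds_id, TDS_REALM_FIELDS)
    drs_prefix_order = DRS_FIELDS[:9]  # 'activity' through 'ensemble'
    return ".".join(fields[name] for name in drs_prefix_order)


# Placeholder values only, not a real published dataset.
print(tds_realm_to_drs_prefix(
    "cmip5.output.INST.MODEL.EXPT.mon.atmos.r1i1p1.v1"))
```

The reordering only swaps the positions of the two fields. It does not
resolve the question of whether a version belongs to the realm or to
each ensemble member.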
> 
> ----------
> Mapping TDS dataset ID to DRS syntax:
> Down to the realm level the TDS ID is identical to the DRS 
> syntax; below that level it is not.
> Bob, I need a clear mapping direction for the atomic dataset 
> and chunk levels.
> We figured out the following mapping from our example 
> publication. Can you verify that?
> 
> atomic dataset ID:
> 
> cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<ensemble>.<variable>.1.aggregation
> 
> Question: Is the '1' identical with the realm.ensemble 
> version 'v1'? If not, what does it mean and where do we find 
> the <version>?
> 
> chunk dataset ID:
> 
> cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<ensemble>.<version>.<netcdf>
> 
> Question: The <variable> is left out there; can it be 
> extracted as the first part of the chunk name when split by '_'?
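
The chunk-level mapping asked about above could be sketched as follows.
This assumes the unconfirmed rule from the question, i.e. that the
variable is the first '_'-separated part of the netCDF file name. The
field name "activity", the function name and the example ID are
hypothetical.

```python
# Field order of the chunk dataset ID as quoted above, and the DRS order
# it should be mapped to. "activity" is a hypothetical name for 'cmip5'.
CHUNK_FIELDS = ["activity", "product", "institute", "model", "experiment",
                "frequency", "realm", "ensemble", "version", "netcdf"]
DRS_ORDER = ["activity", "product", "institute", "model", "experiment",
             "frequency", "realm", "version", "ensemble", "variable",
             "netcdf"]


def chunk_id_to_drs(chunk_id):
    """Map a TDS chunk dataset ID to a DRS-ordered identifier."""
    # Limit the split so the file name keeps its own dots ('.nc').
    parts = chunk_id.split(".", len(CHUNK_FIELDS) - 1)
    if len(parts) != len(CHUNK_FIELDS):
        raise ValueError(f"unexpected chunk ID: {chunk_id!r}")
    fields = dict(zip(CHUNK_FIELDS, parts))

    # Assumption (not confirmed in this thread): the variable is the
    # first '_'-separated part of the file name.
    variable, sep, _rest = fields["netcdf"].partition("_")
    if not sep or not variable:
        raise ValueError(f"no variable in file name: {fields['netcdf']!r}")
    fields["variable"] = variable

    return ".".join(fields[name] for name in DRS_ORDER)


# Placeholder values only, not a real published chunk.
print(chunk_id_to_drs(
    "cmip5.output.INST.MODEL.EXPT.mon.atmos.r1i1p1.v1."
    "tas_Amon_MODEL_EXPT_r1i1p1_200001-200012.nc"))
```

The atomic dataset ID is left out of this sketch because the meaning of
the '1' component is still an open question.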
> 
> By the way, could you explain why the TDS dataset IDs are 
> no longer identical to the DRS syntax?
> 
> If we do not have a clear definition of the DRS syntax and 
> mapping directions between TDS IDs and DRS syntax, we cannot 
> assign DOIs to a distinct set of chunks organized according 
> to the DRS syntax (for which we have the required quality 
> information).
> 
> Alternatively, we would have to scan the netcdf dataheaders 
> and build aggregations for the atomic dataset level again. We 
> would like to avoid that additional effort.
> 
> I need answers and stability in the DRS syntax and TDS IDs 
> soon. I suggest we speak about this on the telco on 14th or 21st 
> September.
> 
> 
> Best wishes,
> Martina
>