[Go-essp-tech] DRS structure
Martina Stockhause
martina.stockhause at zmaw.de
Thu Aug 26 00:18:54 MDT 2010
Good Morning, Stephen, hi, Bob,
what about the position of <ensemble> in the DRS syntax?
Does it move to after <realm>.<version>, or directly after <realm>?
I.e.
cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<version>.<ensemble>.<variable>.<netcdf>
or
cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<ensemble>.<version>.<variable>.<netcdf>
Or do we leave it after <variable> as documented in
http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf?
I.e.
cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<version>.<variable>.<ensemble>.<netcdf>
It is important for me to know the position of <variable> (atomic
dataset) relative to the <experiment> and the <netcdf> (chunks).
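
To make the three candidates easier to compare, here is a minimal Python sketch (not taken from drslib or the publisher). The component names follow the placeholders above; "activity" for the leading "cmip5" is only my label, the keys of CANDIDATES and the helper functions are hypothetical, and the example values are the ones quoted further down in this thread.

```python
# Components common to all three candidate layouts. "activity" is an
# assumed label for the leading "cmip5"; the rest follow the placeholders.
PREFIX = ["activity", "product", "institute", "model",
          "experiment", "frequency", "realm"]

# The three orderings under discussion (keys are hypothetical labels).
CANDIDATES = {
    "version_ensemble": PREFIX + ["version", "ensemble", "variable", "netcdf"],
    "ensemble_version": PREFIX + ["ensemble", "version", "variable", "netcdf"],
    "documented": PREFIX + ["version", "variable", "ensemble", "netcdf"],
}

# Example values taken from the paths and file names quoted in this thread.
EXAMPLE = {
    "activity": "cmip5",
    "product": "output",
    "institute": "MPI-M",
    "model": "ECHAM6-MPIOM-LR",
    "experiment": "rcp45",
    "frequency": "mon",
    "realm": "atmos",
    "version": "v1",
    "ensemble": "r1",
    "variable": "pr",
    "netcdf": "pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc",
}


def drs_name(values, order):
    """Join the component values with dots in the given order."""
    return ".".join(values[name] for name in order)


def levels_between(order, upper, lower):
    """Return the components that sit strictly between two components."""
    return order[order.index(upper) + 1:order.index(lower)]


if __name__ == "__main__":
    for label, order in CANDIDATES.items():
        print(label)
        print("  name:", drs_name(EXAMPLE, order))
        print("  experiment -> variable:",
              levels_between(order, "experiment", "variable"))
        print("  variable -> netcdf:",
              levels_between(order, "variable", "netcdf"))
```

In the first two layouts nothing sits between <variable> and <netcdf>, while in the documented layout <ensemble> does, which is the difference that matters for the QC result level.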
Thanks a lot,
Martina
stephen.pascoe at stfc.ac.uk wrote:
> Hi Martina,
>
> Which TDS server are you working with? Is it one at DKRZ? Everything
> below is based on what we've been doing at BADC with the CMIP3 dataset
> and MOHC's CMIP5 test data.
>
>
>> Example: I get a QC result for the atomic dataset in the directory
>> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr
>>
>> How do I find the TDS ID for it?
>> ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
>> urlPath="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
>> Is this the structure, how it will remain? Then I can cut the last two.
>
> The directory should be part of the dataset with
> dataset_id="cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos". There
> will be 1 or more versions of that dataset with THREDDS catalogue names
>
> cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1.xml
> cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v2.xml
> etc.
>
> Within each catalogue there is a dataset element for the realm-dataset
> containing a dataset element for each file.
> I'm not sure you can use the aggregation datasets to represent
> atomic-datasets. To be honest I haven't looked at them in detail.
>
>
>> Will the directory structure change to move the version behind the realm
>> as well? In my example:
>> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/v1/pr
>>
>
> Yes. The BADC datanode doesn't have this at the moment for CMIP3 data
> because it would be time-consuming to change after the fact. However,
> our UKMO test runs are putting the version directory where you say.
>
> I guess this only answers part of your questions but I hope it helps.
>
> S.
>
> ---
> Stephen Pascoe +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
>
> -----Original Message-----
> From: Martina Stockhause [mailto:martina.stockhause at zmaw.de]
> Sent: 24 August 2010 10:54
> To: Pascoe, Stephen (STFC,RAL,SSTD)
> Cc: estanislao.gonzalez at zmaw.de; drach at llnl.gov; go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] DRS structure
>
> Hi, Stephen,
>
> I'd like to stay with the TDS XML if I can, because there are a lot of
> open issues in the QC workflow, unless you can convince me that I would
> find more suitable information in the postgres db.
>
> Example: I get a QC result for the atomic dataset in the directory
> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr
>
> How do I find the TDS ID for it?
> ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
> urlPath="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
> Is this the structure, how it will remain? Then I can cut the last two.
>
> And all information of the datasets belonging to the atomic dataset?
> ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1.pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
> urlPath="atmos/CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr/r1/pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
> There I can identify the datasets belonging to this atomic dataset using
> the urlPath.
>
> Will the directory structure change to move the version behind the realm
> as well? In my example:
> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/v1/pr
>
> Thanks a lot in advance for clarifying that.
> Best wishes,
> Martina
>
>
> stephen.pascoe at stfc.ac.uk wrote:
>
>> Hi Martina,
>>
>> For efficiency reasons we need to publish multiple variables as one
>> dataset, therefore the dataset_id won't contain a variable identifier.
>> However, the variable names are contained in the THREDDS XML inside
>> metadata/variable tags so they are still available.
>>
>> I would recommend you don't rely on the syntax of dataset_ids. Either
>> get the DRS attributes from THREDDS <property> elements or bypass the
>> XML entirely and inspect the publisher database. The THREDDS properties
>> are tied directly to the DRS attributes CMOR creates so they will be
>> much less likely to be wrong due to misconfiguration.
>>
>> I know the DRS document is out of date but the syntax should be stable.
>> We'll sort out the confusion Estani has just pointed out ASAP.
>>
>> Cheers,
>> Stephen.
>>
>> -----Original Message-----
>> From: Martina Stockhause [mailto:martina.stockhause at zmaw.de]
>> Sent: Tue 8/24/2010 7:33 AM
>> To: Estanislao Gonzalez
>> Cc: Bob Drach; Pascoe, Stephen (STFC,RAL,SSTD); go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] DRS structure
>>
>> Dear all,
>>
>> we really need to fix the DRS structure and the way the DRS syntax is
>> reflected in the TDS catalogue.
>>
>> During the QC, which runs in the file system with DRS syntax, I need
>> to have a connection to the TDS to check the consistency of data
>> against metadata after the automated checks. Since I don't want to
>> touch each dataset again, I take the TDS metadata as reference for
>> data content.
>>
>> Up to now it was possible to take the dataset_id as DRS name out of
>> the TDS in the atomic dataset (TDS aggregation = QC result level) and
>> the netcdf file level.
>>
>> Now the <variable> part of the DRS is missing in the dataset_id of the
>> netcdf file, so that I am about to take the urlPath instead.
>> Is that ok?
>>
>> Why can't we use the DRS syntax as IDs in the TDS and in metafor? That
>> would make things much easier.
>>
>> The DRS syntax is my connection from the QC checked files to the TDS
>> and to metafor. Therefore the DRS syntax should be fixed soon and
>> documented in the DRS document, so that we can start to adapt our
>> examples and scripts.
>>
>> Best wishes,
>> Martina
>>
>>
>> Estanislao Gonzalez wrote:
>>
>>
>>> Hi all,
>>>
>>> I've realized we've been moving things from one place to another
>>> regarding the DRS components, and the DRS Reference Syntax document
>>> (from 7/4/2010) does not reflect these changes.
>>>
>>> There are two major differences here:
>>> 1) versioning: the drslib tool is creating a structure which is, for
>>> the time being, not DRS-conformant. I totally agree with the new
>>> version-component placement, but should that not be reflected in the
>>> DRS syntax document?
>>> 2) CMIP5 Best Practices for Data Publication states that the
>>> dataset_id should be:
>>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>
>>> I know the dataset_id is not required to match any DRS structure.
>>> But I personally think we should avoid DRS-like identifiers, as
>>> IMHO they increase confusion.
>>> I think this solution helps solve some publishing problems, but
>>> defines a new dataset level, the "ensemble dataset". And the
>>> realm-dataset is not being used anywhere else (or am I missing
>>> something?)
>>>
>>> I'm not aware of the reasons behind the definition of the DRS
>>> structure as it currently is. But I think, we should avoid drifting
>>> away from that document. In any case the document should be updated
>>> first.
>
>>> If I try to join all changes and proposals I've heard of, AFAIC the
>>> DRS structure we are heading towards looks something like:
>>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>.<version>.<variable>
>>>
>>> Which is different from the original:
>>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<variable>.<ensemble>.<version>
>>>
>>> Can anyone with more knowledge on the subject comment on this?
>>>
>>> Thanks,
>>> Estani
>>>
>>>
>>>
>>>
>>
>>
>
--
----------- DKRZ / Data Management -----------
Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany
phone: +49-40-460094-122
FAX: +49-40-460094-106
e-mail: martina.stockhause at zmaw.de
----------------------------------------------