[Go-essp-tech] DRS structure

Martina Stockhause martina.stockhause at zmaw.de
Thu Aug 26 00:18:54 MDT 2010


Good Morning, Stephen, hi, Bob,

What about the position of <ensemble> in the DRS syntax?

Does it move after <realm>? Or after <realm>.<version>?
I.e.
cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<version>.<ensemble>.<variable>.<netcdf>
or
cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<ensemble>.<version>.<variable>.<netcdf>

Or do we leave it after <variable> as documented in
http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf?
I.e.
cmip5.<product>.<institute>.<model>.<experiment>.<frequency>.<realm>.<version>.<variable>.<ensemble>.<netcdf>

It is important for me to know the position of <variable> (atomic
dataset) relative to the <experiment> and the <netcdf> (chunks).
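
A minimal Python sketch of why the position matters for scripts that
split DRS identifiers: the same dotted string yields a different
<variable> depending on which ordering is assumed. The layout names, the
function names and the example identifier (assembled from values used in
this thread, with "v1" and "r1" as version and ensemble) are illustrative
assumptions, not part of the DRS document or of any existing tool.

```python
# Components that all three candidate orderings share (assumed names).
PREFIX = ["activity", "product", "institute", "model",
          "experiment", "frequency", "realm"]

# The three orderings under discussion; the keys are hypothetical labels.
LAYOUTS = {
    "version_ensemble_variable": PREFIX + ["version", "ensemble", "variable", "netcdf"],
    "ensemble_version_variable": PREFIX + ["ensemble", "version", "variable", "netcdf"],
    "version_variable_ensemble": PREFIX + ["version", "variable", "ensemble", "netcdf"],
}


def parse_drs(identifier, layout):
    """Split a dotted DRS identifier into named components.

    The split is limited so that the last component (the netCDF file
    name, which itself contains a dot) stays in one piece.
    """
    names = LAYOUTS[layout]
    parts = identifier.split(".", len(names) - 1)
    if len(parts) != len(names):
        raise ValueError("expected %d components, got %d"
                         % (len(names), len(parts)))
    return dict(zip(names, parts))


def variable_level_id(identifier, layout):
    """Return the identifier truncated after the <variable> component."""
    names = LAYOUTS[layout]
    parts = identifier.split(".", len(names) - 1)
    if len(parts) != len(names):
        raise ValueError("expected %d components, got %d"
                         % (len(names), len(parts)))
    cut = names.index("variable") + 1
    return ".".join(parts[:cut])


if __name__ == "__main__":
    ident = ("cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos."
             "v1.pr.r1.pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc")
    for layout in LAYOUTS:
        print(layout, "->", variable_level_id(ident, layout))
```

Parsing the example string with the wrong layout does not fail; it
silently returns the ensemble or version where the variable is expected,
which is why the ordering needs to be settled and documented.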

Thanks a lot,
Martina


stephen.pascoe at stfc.ac.uk wrote:
> Hi Martina,
>
> Which TDS server are you working with?  Is it one at DKRZ?  Everything
> below is based on what we've been doing at BADC with the CMIP3 dataset
> and MOHC's CMIP5 test data.
>
>> Example: I get a QC result for the atomic dataset in the directory
>> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr
>> How do I find the TDS ID for it?
>> ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
>> urlPath="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
>> Is this the structure, how it will remain? Then I can cut the last two.
>
> The directory should be part of the dataset with
> dataset_id="cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos".  There
> will be 1 or more versions of that dataset with THREDDS catalogue names 
>
>   cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1.xml
>   cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v2.xml
>   etc.
>
> Within each catalogue there is a dataset element for the realm-dataset
> containing a dataset element for each file.
> I'm not sure you can use the aggregation datasets to represent
> atomic-datasets.  To be honest I haven't looked at them in detail.
>
>> Will the directory structure change to move the version behind the
>> realm as well? In my example:
>> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/v1/pr
>
> Yes, the BADC datanode doesn't have this at the moment for CMIP3 data
> because it would be time-consuming to change after the fact.  However,
> our UKMO test runs are putting the version directory where you say.  
>
> I guess this only answers part of your questions but I hope it helps.
>
> S.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
>
> -----Original Message-----
> From: Martina Stockhause [mailto:martina.stockhause at zmaw.de] 
> Sent: 24 August 2010 10:54
> To: Pascoe, Stephen (STFC,RAL,SSTD)
> Cc: estanislao.gonzalez at zmaw.de; drach at llnl.gov; go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] DRS structure
>
> Hi, Stephen,
>
> I'd like to stay with the TDS XML if I can, because there are a lot of
> open issues in the QC workflow, unless you can convince me that I would
> find more suitable information in the Postgres database.
>
> Example: I get a QC result for the atomic dataset in the directory
> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr 
>
> How do I find the TDS ID for it?
> ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
> urlPath="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
> Is this the structure, how it will remain? Then I can cut the last two.
>
> And all information of the datasets belonging to the atomic dataset?
> ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1.pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
> urlPath="atmos/CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr/r1/pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
> There I can identify the datasets belonging to this atomic dataset using
> the urlPath.
>
> Will the directory structure change to move the version behind the realm
> as well? In my example:
> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/v1/pr
>
> Thanks a lot in advance for clarifying that.
> Best wishes,
> Martina
>
>
> stephen.pascoe at stfc.ac.uk wrote:
>> Hi Martina,
>>
>> For efficiency reasons we need to publish multiple variables as one
>> dataset, therefore the dataset_id won't contain a variable identifier.
>> However, the variable names are contained in the THREDDS XML inside
>> metadata/variable tags so they are still available.
>>
>> I would recommend you don't rely on the syntax of dataset_ids.  Either
>> get the DRS attributes from THREDDS <property> elements or bypass the
>> XML entirely and inspect the publisher database.  The THREDDS properties
>> are tied directly to the DRS attributes CMOR creates so they will be
>> much less likely to be wrong due to misconfiguration.
>>
>> I know the DRS document is out of date but the syntax should be stable
>> -- We'll sort out the confusion Estani has just pointed out ASAP.
>>
>> Cheers,
>> Stephen.
>>
>> -----Original Message-----
>> From: Martina Stockhause [mailto:martina.stockhause at zmaw.de]
>> Sent: Tue 8/24/2010 7:33 AM
>> To: Estanislao Gonzalez
>> Cc: Bob Drach; Pascoe, Stephen (STFC,RAL,SSTD); go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] DRS structure
>>  
>> Dear all,
>>
>> we really need to fix the DRS structure and the way the DRS
>> syntax is reflected in the TDS catalogue.
>>
>> During the QC, which runs in the file system with DRS syntax, I need 
>> to have a connection to the TDS to check the consistency of data 
>> against metadata after the automated checks. Since I don't want to 
>> touch each dataset again, I take the TDS metadata as reference for
>> data content.
>>
>> Up to now it was possible to take the dataset_id as DRS name out of 
>> the TDS in the atomic dataset (TDS aggregation = QC result level) and 
>> the netcdf file level.
>>
>> Now the <variable> part of the DRS is missing in the dataset_id of the
>> netcdf file, so that I am about to take the urlPath instead.
>> Is that ok?
>>
>> Why can't we use the DRS syntax as IDs in the TDS and in metafor? That
>> would make things much easier.
>>
>> The DRS syntax is my connection from the QC checked files to the TDS 
>> and to metafor. Therefore the DRS syntax should be fixed soon and 
>> documented in the DRS document. So, that we can start to adapt our 
>> examples and scripts.
>>
>> Best wishes,
>> Martina
>>
>>
>> Estanislao Gonzalez wrote:
>>   
>>     
>>> Hi all,
>>>
>>> I've realized we've been moving things from one place to another 
>>> regarding the DRS components, and the DRS Reference Syntax document 
>>> (from 7/4/2010) does not reflect these changes.
>>>
>>> There are two major differences here:
>>> 1) versioning: the drslib tool is creating a structure which is, for
>>> the time being, not DRS-conformant. I totally agree with the new
>>> version-component placement, but should that not be reflected in the
>>> DRS syntax document?
>>> 2) "CMIP5 Best Practices for Data Publication" states that the
>>> dataset_id should be:
>>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>
>>> I know the dataset_id is not required to necessarily match any DRS
>>> structure. But I personally think we should avoid
>>> DRS-like identifiers, as IMHO they increase confusion.
>>> I think this solution helps solving some publishing problems, but 
>>> defines a new dataset level, the "ensemble dataset". And the 
>>> realm-dataset is not being used anywhere else (or am I missing 
>>> something?)
>>>
>>> I'm not aware of the reasons behind the definition of the DRS 
>>> structure as it currently is. But I think, we should avoid drifting 
>>> away from that document. In any case the document should be updated
>>> first.
>>>
>>> If I try to join all changes and proposals I've heard of, AFAICT the
>>> DRS structure we are heading towards looks something like:
>>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>.<version>.<variable>
>>>
>>> Which is different from the original:
>>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<variable>.<ensemble>.<version>
>>>
>>> Can anyone with more knowledge on the subject comment on this?
>>>
>>> Thanks,
>>> Estani
>>>
>>>   
>>>     
>>>       
>>   
>>     
>
> --
> ----------- DKRZ / Data Management -----------
>
> Martina Stockhause
> Deutsches Klimarechenzentrum
> Bundesstr. 45a
> D-20146 Hamburg
> Germany
>
> phone:	+49-40-460094-122
> FAX:	+49-40-460094-106
> e-mail:	martina.stockhause at zmaw.de
>
> ----------------------------------------------
>
>   

-- 
----------- DKRZ / Data Management -----------

Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany

phone:	+49-40-460094-122
FAX:	+49-40-460094-106
e-mail:	martina.stockhause at zmaw.de

----------------------------------------------


