[Go-essp-tech] DRS structure

Tue Aug 24 04:58:53 MDT 2010

Hi Martina,

Which TDS server are you working with?  Is it one at DKRZ?  Everything
below is based on what we've been doing at BADC with the CMIP3 dataset
and MOHC's CMIP5 test data.

> Example: I get a QC result for the atomic dataset in the directory
> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr
>
> How do I find the TDS ID for it?
> ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
> urlPath="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
> Will this structure remain as it is? Then I can cut off the last two
> components.

The directory should be part of the dataset with
dataset_id="cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos".  There
will be one or more versions of that dataset, with THREDDS catalogue names:

  cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1.xml
  cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v2.xml
  etc.

Within each catalogue there is a dataset element for the realm-dataset,
containing a dataset element for each file.  I'm not sure you can use the
aggregation datasets to represent atomic datasets; to be honest, I
haven't looked at them in detail.
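
The directory-to-dataset mapping described above can be sketched in a few
lines of Python. This is only an illustration, assuming the 8-component
layout of Martina's example (no version directory, variable as the last
component); the function names are hypothetical and not part of any ESG
or BADC tool.

```python
def dataset_id_from_dir(path: str) -> str:
    """Map an atomic-dataset directory to its realm-level dataset_id.

    Assumes CMIP5/<product>/<institute>/<model>/<experiment>/<frequency>/<realm>/<variable>.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 8:
        raise ValueError(f"unexpected DRS directory: {path!r}")
    # Drop the trailing <variable> and lower-case the leading "CMIP5".
    activity, *rest = parts[:-1]
    return ".".join([activity.lower(), *rest])


def catalogue_name(dataset_id: str, version: int) -> str:
    """THREDDS catalogue file name for one version of a realm dataset."""
    return f"{dataset_id}.v{version}.xml"
```

Note that several variable directories (pr, tas, ...) map to the same
dataset_id, which is exactly why the variable can't be recovered from it.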

> Will the directory structure change to move the version behind the
> realm as well? In my example:
> CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/v1/pr

Yes. The BADC data node doesn't use this layout for CMIP3 data at the
moment because it would be time-consuming to change after the fact.
However, our UKMO test runs put the version directory where you say.
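
For illustration, converting a directory from the current layout to one
with the version directory after the realm could look like this. It is a
sketch assuming the same 8-component layout as Martina's example and a
"v<n>" version label; the helper name is hypothetical.

```python
def with_version_after_realm(path: str, version: int) -> str:
    """Insert a v<version> directory between <realm> and <variable>.

    Assumes CMIP5/<product>/<institute>/<model>/<experiment>/<frequency>/<realm>/<variable>.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 8:
        raise ValueError(f"unexpected DRS directory: {path!r}")
    after_realm = 7  # index of <variable>, i.e. just after <realm>
    return "/".join(parts[:after_realm] + [f"v{version}"] + parts[after_realm:])
```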

I guess this only answers part of your questions but I hope it helps.

S.

---
Stephen Pascoe  +44 (0)1235 445980
British Atmospheric Data Centre
Rutherford Appleton Laboratory

-----Original Message-----
From: Martina Stockhause [mailto:martina.stockhause at zmaw.de] 
Sent: 24 August 2010 10:54
To: Pascoe, Stephen (STFC,RAL,SSTD)
Cc: estanislao.gonzalez at zmaw.de; drach at llnl.gov; go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] DRS structure

Hi, Stephen,

I'd like to stay with the TDS XML if I can, because there are a lot of
open issues in the QC workflow, unless you can convince me that the
Postgres database holds more suitable information.
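
Stephen's earlier message (quoted below) recommends reading the DRS
attributes from THREDDS <property> elements instead of parsing
dataset_ids. A minimal sketch of doing that with the TDS XML and the
Python standard library follows. It assumes THREDDS InvCatalog 1.0
catalogues; the property names in the sample ("institute", "realm") are
hypothetical, since the exact names the publisher writes are not given
here.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 catalogues.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}


def drs_properties(dataset: ET.Element) -> dict:
    """Collect the <property name=... value=...> children of one dataset."""
    return {p.get("name"): p.get("value") for p in dataset.findall("t:property", NS)}


def variable_names(dataset: ET.Element) -> list:
    """List variable names declared under <metadata>/<variables>."""
    return [v.get("name")
            for v in dataset.findall("t:metadata/t:variables/t:variable", NS)]


# Hypothetical catalogue fragment; property names are illustrative only.
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="example" ID="cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1">
    <property name="institute" value="MPI-M"/>
    <property name="realm" value="atmos"/>
    <metadata>
      <variables vocabulary="CF-1.0">
        <variable name="pr" vocabulary_name="precipitation_flux" units="kg m-2 s-1"/>
      </variables>
    </metadata>
  </dataset>
</catalog>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for ds in root.iter("{%s}dataset" % NS["t"]):
        print(ds.get("ID"), drs_properties(ds), variable_names(ds))
```

The point of the design is that nothing depends on the order of
components inside the ID; only named properties and variable tags are
read.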

Example: I get a QC result for the atomic dataset in the directory
CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr 

How do I find the TDS ID for it?
ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
urlPath="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
Will this structure remain as it is? Then I can cut off the last two
components.
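
What "cutting off the last two components" would mean in practice can be
sketched as below. It relies on the current aggregation-ID syntax (which
Stephen advises against depending on) and assumes that no DRS component
contains a dot; the function name is hypothetical.

```python
def atomic_dir_from_aggregation_id(agg_id: str) -> str:
    """Map a TDS aggregation ID back to the QC directory of the atomic dataset.

    Assumes the ID ends in ".<n>.aggregation" and no component contains a dot.
    """
    stem, number, suffix = agg_id.rsplit(".", 2)
    if suffix != "aggregation" or not number.isdigit():
        raise ValueError(f"not an aggregation ID: {agg_id!r}")
    # Drop "<n>.aggregation", then turn the DRS dots into directory separators.
    return stem.replace(".", "/")
```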

And how do I get all the information on the file-level datasets
belonging to the atomic dataset?
ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1.pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
urlPath="atmos/CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr/r1/pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
There I can identify the datasets belonging to this atomic dataset using
the urlPath.
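
Selecting the files of one atomic dataset by urlPath could be sketched as
follows. It assumes the leading "atmos/" in the example urlPath is a TDS
dataset-root prefix rather than part of the DRS path; that prefix and the
helper names are hypothetical.

```python
import posixpath

ROOT_PREFIX = "atmos/"  # hypothetical dataset-root prefix, not a DRS component


def drs_path(url_path: str, root_prefix: str = ROOT_PREFIX) -> str:
    """Strip the dataset-root prefix from a file urlPath, if present."""
    if url_path.startswith(root_prefix):
        return url_path[len(root_prefix):]
    return url_path


def files_in_atomic_dataset(url_paths, atomic_dir, root_prefix=ROOT_PREFIX):
    """Return the file names whose urlPath lies under atomic_dir."""
    # The trailing "/" stops ".../atmos/pr" from also matching ".../atmos/prc".
    base = atomic_dir.rstrip("/") + "/"
    return [posixpath.basename(p) for p in url_paths
            if drs_path(p, root_prefix).startswith(base)]
```

Because the match is on the directory rather than the dataset_id, this
keeps working even though the dataset_id no longer contains the variable.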

Will the directory structure change to move the version behind the realm
as well? In my example:
CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/v1/pr

Thanks a lot in advance for clarifying this.
Best wishes,
Martina

stephen.pascoe at stfc.ac.uk wrote:
> Hi Martina,
>
> For efficiency reasons we need to publish multiple variables as one
> dataset; therefore the dataset_id won't contain a variable identifier.
> However, the variable names are contained in the THREDDS XML inside
> metadata/variable tags, so they are still available.
>
> I would recommend that you don't rely on the syntax of dataset_ids.
> Either get the DRS attributes from THREDDS <property> elements or bypass
> the XML entirely and inspect the publisher database.  The THREDDS
> properties are tied directly to the DRS attributes CMOR creates, so they
> are much less likely to be wrong due to misconfiguration.
>
> I know the DRS document is out of date, but the syntax should be
> stable. We'll sort out the confusion Estani has just pointed out ASAP.
>
> Cheers,
> Stephen.
>
> -----Original Message-----
> From: Martina Stockhause [mailto:martina.stockhause at zmaw.de]
> Sent: Tue 8/24/2010 7:33 AM
> To: Estanislao Gonzalez
> Cc: Bob Drach; Pascoe, Stephen (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] DRS structure
>  
> Dear all,
>
> we really need to fix the DRS structure and how the DRS syntax is
> reflected in the TDS catalogue.
>
> During the QC, which runs on the file system with DRS syntax, I need
> a connection to the TDS to check the consistency of data against
> metadata after the automated checks. Since I don't want to touch each
> dataset again, I take the TDS metadata as the reference for the data
> content.
>
> Until now it was possible to take the dataset_id from the TDS as the
> DRS name, both at the atomic-dataset level (TDS aggregation = QC result
> level) and at the netCDF file level.
>
> Now the <variable> part of the DRS is missing from the dataset_id of
> the netCDF file, so I am about to use the urlPath instead.
> Is that ok?
>
> Why can't we use the DRS syntax as IDs in the TDS and in metafor? That
> would make things much easier.
>
> The DRS syntax is my connection from the QC-checked files to the TDS
> and to metafor. Therefore the DRS syntax should be fixed soon and
> documented in the DRS document, so that we can start adapting our
> examples and scripts.
>
> Best wishes,
> Martina
>
>
> Estanislao Gonzalez wrote:
>   
>> Hi all,
>>
>> I've realized we've been moving things from one place to another
>> regarding the DRS components, and the DRS Reference Syntax document
>> (from 7/4/2010) does not reflect these changes.
>>
>> There are two major differences here:
>> 1) Versioning: the drslib tool is creating a structure which is, for
>> the time being, not DRS-conformant. I totally agree with the new
>> placement of the version component, but shouldn't that be reflected
>> in the DRS syntax document?
>> 2) The CMIP5 Best Practices for Data Publication document states that
>> the dataset_id should be:
>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>
>> I know the dataset_id is not required to match any DRS structure, but
>> I personally think we should avoid DRS-like identifiers, as IMHO they
>> increase confusion.
>> I think this solution helps solve some publishing problems, but it
>> defines a new dataset level, the "ensemble dataset". And the
>> realm-dataset is not being used anywhere else (or am I missing
>> something?)
>>
>> I'm not aware of the reasons behind the definition of the DRS
>> structure as it currently is, but I think we should avoid drifting
>> away from that document. In any case, the document should be updated
>> first.
>>
>> If I combine all the changes and proposals I've heard of, as far as I
>> can tell the DRS structure we are moving towards looks something like:
>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>.<version>.<variable>
>>
>> Which is different from the original:
>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<variable>.<ensemble>.<version>
>>
>> Can anyone with more knowledge on the subject comment on this?
>>
>> Thanks,
>> Estani
>>
>>   
>>     
>
>   
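
To make the difference between the two orderings in Estani's message
concrete, here is a small sketch that builds both identifiers from the
same components. The values come from the example file above (ensemble
"r1"); the version "v1" and all names in the code are illustrative
assumptions.

```python
# Components from the example file; "v1" is an illustrative version.
EXAMPLE = {
    "product": "output", "institute": "MPI-M", "model": "ECHAM6-MPIOM-LR",
    "experiment": "rcp45", "time_frequency": "mon", "realm": "atmos",
    "variable": "pr", "ensemble": "r1", "version": "v1",
}

_COMMON = ("product", "institute", "model", "experiment", "time_frequency", "realm")
ORIGINAL_ORDER = _COMMON + ("variable", "ensemble", "version")
PROPOSED_ORDER = _COMMON + ("ensemble", "version", "variable")
# Best Practices dataset_id: everything up to and including <ensemble>.
DATASET_ID_ORDER = _COMMON + ("ensemble",)


def drs_id(components: dict, order) -> str:
    """Join DRS components in the given order behind the "cmip5" prefix."""
    return ".".join(["cmip5"] + [components[key] for key in order])
```

With the proposed ordering the Best Practices dataset_id
(cmip5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.r1) is a prefix of the
full identifier; with the original ordering it is not, because <variable>
sits between <realm> and <ensemble>.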

--
----------- DKRZ / Data Management -----------

Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany

phone:	+49-40-460094-122
FAX:	+49-40-460094-106
e-mail:	martina.stockhause at zmaw.de

----------------------------------------------
