[Go-essp-tech] DRS structure

Martina Stockhause martina.stockhause at zmaw.de
Tue Aug 24 03:53:31 MDT 2010


Hi, Stephen,

I'd like to stay with the TDS XML if I can, because there are a lot of
open issues in the QC workflow, unless you can convince me that I would
find more suitable information in the postgres db.

Example: I get a QC result for the atomic dataset in the directory
CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr 

How do I find the TDS ID for it?
ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
urlPath="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.pr.1.aggregation"
Will the IDs keep this structure? If so, I can cut the last two components.
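
A minimal sketch of that mapping, assuming the ID keeps the form shown
above (DRS directory components joined by dots, followed by a version
number and the word "aggregation"). The function names are made up for
illustration, and splitting on "." only works as long as no DRS
component itself contains a dot.

```python
def drs_dir_to_tds_id(drs_dir, version="1", suffix="aggregation"):
    """Build an ID of the form shown above from a DRS directory path."""
    parts = [p for p in drs_dir.strip("/").split("/") if p]
    return ".".join(parts + [version, suffix])


def tds_id_to_drs_dir(tds_id, n_trailing=2):
    """Cut the last two components (version, 'aggregation') and rebuild the path."""
    parts = tds_id.split(".")
    return "/".join(parts[:-n_trailing])


drs_dir = "CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr"
tds_id = drs_dir_to_tds_id(drs_dir)
print(tds_id)
print(tds_id_to_drs_dir(tds_id) == drs_dir)
```

If the ID syntax changes, only these two helpers would need adapting.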

And how do I find all the information on the datasets belonging to the
atomic dataset?
ID="CMIP5.output.MPI-M.ECHAM6-MPIOM-LR.rcp45.mon.atmos.v1.pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
urlPath="atmos/CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr/r1/pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc"
There I can identify the datasets belonging to this atomic dataset using
the urlPath.
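
As a rough sketch of that lookup, assuming the urlPath always contains
the DRS directory of the atomic dataset as a run of complete path
components: the function name is hypothetical, and the second urlPath in
the list is invented purely to show a non-matching entry.

```python
def files_in_atomic_dataset(url_paths, drs_dir):
    """Return the urlPaths that lie below the given DRS directory."""
    # Surround with slashes so that only whole path components match.
    needle = "/" + drs_dir.strip("/") + "/"
    return [u for u in url_paths if needle in "/" + u]


url_paths = [
    "atmos/CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr/r1/"
    "pr_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc",
    # Hypothetical entry for another variable, made up for illustration.
    "atmos/CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/tas/r1/"
    "tas_Amon_ECHAM6-MPIOM-LR_rcp45_r1_195501-199412.nc",
]

matches = files_in_atomic_dataset(
    url_paths, "CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/pr"
)
for m in matches:
    print(m)
```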

Will the directory structure change to move the version behind the realm
as well? In my example:
CMIP5/output/MPI-M/ECHAM6-MPIOM-LR/rcp45/mon/atmos/v1/pr

Thanks a lot in advance for clarifying that.
Best wishes,
Martina


stephen.pascoe at stfc.ac.uk wrote:
> Hi Martina,
>
> For efficiency reasons we need to publish multiple variables as one dataset, therefore the dataset_id won't contain a variable identifier.  However, the variable names are contained in the THREDDS XML inside metadata/variable tags so they are still available.
>
> I would recommend you don't rely on the syntax of dataset_ids.  Either get the DRS attributes from THREDDS <property> elements or bypass the XML entirely and inspect the publisher database.  The THREDDS properties are tied directly to the DRS attributes CMOR creates so they will be much less likely to be wrong due to misconfiguration.
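
A minimal sketch of the first option, reading <property> and <variable>
elements with Python's standard library. The catalog fragment, the
property names (institute, model, experiment) and the function names are
assumptions made for illustration; only the THREDDS catalog namespace is
taken from the THREDDS specification, and real catalogs may be laid out
differently.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalog fragment; property names and values are illustrative only.
CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="example" ID="example.dataset">
    <property name="institute" value="MPI-M"/>
    <property name="model" value="ECHAM6-MPIOM-LR"/>
    <property name="experiment" value="rcp45"/>
    <metadata>
      <variables vocabulary="CF-1.0">
        <variable name="pr" vocabulary_name="precipitation_flux" units="kg m-2 s-1"/>
      </variables>
    </metadata>
  </dataset>
</catalog>"""


def drs_attributes(dataset):
    """Collect the direct <property> children of a dataset element as a dict."""
    tag = "{%s}property" % THREDDS_NS
    return {p.get("name"): p.get("value") for p in dataset.findall(tag)}


def variable_names(dataset):
    """List the names of all <variable> elements below the dataset."""
    tag = "{%s}variable" % THREDDS_NS
    return [v.get("name") for v in dataset.iter(tag)]


root = ET.fromstring(CATALOG)
dataset = root.find("{%s}dataset" % THREDDS_NS)
attrs = drs_attributes(dataset)
names = variable_names(dataset)
print(attrs)
print(names)
```

Reading the attributes this way avoids parsing the dataset_id string at
all, which is the point of the recommendation above.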
>
> I know the DRS document is out of date but the syntax should be stable -- We'll sort out the confusion Estani has just pointed out ASAP.
>
> Cheers,
> Stephen.
>
> -----Original Message-----
> From: Martina Stockhause [mailto:martina.stockhause at zmaw.de]
> Sent: Tue 8/24/2010 7:33 AM
> To: Estanislao Gonzalez
> Cc: Bob Drach; Pascoe, Stephen (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] DRS structure
>  
> Dear all,
>
> we really need to fix the DRS structure and the way the DRS syntax is
> reflected in the TDS catalogue.
>
> During the QC, which runs in the file system with DRS syntax, I need to
> have a connection to the TDS to check the consistency of data against
> metadata after the automated checks. Since I don't want to touch each
> dataset again, I take the TDS metadata as reference for data content.
>
> Up to now it was possible to take the dataset_id as DRS name out of the
> TDS in the atomic dataset (TDS aggregation = QC result level) and the
> netcdf file level.
>
> Now the <variable> part of the DRS is missing in the dataset_id of the
> netcdf file, so that I am about to take the urlPath instead.
> Is that ok?
>
> Why can't we use the DRS syntax as IDs in the TDS and in metafor? That
> would make things much easier.
>
> The DRS syntax is my connection from the QC checked files to the TDS and
> to metafor. Therefore the DRS syntax should be fixed soon and documented
> in the DRS document, so that we can start to adapt our examples and
> scripts.
>
> Best wishes,
> Martina
>
>
> Estanislao Gonzalez wrote:
>   
>> Hi all,
>>
>> I've realized we've been moving things from one place to another 
>> regarding the DRS components, and the DRS Reference Syntax document 
>> (from 7/4/2010) does not reflect these changes.
>>
>> There are two major differences here:
>> 1) versioning: the drslib tool is creating a structure which is, for the 
>> time being, not DRS-conformant. I totally agree with the new 
>> version-component placement, but should that not be reflected in the DRS 
>> syntax document?
>> 2) the CMIP5 Best Practices for Data Publication states that the 
>> dataset_id should be: 
>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>
>> I know the dataset_id is not required to necessarily match any drs 
>> structure. But I personally think we should avoid drs-similar 
>> identifiers, as IMHO it increases confusion.
>> I think this solution helps solve some publishing problems, but it 
>> defines a new dataset level, the "ensemble dataset". And the 
>> realm-dataset is not being used anywhere else (or am I missing something?)
>>
>> I'm not aware of the reasons behind the definition of the DRS structure 
>> as it currently is. But I think, we should avoid drifting away from that 
>> document. In any case the document should be updated first.
>>
>> If I try to join all changes and proposals I've heard of, AFAIC the DRS 
>> structure we are heading towards appears to look something like:
>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<ensemble>.<version>.<variable>
>>
>> Which is different from the original:
>> cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<variable>.<ensemble>.<version>
>>
>> Can anyone with more knowledge on the subject comment on this?
>>
>> Thanks,
>> Estani
>>
>>   
>>     
>
>   

-- 
----------- DKRZ / Data Management -----------

Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany

phone:	+49-40-460094-122
FAX:	+49-40-460094-106
e-mail:	martina.stockhause at zmaw.de

----------------------------------------------


