[Go-essp-tech] DRS, version number & Co

Estanislao Gonzalez gonzalez at dkrz.de
Fri Jul 8 04:48:31 MDT 2011


Hi Sebastien,

The only part that was not resolved (or agreed upon) was the first 
level after /fileServer/ for the file access capabilities of the TDS, so 
every node has something different there (we will be publishing to 
/fileServer/cmip/out... though).
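
To illustrate the point about the node-specific level after /fileServer/, 
here is a minimal sketch (not part of drslib or the TDS) that separates 
that prefix from the rest of the path. The function name 
split_fileserver_url is made up for this example, and it assumes the DRS 
part starts at a component spelled "cmip5" in any letter case. The URL is 
the CNRM one quoted further down in this thread, with the boldface 
markers removed.

```python
from urllib.parse import urlparse


def split_fileserver_url(url, activity="cmip5"):
    """Return (node_specific_prefix, drs_components) for a fileServer URL.

    Hypothetical helper: assumes the DRS part begins at the first path
    component equal to `activity`, ignoring case.
    """
    path = urlparse(url).path
    marker = "/fileServer/"
    start = path.find(marker)
    if start < 0:
        raise ValueError("not a TDS fileServer URL: " + url)
    parts = path[start + len(marker):].split("/")
    for i, part in enumerate(parts):
        if part.lower() == activity:
            # Everything before the activity is node-specific.
            return parts[:i], parts[i:]
    raise ValueError("no '%s' component found in: %s" % (activity, url))


url = ("http://esg.cnrm-game-meteo.fr/thredds/fileServer/esg_dataroot1/"
       "CMIP5/output/CNRM-CERFACS/CNRM-CM5/historicalGHG/mon/land/"
       "evspsblsoi/r1i1p1/"
       "evspsblsoi_Lmon_CNRM-CM5_historicalGHG_r1i1p1_190001-194912.nc")
prefix, drs = split_fileserver_url(url)
print(prefix)    # the part that differs from node to node
print(drs[:3])   # start of the DRS part
```

A script that builds file addresses would still need to know the prefix 
for each node; the sketch only shows where the variable part sits.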

What drslib does is threefold:
1) It separates output into output1 and output2, since only output1 is 
interesting for replication (that's how I got the IPSL aqua4K experiment: 
I just skipped everything that was output2).
2) It versions the files (and thus inserts the version into the DRS 
structure). This helps in finding the version, and it is the only way I 
know of to gather the version from the Gateway.
3) It recreates the DRS structure, ensuring that it is a valid one. For 
reasons I'm not aware of, it does not cover the activity part, so you can 
still end up with an invalid DRS... In CNRM's case this means it will not 
validate the "CMIP5" directory, which should have been "cmip5" (sadly, 
computers are worse than the worst bureaucrats :-)
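
The three points above can be sketched as a small stand-alone check. 
This is not drslib's code or API; the function name check_drs_dir, the 
version pattern "v" followed by digits, and the compliant example path 
(including its version v20110708) are assumptions made only for 
illustration.

```python
import re

# Assumption: a version directory looks like "v" followed by digits.
VERSION_RE = re.compile(r"^v\d+$")


def check_drs_dir(components):
    """Return a list of problems found in a list of DRS directory names."""
    if len(components) < 2:
        return ["path too short to contain activity and product"]
    problems = []
    # The activity has to be spelled exactly "cmip5" (lower case).
    if components[0] != "cmip5":
        problems.append("activity is %r, expected 'cmip5'" % components[0])
    # The product has to be split into output1 / output2.
    if components[1] not in ("output1", "output2"):
        problems.append("product is %r, expected output1 or output2"
                        % components[1])
    # Some directory level has to carry the version.
    if not any(VERSION_RE.match(c) for c in components):
        problems.append("no version directory found")
    return problems


# Directory part of the CNRM URL quoted further down in this thread.
cnrm = ["CMIP5", "output", "CNRM-CERFACS", "CNRM-CM5", "historicalGHG",
        "mon", "land", "evspsblsoi", "r1i1p1"]
# Hypothetical compliant layout, for comparison only.
ok = ["cmip5", "output1", "CNRM-CERFACS", "CNRM-CM5", "historicalGHG",
      "mon", "land", "Lmon", "r1i1p1", "v20110708", "evspsblsoi"]

for problem in check_drs_dir(cnrm):
    print(problem)
print(check_drs_dir(ok))
```

The check deliberately ignores the order of the remaining levels, since 
that is exactly what drslib is meant to take care of.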

drslib is quite well documented (in my opinion); the documentation is 
here: http://esgf.org/esgf-drslib-site/
All documentation regarding the datanode and all tools around it can be 
found here: http://esgf.org/wiki/ESGF_Node

Hope this helps,
Estani

Am 08.07.2011 12:13, schrieb Stéphane Senesi:
> Hi all,
>
> martin.juckes at stfc.ac.uk wrote, On 08/07/2011 11:09:
>> Hi Estani,
>>
>> I agree with you that this is an important issue and that we want to have a clean implementation.
>>
>> Unfortunately, given where we are now, I don't think there is going to be any support for withdrawing data nodes which don't meet this implementation standard -- so enforcement by the gateway won't work. So I think the only way forward is to work on simplifying the installation and then persuade the node managers to adopt the standard. Making it the default would, as you suggest, be a huge help.
>>
>> I keep telling our users in the UK that the archive is currently in a very early stage, with a significant chance that data will be replaced. The same applies to the level of service. I think we need to work on demonstrating best practise as far as data node deployment goes.
>>
>> At the moment I see the PKI security as a higher priority, since most of our users want scripting access rather than clicking through the gateways, and this only works when the PKI security is enabled.
>>
>> For the versioning implementation, it would help to have a step by step guide on esgf.org (or if it is already there, it would help me to understand the issues if I knew where it is) -- but I guess this will have to wait until Stephen has worked through some other priorities.
>>    
>
> Regarding the CNRM data node, what prevented us from switching to the 
> "recommended" directory structure (it is not termed "standard" in 
> CMIP5 documents) was the lack of such a guide.
>
> I agree with Martin that it is important to ease an OpenDAP-enabled 
> scripted access for data users; if it appears that this version issue 
> is the only obstacle for computing datafiles addresses (not quoting 
> the issue of data node name), then we can consider changing the 
> directory structure (assuming we have the guide).
>
> Also, I note that the first part of HTTPServer URLs contains a part 
> which may vary per data node, and even per experiment or realm 
> (such as the boldface part in 
> http://esg.cnrm-game-meteo.fr/thredds/fileServer/*esg_dataroot1*/CMIP5/output/CNRM-CERFACS/CNRM-CM5/historicalGHG/mon/land/evspsblsoi/r1i1p1/evspsblsoi_Lmon_CNRM-CM5_historicalGHG_r1i1p1_190001-194912.nc). 
> Would this be cured by applying drslib?
>
> On a very close subject, may I quote Sébastien Denvil ( 9 june 2011), 
> with whom I agree :
>
>> I would like to remind us all that having a clear add/remove/update 
>> procedure is a requirement together with add/remove/update impact on 
>> versions (dataset version, file version).
>>
>> It's clear we do our best to publish the right dataset. It's clear 
>> too that QC process, and scientific process will spot issues 
>> (acceptable or not) and will trigger add/remove/update actions.
>>
>> I can't remember if a clear document describing the publish/unpublish 
>> procedure exists. It should describe from both perspectives (data 
>> provider/those in charge of replication) how to:
>>
>> - add file(s) within existing datasets
>> - remove file(s) from existing datasets
>> - update files(s) from existing datasets (is that just add/remove? 
>> not if we want easy life for replication. Yes if we want easy life 
>> for data provider)
>>
>> If such document doesn't exist yet I think it is a priority (given 
>> where we are) to produce one.
>>
>> Can someone point me to that document?
>
> Regards
>
> Stéphane
>
>> It should be possible to get all this fixed in time, but I think people are working through a large number of issues in parallel at present.
>>
>> Cheers,
>> Martin
>>
>>    
>>>> -----Original Message-----
>>>> From:go-essp-tech-bounces at ucar.edu  [mailto:go-essp-tech-
>>>> bounces at ucar.edu] On Behalf Of Estanislao Gonzalez
>>>> Sent: 07 July 2011 16:46
>>>> To:go-essp-tech at ucar.edu
>>>> Subject: [Go-essp-tech] DRS, version number & Co
>>>>
>>>> Hi,
>>>>
>>>> What's the current stand regarding DRS and dataset version number?
>>>>
>>>> I've seen too many data nodes with too many different configurations.
>>>> From invalid dataset names to invalid DRS structure, names, missing
>>>> version numbers, etc.
>>>> The version number is a particularly interesting one, since in some
>>>> cases
>>>> the only way to find it is by parsing the TDS Catalogs themselves,
>>>> since
>>>> the Gateway is not providing this info (AFAICT) and if the DRS is not
>>>> followed can neither be implied from the directory structure of its
>>>> files.
>>>>
>>>> Obviously neither the publisher nor the Gateway is enforcing those
>>>> constraints. I think this should be changed ASAP.
>>>> Both Node and Gateway publishing steps should enforce this when
>>>> publishing for cmip5. I think it is the most direct way to get to the
>>>> publisher at the right time.
>>>>
>>>> If we keep drifting away from what we already agreed on, we won't be
>>>> able to do anything useful with the data at all, since we won't be
>>>> able
>>>> to handle it properly.
>>>>
>>>> I'll urge the data node managers to check DRS compliance.
>>>>
>>>> I've only seen BADC publishing according to the DRS structure. I know
>>>> PCMDI, BCC, CNRM and NCCS are not. I haven't checked others.
>>>>
>>>> Thanks,
>>>> Estani
>>>>
>>>> --
>>>> Estanislao Gonzalez
>>>>
>>>> Max-Planck-Institut für Meteorologie (MPI-M)
>>>> Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
>>>> Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>>>
>>>> Phone:   +49 (40) 46 00 94-126
>>>> E-Mail:gonzalez at dkrz.de
>>>>
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>        
>
>
> -- 
> Stéphane Sénési
> Ingénieur - équipe Assemblage du Système Terre
> Centre National de Recherches Météorologiques
> Groupe de Météorologie à Grande Echelle et Climat
>
> CNRM/GMGEC/ASTER
> 42 Av Coriolis
> F-31057 Toulouse Cedex 1
>
> +33.5.61.07.99.31 (Fax :....9610)
>
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
