[Go-essp-tech] [is-enes-sa2-jra4] Example of configuring a datanode to serve CMIP3-DRS

Estanislao Gonzalez estanislao.gonzalez at zmaw.de
Mon Jul 5 01:30:33 MDT 2010


Hi Bob,

I guess you must be on vacation now. Anyway, here's the question; maybe
someone else can answer it:

The very first idea I had was almost what you proposed. Your proposal
though leaves URLs of the form:
http://myserver/thredds/fileserver/CMIP5_replicas/output/...
                                   <--- (almost) DRS structure --->

This is not a valid DRS structure (neither CMIP5_replicas nor CMIP5_core
is in the DRS vocabulary).

My proposal has a very similar flaw:
http://myserver/thredds/fileserver/replicated/CMIP5/output/...
                                              <--- full DRS structure --->
The DRS structure is preserved, but you cannot easily infer the correct
URL from any dataset. I think the idea is: if you know the prefix
(http.../fileserver/) and the dataset DRS name, you can always get the
file without even browsing the TDS:
prefix + DRS = URL to file
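
To make the rule concrete, here is a minimal Python sketch. The function
names are hypothetical, the path is elided exactly as in the URLs above,
and the root names "replicated" and "core" are taken from the directory
names quoted further down in this thread. It only illustrates why an
extra path segment per repository breaks the rule; it is not part of the
publisher or the TDS.

```python
def file_url(prefix, drs_path):
    """Join a service prefix and a DRS path into a download URL."""
    return prefix.rstrip("/") + "/" + drs_path.lstrip("/")


def candidate_urls(prefix, roots, drs_path):
    """With one extra segment per repository, every root is a candidate."""
    return [file_url(file_url(prefix, root), drs_path) for root in roots]


PREFIX = "http://myserver/thredds/fileserver/"
DRS_PATH = "CMIP5/output/..."  # elided, as in the message

# prefix + DRS = URL to file: exactly one URL, no browsing needed.
print(file_url(PREFIX, DRS_PATH))

# With per-repository roots the client has to guess among several URLs.
for url in candidate_urls(PREFIX, ["replicated", "core"], DRS_PATH):
    print(url)
```

With a single root the client computes one URL; with several roots it
has to probe each candidate or browse the catalog first.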

AFAIK the URL structure used by the TDS will never be 100% DRS-conformant
(according to DRS version 0.27). The DRS URL has the form:
http://<hostname>/<activity>/<product>/<institute>/<model>/<experiment>/
<frequency>/<modeling realm>/<variable identifier>/<ensemble member>/
<version>/[<endpoint>],

whereas the TDS form moves the endpoint to the front (the
thredds/fileserver, thredds/dodsC, etc. parts).
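
The difference in ordering can be sketched in Python as follows. The
component values are left as placeholders and both function names are
hypothetical; the field list is the one from the DRS template above.

```python
DRS_FIELDS = [
    "activity", "product", "institute", "model", "experiment",
    "frequency", "modeling realm", "variable identifier",
    "ensemble member", "version",
]


def drs_url(hostname, values, endpoint=None):
    """DRS ordering: components first, optional endpoint last."""
    parts = [values[field] for field in DRS_FIELDS]
    if endpoint:
        parts.append(endpoint)
    return "http://" + hostname + "/" + "/".join(parts)


def tds_url(hostname, values, service):
    """TDS ordering: thredds/<service> first, then the components."""
    parts = ["thredds", service] + [values[field] for field in DRS_FIELDS]
    return "http://" + hostname + "/" + "/".join(parts)


# Placeholder values only; no real dataset is implied.
values = {field: "<" + field + ">" for field in DRS_FIELDS}

print(drs_url("myserver", values, endpoint="fileserver"))
print(tds_url("myserver", values, service="fileserver"))
```

The same components appear in both URLs; only the position of the
service part differs, which is why the TDS URL can match the DRS path
but not the full DRS URL template.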

To sum things up:
Is it possible to publish files from different directory structures into
a unified URL structure so that it is completely transparent to the user?
Am I the only one addressing this problem? Are all other institutions
planning to publish all files from only one directory?

The only viable solution I can think of is to rely on Stephen's
versioning concept and maintain a single true DRS structure with
links to files kept in other, more manageable directory structures (this
will probably involve adapting Stephen's tool).
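
As a rough illustration of the link idea (this is not Stephen's tool,
and the function name is hypothetical), a single DRS tree could be
filled with symbolic links along these lines. The sketch assumes a POSIX
file system and that each source root already mirrors the DRS layout
below its top directory.

```python
import os
from pathlib import Path


def link_into_drs_tree(drs_root, source_roots):
    """Mirror every file under each source root into drs_root as a symlink.

    Returns the links created. os.symlink raises FileExistsError if two
    source roots provide the same relative path.
    """
    drs_root = Path(drs_root)
    created = []
    for source in map(Path, source_roots):
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue  # directories are created on demand below
            link = drs_root / path.relative_to(source)
            link.parent.mkdir(parents=True, exist_ok=True)
            os.symlink(path.resolve(), link)
            created.append(link)
    return created
```

A call such as link_into_drs_tree("/drs", ["/replicated", "/core"])
(paths are examples only) would leave one tree to publish, while the
files stay where they are. Failing on a name clash is a deliberate
choice here, since a clash means two repositories claim the same DRS
path.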

Thanks,
Estani


Bob Drach wrote:
> Hi Estani,
>
> It should be possible to do what you want without running multiple
> data nodes.
>
> The purpose of the THREDDS dataset roots is to hide the directory
> structure from the end user, and to limit what the TDS can access. But
> THREDDS can certainly have multiple dataset roots.
>
> In your example below, you should associate different paths with the
> locations, for example:
>
>> <datasetRoot path="CMIP5_replicas" location="/replicated/CMIP5"/>
>> <datasetRoot path="CMIP5_core" location="/core/CMIP5"/>
>
>
> Also be aware that in the publisher configuration:
>
> - the directory_format can have multiple values, separated by vertical
> bars (|). The publisher will use the first format that matches the
> directory structure being scanned.
>
> - a useful strategy is to create different project sections for
> various groups of directives. You could define a cmip5_replica
> project, a cmip5_core project, etc.
>
> Bob
>
> On Jul 1, 2010, at 5:42 AM, Estanislao Gonzalez wrote:
>
>> Hi Bryan,
>>
>> thanks for your answer!
>> Running multiple ESG data nodes is always a possibility, but it seems an
>> overkill to us as we may have several different "data repositories".
>> We would like to separate: core-replicated, core-non-replicated,
>> non-core, non-core-on-hpss, as well as other non-cmip5 data. Having 5+
>> ESG data nodes is not viable in our scenario.
>>
>> The TDS allows the separation of access URL from the underlying file
>> structure so that it might be possible. AFAIK the publisher does not
>> provide a simple way of doing this.
>>
>> Setting thredds_dataset_roots to different values while publishing
>> doesn't appear to work as those are mapped to a map-entry at the
>> catalog root:
>> <datasetRoot path="CMIP5" location="/replicated/CMIP5"/>
>> <datasetRoot path="CMIP5" location="/core/CMIP5"/>
>> ..
>>
>> which is clearly not bijective and therefore can't be reversed to
>> locate the file from a given URL.
>>
>> At publication time, all referenced data will be held at a known
>> location. Is it possible to somehow use this information to set up a
>> proper catalog configuration so that the URL can be properly mapped?
>> At least on a dataset level?
>>
>> The whole HPSS staging procedure should be completely transparent to
>> the user, as well as the location of the files. I was just looking at
>> other options in case we cannot publish them the way we want...
>>
>> Cheers,
>> Estani
>>
>>
>>
>>
>> Bryan Lawrence wrote:
>>> sorry.
>>>
>>> the first sentence should have read
>>>
>>> Just to note that *our* approach to the local versus replication issue
>>> will be ...
>>>
>>> Cheers
>>> Bryan
>>>
>>> On Thursday 01 Jul 2010 11:25:37 Bryan Lawrence wrote:
>>>
>>>> Hi Estani
>>>>
>>>> Just to note that your approach to the local versus replication will
>>>> be to run two different ESG nodes ... which is in fact the desired
>>>> outcome so as to get the right things in the catalogues at the right
>>>> time (vis-a-vis qc etc).
>>>>
>>>> The issue with respect to cache, I'm not so sure about, in what way
>>>> do you want to expose that into ESG?
>>>>
>>>> Bryan
>>>>
>>>> On Wednesday 30 Jun 2010 17:05:57 Estanislao Gonzalez wrote:
>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> the page contains really helpful information, thanks a lot!
>>>>>
>>>>> I'm also interested in some variables of the DEFAULT section from
>>>>> the esg.ini configuration file. More specifically:
>>>>> thredds_dataset_roots (and maybe thredds_aggregation_services or
>>>>> any other which was changed or you think it might be important)
>>>>>
>>>>> The main question here is: how can different local directory
>>>>> structures be published to the same DRS structure?
>>>>> The example scenario in our case will be:
>>>>> /replicated/<DRS structure> - for replicated data
>>>>> /local/<DRS structure> - for non-replicated data held on disk
>>>>> /cache/<DRS structure> - for data staged from a HPSS system
>>>>>
>>>>> The only solution I can think of is to extend the URL before the
>>>>> DRS structure starts (the URL won't be 100% DRS-conformant anyway). So
>>>>>    http://*server/thredds/fileserver/<DRS structure>
>>>>> will turn into
>>>>>    http://*server/thredds/fileserver/replicated/<DRS structure>
>>>>>    http://*server/thredds/fileserver/local/<DRS structure>
>>>>>    http://*server/thredds/fileserver/cache/<DRS structure>
>>>>>
>>>>> Is that viable? Are there any other options?
>>>>>
>>>>> Thanks,
>>>>> Estani
>>>>>
>>>>> stephen.pascoe at stfc.ac.uk wrote:
>>>>>
>>>>>> To illustrate how the ESG datanode can be configured to serve
>>>>>> data for CMIP5 we have deployed a datanode containing a subset of
>>>>>> CMIP3 in the Data Reference Syntax. Some key features of this
>>>>>> deployment are:
>>>>>>
>>>>>>    * The underlying directory structure is based on the Data
>>>>>>      Reference Syntax.
>>>>>>    * Datasets published at the realm level.
>>>>>>    * The token-based security filter is replaced by the
>>>>>>      OpenidRelyingParty security filter.
>>>>>>
>>>>>> Further notes can be found at
>>>>>> http://*proj.badc.rl.ac.uk/go-essp/wiki/CMIP3_Datanode
>>>>>>
>>>>>> This test deployment should be of interest to anyone wanting to
>>>>>> know how DRS identifiers could be exposed in THREDDS catalogues
>>>>>> and the TDS HTML interface.  You can also try downloading files
>>>>>> with OpenID authentication or via wget with SSL-client
>>>>>> certificate authentication.  See the link above for details.
>>>>>>
>>>>>> Cheers,
>>>>>> Stephen.
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Stephen Pascoe  +44 (0)1235 445980
>>>>>> British Atmospheric Data Centre
>>>>>> Rutherford Appleton Laboratory
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> _______________________________________________
>>>>>> GO-ESSP-TECH mailing list
>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>> http://*mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>
>>>
>>>
>>
>>
>> -- 
>> Estanislao Gonzalez
>>
>> Max-Planck-Institut für Meteorologie (MPI-M)
>> Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
>> Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>
>> Phone:   +49 (40) 46 00 94-126
>> E-Mail:  estanislao.gonzalez at zmaw.de
>>
>>
>


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  estanislao.gonzalez at zmaw.de


