[Go-essp-tech] DRS syntax into ESG

Bob Drach drach at llnl.gov
Thu Nov 5 11:28:01 MST 2009


Hi Luca,

Thanks for raising the issue - I've been wondering about this too.

The hierarchy of datasets as presented by the gateway for users to
browse through need not be the same as the hierarchy introduced by
DRS. Users should be able to find datasets with as few clicks as
possible, which is why we just went through the exercise of
'flattening' the THREDDS catalogs.

The publisher already writes properties corresponding to the DRS
fields (model, experiment, etc.) into the catalogs, with the
exception of version numbers (which are coming in the next release).
So here's a way forward:

- The publisher is configured so that the categories defined for
the IPCC5/CMIP5 project (activity) include the DRS fields. As I said,
this is already mostly true. The categories are mandatory: they must
be resolved before publication.
- Each catalog corresponding to a dataset has properties that define
these values. On publication, the gateway ingests these values in a
searchable form.
- When the portal receives a DRS request, it parses the URL, searches  
on the resulting fields, and resolves to the corresponding dataset.
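The three steps above (mandatory DRS categories at publication, searchable properties on the gateway, URL parsing and resolution in the portal) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the publisher or gateway code: the snake_case field keys, the helpers `parse_drs_url`, `check_categories` and `resolve`, the dataset ids, the example values, and the in-memory list standing in for the gateway's search store are all hypothetical.

```python
from urllib.parse import urlparse

# DRS fields in URL order, following the CMIP5 DRS URL template in this thread.
# The snake_case key names are hypothetical labels chosen for this sketch.
DRS_FIELDS = [
    "activity", "institute", "model", "experiment", "frequency",
    "realm", "variable", "ensemble_member", "version",
]


def parse_drs_url(url):
    """Split the path of a DRS URL into a dict of field -> value.

    Path segments beyond <version> are kept together as the optional endpoint.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < len(DRS_FIELDS):
        raise ValueError(
            f"expected at least {len(DRS_FIELDS)} path segments, got {len(parts)}")
    fields = dict(zip(DRS_FIELDS, parts))
    rest = parts[len(DRS_FIELDS):]
    fields["endpoint"] = "/".join(rest) if rest else None
    return fields


def check_categories(properties):
    """Publication-time check: every DRS category must be resolved (non-empty)."""
    missing = [f for f in DRS_FIELDS if not properties.get(f)]
    if missing:
        raise ValueError(f"unresolved DRS categories: {missing}")


def resolve(url, index):
    """Return the id of the single ingested dataset whose properties match the URL."""
    query = parse_drs_url(url)
    matches = [
        ds["id"] for ds in index
        if all(ds["properties"].get(f) == query[f] for f in DRS_FIELDS)
    ]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} datasets match {url}")
    return matches[0]


if __name__ == "__main__":
    # Made-up property values for one published dataset.
    props = dict(zip(DRS_FIELDS, [
        "cmip5", "inst", "model1", "historical", "mon",
        "atmos", "tas", "r1", "v1",
    ]))
    check_categories(props)  # passes: every DRS field is resolved
    index = [{"id": "ds-001", "properties": props}]  # stand-in for the gateway store
    url = "http://example.org/cmip5/inst/model1/historical/mon/atmos/tas/r1/v1/"
    print(resolve(url, index))  # ds-001
```

Because resolution is a search over ingested properties rather than a walk down a catalog tree, it works the same whether the published catalogs are nested or flattened.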

The main point is that this can be independent of the dataset  
hierarchy as generated during publication.

Bob

On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:

> Hi,
> 	the purpose of this email is to start a conversation, and a plan of
> action, on how to incorporate the DRS syntax into the ESG system.
> As a reminder, the current DRS specification states that a CMIP5
> dataset will be uniquely identified by the following URL:
>
> http://<hostname>/<activity>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/[<endpoint>]
>
> where most of the <...> fields are drawn from controlled vocabularies,
> for example:
>
> http://badc.nerc.ac.uk/activity/institute/model/experiment/frequency/realm/varname/r1/v1/
>
> The first question is: what does it mean to capture the semantics
> of the DRS syntax within ESG? I can see at least two answers:
>
> a) The user is able to browse the CMIP5 datasets hierarchically
> according to the DRS hierarchy of fields
> b) The user is able to search for data based on facets that reflect
> the DRS syntax: activity, institute, experiment, etc.
>
> So how do we get there? A straw-man workflow could be the following:
>
> o) The ESG Data Node publishing client, when building the THREDDS
> catalogs, creates a hierarchy of datasets that reflects the DRS syntax.
> There is probably also a need to mark up these catalogs as "DRS" or
> "CMIP5".
> o) The ESG Gateway, when parsing these catalogs, invokes a specific
> handler that creates the same dataset hierarchy (this is actually
> automatic, I believe), and additionally associates corresponding
> objects at each level of the hierarchy. For example, at the first level
> the dataset will be associated with an activity, at the second level
> with an institute, and so on. An alternative would be to associate all
> the objects with the leaf-level dataset only.
> o) When the metadata for the leaf-node datasets is harvested into RDF
> triples for searching, the dataset-object associations must be
> transferred to the triple store.
> o) Specific CMIP5 facets can be configured to search by DRS fields
> (perhaps only on the PCMDI Gateway, or perhaps on all gateways).
>
> As mentioned, this is just a start. I do believe though that this is
> an extremely important issue that must be tackled as soon as possible.
>
> thanks, Luca
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>
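The two options in the quoted straw-man workflow (associating DRS objects at every level of the hierarchy, or only with the leaf-level dataset) can be compared with a small sketch. It assumes leaf-level associations only and shows that both hierarchical browsing (answer a) and faceted search (answer b) can be derived from them, which also fits the reply's point that browsing need not follow the publication hierarchy. The data structures, helper names (`build_facet_index`, `facet_search`, `browse`) and example values are hypothetical; this is not the gateway's RDF/triple-store implementation.

```python
from collections import defaultdict

# Same DRS field order as in the URL template; key names are hypothetical.
DRS_FIELDS = [
    "activity", "institute", "model", "experiment", "frequency",
    "realm", "variable", "ensemble_member", "version",
]


def make_dataset(ds_id, values):
    """Leaf-level dataset: one property per DRS field (made-up example data)."""
    return {"id": ds_id, "properties": dict(zip(DRS_FIELDS, values))}


def build_facet_index(datasets):
    """Map each (field, value) pair to the ids of the leaf datasets carrying it."""
    index = defaultdict(set)
    for ds in datasets:
        for field in DRS_FIELDS:
            index[(field, ds["properties"][field])].add(ds["id"])
    return index


def facet_search(index, **constraints):
    """Intersect the id sets for each field=value constraint (answer b).

    An empty query matches nothing in this sketch.
    """
    sets = [index.get((field, value), set()) for field, value in constraints.items()]
    return set.intersection(*sets) if sets else set()


def browse(datasets, *prefix):
    """List the values at the next DRS level below a path prefix (answer a).

    The hierarchy is derived on the fly from leaf-level properties, so it need
    not match how the catalogs were nested at publication time.
    """
    depth = len(prefix)
    if depth >= len(DRS_FIELDS):
        return []
    next_field = DRS_FIELDS[depth]
    return sorted({
        ds["properties"][next_field]
        for ds in datasets
        if all(ds["properties"][f] == v for f, v in zip(DRS_FIELDS, prefix))
    })


if __name__ == "__main__":
    datasets = [
        make_dataset("ds-001", ["cmip5", "inst", "model1", "historical",
                                "mon", "atmos", "tas", "r1", "v1"]),
        make_dataset("ds-002", ["cmip5", "inst", "model1", "rcp45",
                                "mon", "atmos", "pr", "r1", "v1"]),
    ]
    index = build_facet_index(datasets)
    print(browse(datasets, "cmip5", "inst", "model1"))  # ['historical', 'rcp45']
    print(facet_search(index, experiment="historical", variable="tas"))  # {'ds-001'}
```

Storing associations only at the leaf keeps a single source of truth per dataset; per-level objects would have to be kept consistent with every leaf beneath them.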


