[Go-essp-tech] DRS syntax into ESG

Thu Nov 5 12:53:03 MST 2009

Hi Luca,

On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:

> Hi Bob,
> 	how you build the dataset hierarchy really boils down to how you  
> want users to browse. I was under the impression that you wanted  
> users to browse the catalogs reflecting how the data was stored on  
> disk, but maybe I was wrong.

The browsing should be organized for user convenience - as few clicks  
as necessary. If the browsing hierarchy is decoupled from the  
organization on disk, then the disk hierarchy can be arranged for  
convenience of publication as well. This is particularly useful for  
publication of legacy data, where you don't necessarily have control  
over the disk organization. So no, I don't think the gateway browsing  
hierarchy should necessarily mirror the disk organization at the data  
node.

> You don't think it would be too confusing to have all datasets for  
> a single model/experiment/frequency/realm/variable/ensemble be  
> contained in the very same HTML page ?

Well yes, I do think that might be confusing. But it would be worse  
to have to click through nine levels of hierarchy to find a dataset.  
Isn't there some intermediate representation that balances the depth  
of hierarchy with information per page?

For example, the hierarchy might be presented as a table of model vs.  
experiment, with each table cell containing links to datasets (or at  
least to a shallower hierarchy). Would that be difficult to do?
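
To make the table idea concrete, here is a minimal Python sketch of grouping datasets into model-vs-experiment cells. It is only an illustration: the dataset records, their values, and the function names build_table and render_table are hypothetical and are not part of the gateway or the publisher.

```python
from collections import defaultdict

# Hypothetical dataset records. The field names follow the DRS fields
# discussed in this thread; the ids and values are made up.
DATASETS = [
    {"id": "ds1", "model": "modelA", "experiment": "exp1"},
    {"id": "ds2", "model": "modelA", "experiment": "exp2"},
    {"id": "ds3", "model": "modelB", "experiment": "exp1"},
    {"id": "ds4", "model": "modelA", "experiment": "exp1"},
]


def build_table(datasets):
    """Group dataset ids into cells keyed by (model, experiment)."""
    cells = defaultdict(list)
    for ds in datasets:
        cells[(ds["model"], ds["experiment"])].append(ds["id"])
    models = sorted({model for model, _ in cells})
    experiments = sorted({experiment for _, experiment in cells})
    return models, experiments, cells


def render_table(datasets):
    """Render a tab-separated table: one row per model, one column per experiment."""
    models, experiments, cells = build_table(datasets)
    lines = ["model\t" + "\t".join(experiments)]
    for model in models:
        # An empty cell is shown as "-"; a real page would hold links instead of ids.
        row = [",".join(cells.get((model, exp), [])) or "-" for exp in experiments]
        lines.append(model + "\t" + "\t".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(DATASETS))
```

Each cell holds the datasets for one model/experiment pair, so a user reaches a dataset (or a shallower sub-hierarchy) in one click from the table rather than descending level by level.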

Thanks,

Bob

> I think for searching we all agree that what needs to be done is  
> simply harvest all the fields in the database/triple store and then  
> expose the corresponding facets.
> thanks, Luca
>
> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
>
>> Hi Luca,
>>
>> Thanks for raising the issue - I've been wondering about this too.
>>
>> The hierarchy of datasets as presented by the gateway - for users  
>> to browse through - shouldn't necessarily be the same as the  
>> hierarchy introduced by DRS. Users should be able to find datasets  
>> with as few clicks as possible, which is why we just went through  
>> the exercise of 'flattening' the THREDDS catalogs.
>>
>> The publisher already associates properties corresponding to the  
>> DRS fields (model, experiment, etc.) into the catalogs, with the  
>> exception of version numbers (which are coming in the next  
>> release). So here's a way forward:
>>
>> - The publisher is configured such that the categories defined for  
>> the IPCC5/CMIP5 project (activity) include the DRS fields. As I  
>> said, this is already mostly true. The categories are mandatory -  
>> must be resolved before publication.
>> - Each catalog corresponding to a dataset has properties that  
>> define these values. On publication the gateway ingests these  
>> values in searchable fashion.
>> - When the portal receives a DRS request, it parses the URL,  
>> searches on the resulting fields, and resolves to the  
>> corresponding dataset.
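
The last step above (parse the URL, search on the fields, resolve to a dataset) can be sketched roughly as follows. This is an assumption-laden illustration, not the portal's implementation: the field order is taken from the DRS URL template quoted further down in this thread, while the function names, the in-memory INDEX, the host name, and all field values are hypothetical.

```python
from urllib.parse import urlparse

# Path components in the order given by the DRS URL template in this thread.
DRS_FIELDS = [
    "activity", "institute", "model", "experiment", "frequency",
    "realm", "variable", "ensemble", "version",
]


def parse_drs_url(url):
    """Split a DRS-style URL into a dict of field name -> value."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < len(DRS_FIELDS):
        raise ValueError(
            "expected at least %d path components, got %d"
            % (len(DRS_FIELDS), len(parts))
        )
    fields = dict(zip(DRS_FIELDS, parts))
    # Anything after the version is treated as the optional endpoint.
    fields["endpoint"] = "/".join(parts[len(DRS_FIELDS):]) or None
    return fields


def resolve(url, index):
    """Return ids of datasets whose properties match every DRS field in the URL."""
    wanted = parse_drs_url(url)
    wanted.pop("endpoint")
    return [
        ds_id
        for ds_id, props in index.items()
        if all(props.get(name) == value for name, value in wanted.items())
    ]


# Hypothetical stand-in for the properties the gateway ingests on publication.
INDEX = {
    "ds1": dict(zip(DRS_FIELDS, ["cmip5", "inst1", "modelA", "exp1",
                                 "mon", "atmos", "tas", "r1", "v1"])),
    "ds2": dict(zip(DRS_FIELDS, ["cmip5", "inst1", "modelA", "exp1",
                                 "mon", "atmos", "tas", "r1", "v2"])),
}

URL = "http://example.org/cmip5/inst1/modelA/exp1/mon/atmos/tas/r1/v1/"

if __name__ == "__main__":
    print(resolve(URL, INDEX))
```

Because the lookup is a search on properties, it does not depend on how the catalogs happen to be nested.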
>>
>> The main point is that this can be independent of the dataset  
>> hierarchy as generated during publication.
>>
>> Bob
>>
>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
>>
>>> Hi,
>>> 	the purpose of this email is to start a conversation, and a plan of
>>> action, on how to incorporate the DRS syntax into the ESG system.
>>> As a reminder, the current DRS specification states that a CMIP5
>>> dataset will be uniquely identified by the following URL:
>>>
>>> http://<hostname>/<activity>/<institute>/<model>/<experiment>/
>>> <frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/
>>> [<endpoint>]
>>>
>>> where most of the <...> fields are drawn from controlled
>>> vocabularies. For example:
>>>
>>> http://badc.nerc.ac.uk/activity/institute/model/experiment/frequency/realm/varname/r1/v1/
>>>
>>> The first question would be what does it mean to capture the semantics
>>> of the DRS syntax within ESG ? I can see at least two answers:
>>>
>>> a) The user is able to browse the CMIP5 datasets hierarchically
>>> according to the DRS hierarchy of fields
>>> b) The user is able to search for data based on facets that reflect
>>> the DRS syntax: activity, institute, experiment, etc.
>>>
>>> So how do we get there ? A straw-man workflow could be the following:
>>>
>>> o) The ESG Data Node publishing client, when building the THREDDS
>>> catalogs, creates a hierarchy of datasets that reflects the syntax.
>>> There is probably also a need to mark up these catalogs as "DRS" or
>>> "CMIP5".
>>> o) The ESG Gateway, when parsing these catalogs, invokes a specific
>>> handler that creates the same dataset hierarchy (this is actually
>>> automatic, I believe), and additionally associates corresponding
>>> objects at each level of the hierarchy. For example, at the first
>>> level the dataset will be associated with an activity, at the second
>>> level with an institute, and so on. An alternative would be to
>>> associate all the objects only with the leaf-level dataset.
>>> o) When the metadata for the leaf-node datasets is harvested into RDF
>>> triples for searching, the dataset-object associations must be
>>> transferred to the triple store.
>>> o) Specific CMIP5 facets can be configured to search by DRS fields
>>> (perhaps only on the PCMDI Gateway, or perhaps on all gateways).
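
The last two workflow steps (carrying the dataset-object associations into triples and then searching by facet) can be illustrated with a small Python sketch. It uses plain tuples rather than a real RDF library or triple store, and the "drs:" predicate prefix, the function names, and the sample datasets are all hypothetical.

```python
from collections import Counter


def to_triples(dataset_id, properties):
    """Turn a leaf dataset's DRS properties into (subject, predicate, object) tuples."""
    return [
        (dataset_id, "drs:" + name, value)
        for name, value in sorted(properties.items())
    ]


def facet_counts(triples, facet):
    """Count how many datasets carry each value of one facet, e.g. 'model'."""
    predicate = "drs:" + facet
    return Counter(obj for _, pred, obj in triples if pred == predicate)


def search(triples, **constraints):
    """Return sorted ids of datasets that match every facet=value constraint."""
    matches = None
    for facet, value in constraints.items():
        hits = {s for s, p, o in triples if p == "drs:" + facet and o == value}
        matches = hits if matches is None else matches & hits
    return sorted(matches or [])


# Hypothetical leaf datasets with a few DRS fields each.
LEAVES = {
    "ds1": {"activity": "cmip5", "model": "modelA", "experiment": "exp1"},
    "ds2": {"activity": "cmip5", "model": "modelA", "experiment": "exp2"},
    "ds3": {"activity": "cmip5", "model": "modelB", "experiment": "exp1"},
}

TRIPLES = [t for ds_id, props in LEAVES.items() for t in to_triples(ds_id, props)]

if __name__ == "__main__":
    print(facet_counts(TRIPLES, "model"))
    print(search(TRIPLES, model="modelA", experiment="exp1"))
```

This corresponds to the alternative of associating all objects with the leaf-level dataset only: every facet value is attached directly to the dataset that is returned by a search.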
>>>
>>> As mentioned, this is just a start. I do believe though that this is
>>> an extremely important issue that must be tackled as soon as possible.
>>>
>>> thanks, Luca