[Go-essp-tech] DRS syntax into ESG

Thu Nov 5 14:01:39 MST 2009

OK, I'll discuss it with Dean and Karl, and come up with some ideas.  
Thanks,

Bob

On Nov 5, 2009, at 11:59 AM, Luca Cinquini wrote:

> Hi Bob,
> 	I think we can do pretty much anything, we just need to be clear  
> on what the requirements are. I agree that 9 clicks might be too  
> much, but maybe 3 or 4 can be a good compromise between speed and  
> overwhelming results. A matrix is possible too, for example see here:
>
> http://esg.ucar.edu/browse/viewProject.htm?projectId=ff3949c8-2008-45c8-8e27-5834f54be50f
>
> (where now all folders are eventually empty).
>
> Maybe, since this is mostly a CMIP5 presentation issue, you guys at  
> PCMDI can decide on what kind of browsing/clicking you would like  
> the users to go through, and let us know ?
>
> thanks, Luca
>
>
> On Nov 5, 2009, at 12:53 PM, Bob Drach wrote:
>
>> Hi Luca,
>>
>> On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:
>>
>>> Hi Bob,
>>> 	how you build the dataset hierarchy really boils down to how you
>>> want users to browse. I was under the impression that you wanted  
>>> users to browse the catalogs reflecting how the data was stored  
>>> on disk, but maybe I was wrong.
>>
>> The browsing should be organized for user convenience - as few  
>> clicks as necessary. If the browsing hierarchy is decoupled from  
>> the organization on disk, then the disk hierarchy can be arranged  
>> for convenience of publication as well. This is particularly  
>> useful for publication of legacy data, where you don't necessarily  
>> have control over the disk organization. So no, I don't think the  
>> gateway browsing hierarchy should necessarily mirror the disk  
>> organization at the data node.
>>
>>
>>> You don't think it would be too confusing to have all datasets  
>>> for a single model/experiment/frequency/realm/variable/ensemble  
>>> be contained in the very same HTML page ?
>>
>> Well yes, I do think that might be confusing. But it would be  
>> worse to have to click through nine levels of hierarchy to find a
>> dataset. Isn't there some intermediate representation that  
>> balances the depth of hierarchy with information per page?
>>
>> For example, the hierarchy might be presented as a table of model  
>> vs. experiment, with each table cell containing links to datasets  
>> (or at least to a shallower hierarchy). Would that be difficult to  
>> do?
>>
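
As a rough illustration of the model vs. experiment table suggested above, the sketch below groups dataset records into cells keyed by (model, experiment). The record layout, the dataset ids, and the function names are assumptions made for the example; they are not part of the gateway or the publisher.

```python
from collections import defaultdict

# Hypothetical dataset records; ids and field values are made up.
SAMPLE = [
    {"id": "ds1", "model": "modelA", "experiment": "exp1"},
    {"id": "ds2", "model": "modelA", "experiment": "exp1"},
    {"id": "ds3", "model": "modelB", "experiment": "exp2"},
]

def build_matrix(datasets):
    """Group dataset ids into cells keyed by (model, experiment)."""
    cells = defaultdict(list)
    for d in datasets:
        cells[(d["model"], d["experiment"])].append(d["id"])
    models = sorted({m for m, _ in cells})
    experiments = sorted({e for _, e in cells})
    return models, experiments, cells

def render_text(models, experiments, cells):
    """Render the matrix as tab-separated text, one dataset count per cell."""
    lines = ["model\t" + "\t".join(experiments)]
    for m in models:
        row = [str(len(cells.get((m, e), []))) for e in experiments]
        lines.append(m + "\t" + "\t".join(row))
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_text(*build_matrix(SAMPLE)))
```

Each cell could just as well hold links to the datasets, or to a shallower sub-hierarchy, instead of a count; the grouping step stays the same.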
>> Thanks,
>>
>> Bob
>>
>>
>>> I think for searching we all agree that what needs to be done is  
>>> simply harvest all the fields in the database/triple store and  
>>> then expose the corresponding facets.
>>> thanks, Luca
>>>
>>> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
>>>
>>>> Hi Luca,
>>>>
>>>> Thanks for raising the issue - I've been wondering about this too.
>>>>
>>>> The hierarchy of datasets as presented by the gateway - for  
>>>> users to browse through - shouldn't necessarily be the same as  
>>>> the hierarchy introduced by DRS. Users should be able to find  
>>>> datasets with as few clicks as possible, which is why we just  
>>>> went through the exercise of 'flattening' the THREDDS catalogs.
>>>>
>>>> The publisher already adds properties corresponding to the
>>>> DRS fields (model, experiment, etc.) to the catalogs, with the
>>>> exception of version numbers (which are coming in the next  
>>>> release). So here's a way forward:
>>>>
>>>> - The publisher is configured such that the categories defined  
>>>> for the IPCC5/CMIP5 project (activity) include the DRS fields.  
>>>> As I said, this is already mostly true. The categories are  
>>>> mandatory - must be resolved before publication.
>>>> - Each catalog corresponding to a dataset has properties that  
>>>> define these values. On publication the gateway ingests these  
>>>> values in searchable fashion.
>>>> - When the portal receives a DRS request, it parses the URL,  
>>>> searches on the resulting fields, and resolves to the  
>>>> corresponding dataset.
>>>>
>>>> The main point is that this can be independent of the dataset  
>>>> hierarchy as generated during publication.
>>>>
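
The three steps above (parse the URL, search on the resulting fields, resolve to a dataset) might be sketched as follows. This is only a sketch under stated assumptions: the field order is taken from the DRS template quoted in this thread, while the catalog records, the property names, the host example.org, and the functions `parse_drs_path` and `resolve` are hypothetical and do not describe the actual gateway code.

```python
from urllib.parse import urlsplit

# Field order from the DRS URL template; the Python names are illustrative.
DRS_ORDER = ("activity", "institute", "model", "experiment", "frequency",
             "realm", "variable", "ensemble", "version")

def props(path):
    """Build a hypothetical per-catalog property dict from a slash-separated path."""
    return dict(zip(DRS_ORDER, path.split("/")))

# Hypothetical flat list of published catalogs (no dataset hierarchy needed).
CATALOGS = [
    {"id": "ds-tas-r1", "properties": props("cmip5/inst1/modelA/exp1/mon/atmos/tas/r1/v1")},
    {"id": "ds-tas-r2", "properties": props("cmip5/inst1/modelA/exp1/mon/atmos/tas/r2/v1")},
    {"id": "ds-pr-r1", "properties": props("cmip5/inst1/modelA/exp1/mon/atmos/pr/r1/v1")},
]

def parse_drs_path(url):
    """Map the first nine path segments of a DRS URL onto the DRS field names."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) < len(DRS_ORDER):
        raise ValueError("expected at least %d path segments" % len(DRS_ORDER))
    return dict(zip(DRS_ORDER, segments))

def resolve(url, catalogs):
    """Return the id of the single catalog whose properties match the DRS URL."""
    wanted = parse_drs_path(url)
    matches = [c["id"] for c in catalogs
               if all(c["properties"].get(k) == v for k, v in wanted.items())]
    if len(matches) != 1:
        raise LookupError("expected exactly one match, found %d" % len(matches))
    return matches[0]

if __name__ == "__main__":
    print(resolve("http://example.org/cmip5/inst1/modelA/exp1/mon/atmos/tas/r1/v1/", CATALOGS))
```

Because the lookup only compares properties, it works the same whether the catalogs are published flat or in a deep hierarchy.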
>>>> Bob
>>>>
>>>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
>>>>
>>>>> Hi,
>>>>> 	the purpose of this email is to start a conversation, and a plan of
>>>>> action, on how to incorporate the DRS syntax into the ESG system.
>>>>> As a reminder, the current DRS specification states that a CMIP5
>>>>> dataset will be uniquely identified by the following URL:
>>>>>
>>>>> http://<hostname>/<activity>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/[<endpoint>]
>>>>>
>>>>> where most of the <...> fields are drawn from controlled vocabularies.
>>>>> For example:
>>>>>
>>>>> http://badc.nerc.ac.uk/activity/institute/model/experiment/frequency/realm/varname/r1/v1/
>>>>>
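
To make the URL template concrete, here is a minimal sketch that assembles a DRS URL from its fields in the order given above. The function name `build_drs_url` and the dictionary keys are assumptions made for illustration; the values used in the demo are the same placeholders as in the example URL, not real vocabulary terms.

```python
# Field order from the DRS URL template; the Python names are illustrative.
DRS_ORDER = ("activity", "institute", "model", "experiment", "frequency",
             "realm", "variable", "ensemble", "version")

def build_drs_url(hostname, fields, endpoint=None):
    """Assemble a DRS URL; every field is required, the endpoint is optional."""
    missing = [name for name in DRS_ORDER if not fields.get(name)]
    if missing:
        raise ValueError("missing DRS fields: " + ", ".join(missing))
    path = "/".join(fields[name] for name in DRS_ORDER)
    url = "http://{}/{}/".format(hostname, path)
    if endpoint:
        url += endpoint
    return url

if __name__ == "__main__":
    placeholder = {
        "activity": "activity", "institute": "institute", "model": "model",
        "experiment": "experiment", "frequency": "frequency", "realm": "realm",
        "variable": "varname", "ensemble": "r1", "version": "v1",
    }
    print(build_drs_url("badc.nerc.ac.uk", placeholder))
```

A real implementation would presumably also check each value against its controlled vocabulary; that step is left out here because the vocabularies are not listed in this message.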
>>>>> The first question would be what does it mean to capture the semantics
>>>>> of the DRS syntax within ESG ? I can see at least two answers:
>>>>>
>>>>> a) The user is able to browse the CMIP5 datasets hierarchically
>>>>> according to the DRS hierarchy of fields.
>>>>> b) The user is able to search for data based on facets that reflect
>>>>> the DRS syntax: activity, institute, experiment, etc.
>>>>>
>>>>> So how do we get there ? A straw-man workflow could be the following:
>>>>>
>>>>> o) The ESG Data Node publishing client, when building the THREDDS
>>>>> catalogs, creates a hierarchy of datasets that reflects the DRS syntax.
>>>>> There is probably also a need to mark up these catalogs as "DRS" or
>>>>> "CMIP5".
>>>>> o) The ESG Gateway, when parsing these catalogs, invokes a specific
>>>>> handler that creates the same dataset hierarchy (this is actually
>>>>> automatic, I believe), and additionally associates corresponding
>>>>> objects at each level of the hierarchy. For example, at the first level
>>>>> the dataset will be associated with an activity, at the second level
>>>>> with an institute, and so on. An alternative way would be to associate
>>>>> all the objects only to the leaf-level dataset.
>>>>> o) When the metadata for the leaf-node datasets is harvested into RDF
>>>>> triples for searching, the dataset - object associations must be
>>>>> transferred to the triple store.
>>>>> o) Specific CMIP5 facets can be configured to search by DRS fields
>>>>> (perhaps only on the PCMDI Gateway, or perhaps on all gateways).
>>>>>
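
The last two steps of the workflow above (harvesting dataset - object associations into triples, then searching by DRS facets) can be illustrated with a small sketch. Plain Python tuples stand in for RDF triples here; no real triple store or RDF library is involved, and the dataset ids, property names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical leaf-level datasets with their DRS properties.
LEAVES = {
    "ds1": {"activity": "cmip5", "model": "modelA", "experiment": "exp1"},
    "ds2": {"activity": "cmip5", "model": "modelA", "experiment": "exp2"},
    "ds3": {"activity": "cmip5", "model": "modelB", "experiment": "exp1"},
}

def to_triples(dataset_id, properties):
    """Flatten one dataset's properties into (subject, predicate, object) tuples."""
    return [(dataset_id, name, value) for name, value in sorted(properties.items())]

def facet_counts(triples, facet_names):
    """Count how many datasets carry each value of the requested facets."""
    counts = {name: Counter() for name in facet_names}
    for _subject, predicate, obj in triples:
        if predicate in counts:
            counts[predicate][obj] += 1
    return counts

def search(triples, facet, value):
    """Return the sorted ids of the datasets that have the given facet value."""
    return sorted({s for s, p, o in triples if p == facet and o == value})

if __name__ == "__main__":
    triples = [t for ds, p in LEAVES.items() for t in to_triples(ds, p)]
    print(facet_counts(triples, ["model", "experiment"]))
    print(search(triples, "model", "modelA"))
```

Associating all objects with the leaf-level dataset only, as in the alternative mentioned above, is what keeps this flattening a single pass over the leaves.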
>>>>> As mentioned, this is just a start. I do believe though that this is
>>>>> an extremely important issue that must be tackled as soon as possible.
>>>>>
>>>>> thanks, Luca
>>>>>
>>>>> _______________________________________________
>>>>> GO-ESSP-TECH mailing list
>>>>> GO-ESSP-TECH at ucar.edu
>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>
>>>>
>>>
>>>
>>
>
>