[Go-essp-tech] DRS syntax into ESG

Luca Cinquini luca at ucar.edu
Thu Nov 5 12:59:29 MST 2009


Hi Bob,
	I think we can do pretty much anything; we just need to be clear on  
what the requirements are. I agree that 9 clicks might be too much,  
but maybe 3 or 4 can be a good compromise between speed and  
overwhelming results. A matrix is possible too, for example see here:

http://esg.ucar.edu/browse/viewProject.htm?projectId=ff3949c8-2008-45c8-8e27-5834f54be50f

(where, for now, all the folders end up being empty).
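
As a rough illustration of the matrix idea, here is a minimal sketch that
pivots a flat list of dataset records into model vs. experiment cells. The
record layout and the names build_matrix and render are hypothetical, chosen
only for this example; this is not the gateway's actual code.

```python
from collections import defaultdict


def build_matrix(datasets):
    """Group dataset ids into cells keyed by (model, experiment).

    Each record is assumed (hypothetically) to be a dict with "id",
    "model" and "experiment" keys.
    """
    cells = defaultdict(list)
    for d in datasets:
        cells[(d["model"], d["experiment"])].append(d["id"])
    models = sorted({m for m, _ in cells})
    experiments = sorted({e for _, e in cells})
    return models, experiments, cells


def render(models, experiments, cells):
    """Render a tab-separated table: one row per model, one column per
    experiment, each cell holding the number of datasets found."""
    rows = ["\t".join(["model"] + experiments)]
    for m in models:
        counts = [str(len(cells.get((m, e), []))) for e in experiments]
        rows.append("\t".join([m] + counts))
    return "\n".join(rows)
```

In a real page each cell would link to its datasets (or to a shallower
hierarchy) rather than show a count; the count just keeps the sketch short.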

Maybe, since this is mostly a CMIP5 presentation issue, you guys at  
PCMDI can decide on what kind of browsing/clicking you would like the  
users to go through, and let us know?

thanks, Luca


On Nov 5, 2009, at 12:53 PM, Bob Drach wrote:

> Hi Luca,
>
> On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:
>
>> Hi Bob,
>> 	how you build the dataset hierarchy really boils down to how you  
>> want users to browse. I was under the impression that you wanted  
>> users to browse the catalogs reflecting how the data was stored on  
>> disk, but maybe I was wrong.
>
> The browsing should be organized for user convenience - as few  
> clicks as necessary. If the browsing hierarchy is decoupled from the  
> organization on disk, then the disk hierarchy can be arranged for  
> convenience of publication as well. This is particularly useful for  
> publication of legacy data, where you don't necessarily have control  
> over the disk organization. So no, I don't think the gateway  
> browsing hierarchy should necessarily mirror the disk organization  
> at the data node.
>
>
>> You don't think it would be too confusing to have all datasets for  
>> a single model/experiment/frequency/realm/variable/ensemble be  
>> contained in the very same HTML page?
>
> Well yes, I do think that might be confusing. But it would be worse  
> to have to click through nine levels of hierarchy to find a dataset.  
> Isn't there some intermediate representation that balances the depth  
> of hierarchy with information per page?
>
> For example, the hierarchy might be presented as a table of model  
> vs. experiment, with each table cell containing links to datasets  
> (or at least to a shallower hierarchy). Would that be difficult to do?
>
> Thanks,
>
> Bob
>
>
>> I think for searching we all agree that what needs to be done is  
>> simply harvest all the fields in the database/triple store and then  
>> expose the corresponding facets.
>> thanks, Luca
>>
>> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
>>
>>> Hi Luca,
>>>
>>> Thanks for raising the issue - I've been wondering about this too.
>>>
>>> The hierarchy of datasets as presented by the gateway - for users  
>>> to browse through - shouldn't necessarily be the same as the  
>>> hierarchy introduced by DRS. Users should be able to find datasets  
>>> with as few clicks as possible, which is why we just went through  
>>> the exercise of 'flattening' the THREDDS catalogs.
>>>
>>> The publisher already associates properties corresponding to the  
>>> DRS fields (model, experiment, etc.) into the catalogs, with the  
>>> exception of version numbers (which are coming in the next  
>>> release). So here's a way forward:
>>>
>>> - The publisher is configured such that the categories defined for  
>>> the IPCC5/CMIP5 project (activity) include the DRS fields. As I  
>>> said, this is already mostly true. The categories are mandatory -  
>>> must be resolved before publication.
>>> - Each catalog corresponding to a dataset has properties that  
>>> define these values. On publication the gateway ingests these  
>>> values in searchable fashion.
>>> - When the portal receives a DRS request, it parses the URL,  
>>> searches on the resulting fields, and resolves to the  
>>> corresponding dataset.
>>>
>>> The main point is that this can be independent of the dataset  
>>> hierarchy as generated during publication.
>>>
>>> Bob
>>>
>>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
>>>
>>>> Hi,
>>>> 	the purpose of this email is to start a conversation, and a plan of
>>>> action, on how to incorporate the DRS syntax into the ESG system.
>>>> As a reminder, the current DRS specification states that a CMIP5
>>>> dataset will be uniquely identified by the following URL:
>>>>
>>>> http://<hostname>/<activity>/<institute>/<model>/<experiment>/
>>>> <frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/
>>>> [<endpoint>]
>>>>
>>>> where most of the <...> fields are drawn from controlled
>>>> vocabularies. For example:
>>>>
>>>> http://badc.nerc.ac.uk/activity/institute/model/experiment/
>>>> frequency/realm/varname/r1/v1/
>>>>
>>>> The first question is what it means to capture the semantics
>>>> of the DRS syntax within ESG. I can see at least two answers:
>>>>
>>>> a) The user is able to browse the CMIP5 datasets hierarchically
>>>> according to the DRS hierarchy of fields
>>>> b) The user is able to search for data based on facets that reflect
>>>> the DRS syntax: activity, institute, experiment, etc.
>>>>
>>>> So how do we get there? A straw-man workflow could be the following:
>>>>
>>>> o) The ESG Data Node publishing client, when building the THREDDS
>>>> catalogs, creates a hierarchy of datasets that reflects the syntax.
>>>> There is probably also a need to mark up these catalogs as "DRS" or
>>>> "CMIP5".
>>>> o) The ESG Gateway, when parsing these catalogs, invokes a specific
>>>> handler that creates the same dataset hierarchy (this is actually
>>>> automatic, I believe), and additionally associates corresponding
>>>> objects at each level of the hierarchy. For example, at the first level
>>>> the dataset will be associated with an activity, at the second level
>>>> with an institute, and so on. An alternative would be to associate all
>>>> the objects only with the leaf-level dataset.
>>>> o) When the metadata for the leaf-node datasets is harvested into RDF
>>>> triples for searching, the dataset-object associations must be
>>>> transferred to the triple store.
>>>> o) Specific CMIP5 facets can be configured to search by DRS fields
>>>> (perhaps only on the PCMDI Gateway, or perhaps on all gateways).
>>>>
>>>> As mentioned, this is just a start. I do believe, though, that this is
>>>> an extremely important issue that must be tackled as soon as possible.
>>>>
>>>> thanks, Luca
>>>>
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>
>>>
>>
>>
>
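
The "parse the URL, search on the resulting fields, resolve to the dataset"
step discussed in the thread could look roughly like the sketch below. It
assumes the field order of the DRS URL layout quoted above; the names
DRS_FIELDS, parse_drs_url and resolve, and the idea of datasets as plain
dicts of properties, are hypothetical and only serve the illustration. A
real gateway would query its database or triple store instead of a list.

```python
from urllib.parse import urlsplit

# Field order assumed from the DRS URL layout quoted in this thread.
DRS_FIELDS = (
    "activity", "institute", "model", "experiment", "frequency",
    "realm", "variable", "ensemble", "version",
)


def parse_drs_url(url):
    """Split a DRS-style URL into (hostname, fields dict, endpoint).

    The endpoint is whatever follows the nine DRS path segments, or None
    if nothing follows them.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < len(DRS_FIELDS):
        raise ValueError(
            "expected at least %d path segments, got %d"
            % (len(DRS_FIELDS), len(segments))
        )
    fields = dict(zip(DRS_FIELDS, segments))
    endpoint = "/".join(segments[len(DRS_FIELDS):]) or None
    return parts.hostname, fields, endpoint


def resolve(fields, datasets):
    """Return the datasets whose properties match every parsed DRS field.

    `datasets` is a hypothetical in-memory stand-in for the searchable
    store: a list of dicts keyed by the DRS field names.
    """
    return [
        d for d in datasets
        if all(d.get(name) == value for name, value in fields.items())
    ]
```

Because the lookup goes through the field values rather than through a
path, it works regardless of how the dataset hierarchy was laid out at
publication time, which is the point Bob makes above.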


