[Go-essp-tech] DRS syntax into ESG

Bob Drach drach at llnl.gov
Mon Nov 16 20:01:36 MST 2009


Hi Luca,

Karl Taylor, Dean, and I came up with a fairly straightforward layout  
for hierarchical browsing of CMIP5 datasets. The path takes three  
clicks to go from Activity/Project to any individual dataset - see  
the attached document.

I should emphasize that the document relates only to:

- CMIP5 project/activity,
- hierarchical browsing (not faceted search).

More discussion to follow on the telecon tomorrow.

Bob

-------------- next part --------------
A non-text attachment was scrubbed...
Name: browsingCMIP5.doc
Type: application/octet-stream
Size: 60416 bytes
Desc: not available
Url : http://mailman.ucar.edu/pipermail/go-essp-tech/attachments/20091116/c1ff69f3/attachment-0001.obj 
-------------- next part --------------



On Nov 6, 2009, at 7:35 AM, Luca Cinquini wrote:

> Hi Stephen:
>
> On Nov 6, 2009, at 7:21 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>
>>
>> I would contend that at the interface level the DRS isn't anything to
>> do with a hierarchy. The DRS document doesn't mention "hierarchy" or
>> "tree" anywhere (although it does mention "path"). It defines a set of
>> components (aka properties or attributes) with controlled vocabularies
>> and an order in which those components should be encoded in a URL
>> syntax.
>>
>> Section 3.2 does talk about a filesystem layout of the DRS, but I think
>> this is really an implementation detail. URL schemes don't have to map
>> directly onto the filesystem. However, since our current tools
>> (THREDDS, GridFTP and presumably the Gateway) do have the concept of a
>> hierarchy hard-wired into them, we do have to work out how to make them
>> work with the *non-hierarchical* DRS.
>
> Certainly, the URL structure doesn't necessarily have to reflect the
> organization on disk. It really boils down to how the PCMDI people
> would like their users to browse their hierarchies, and they are going
> to let us know next week.
>>
>> Couldn't faceted search be extended to enable browsing through facets
>> in any order?
>>
> Sure, it does that already.
> thanks, L
>
>> S.
>>
>> ---
>> Stephen Pascoe  +44 (0)1235 445980
>> British Atmospheric Data Centre
>> Rutherford Appleton Laboratory
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
>> Sent: 06 November 2009 05:47
>> To: go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] DRS syntax into ESG
>>
>>
>> Hi Folks
>>
>> This is precisely the issue that I was bringing up last Tuesday, which
>> is the action for the ESG gateway team on page 5 (thanks for getting it
>> going, Luca) and which I was discussing with respect to figure 2.
>>
>> The key issue for me is what the concept of a dataset is in the
>> gateway, and whether we can have multiple views (aka datasets) onto the
>> physical hierarchy.
>>
>> What we want is to be able, as fast as possible (i.e. in as few clicks
>> as possible), to get a download script for (or view the metadata of)
>> *at least*:
>> - exactly the atomic dataset I want;
>> - datasets corresponding to all the collections going up the DRS
>> hierarchy; AND
>> - atomic datasets for a specific variable for all simulations carried
>> out by all models in a specific experiment;
>> - all atomic datasets in a specific realm for all simulations carried
>> out by all models in a specific experiment.
>>
>> So the key is that you need to offer multiple views on the data ...
>> which are essentially virtual collections.
>>
>> (With a view to the future, one *wouldn't* version the virtual
>> collections, because at the end of the day all the virtual collections
>> correspond to version-controlled atomic datasets, and so those are the
>> only things that need versioning.)
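Bryan's point that virtual collections need no version of their own can be illustrated with a minimal Python sketch, assuming each atomic dataset is a record holding its DRS components plus a version. A virtual collection is then just a query over those records. The `AtomicDataset` record, the `virtual_collection` helper and all sample values (institute, model and experiment names) are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class AtomicDataset:
    # Hypothetical record: one attribute per DRS component, plus the version.
    activity: str
    institute: str
    model: str
    experiment: str
    frequency: str
    realm: str
    variable: str
    ensemble: str
    version: str

DRS_NAMES = {f.name for f in fields(AtomicDataset)}

def virtual_collection(datasets, **constraints):
    """Return the atomic datasets whose DRS components match every constraint.

    The collection itself has no version: it is only a query, and each
    member keeps the version of its own atomic dataset.
    """
    unknown = set(constraints) - DRS_NAMES
    if unknown:
        raise KeyError(f"unknown DRS component(s): {sorted(unknown)}")
    return [d for d in datasets
            if all(getattr(d, k) == v for k, v in constraints.items())]

# Hypothetical sample catalog.
catalog = [
    AtomicDataset("cmip5", "instA", "modelA", "expX", "mon", "atmos", "tas", "r1", "v1"),
    AtomicDataset("cmip5", "instB", "modelB", "expX", "mon", "atmos", "tas", "r1", "v2"),
    AtomicDataset("cmip5", "instA", "modelA", "expX", "mon", "atmos", "pr", "r1", "v1"),
    AtomicDataset("cmip5", "instB", "modelB", "expX", "mon", "ocean", "tos", "r1", "v1"),
    AtomicDataset("cmip5", "instA", "modelA", "expY", "mon", "atmos", "tas", "r1", "v1"),
]

# One variable, all models, one experiment.
tas_in_expX = virtual_collection(catalog, experiment="expX", variable="tas")
# One realm, all models, one experiment.
atmos_in_expX = virtual_collection(catalog, experiment="expX", realm="atmos")
```

Each member of `tas_in_expX` still carries its own version (`v1` for one model, `v2` for the other), so versioning stays entirely on the atomic datasets.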
>>
>> Cheers
>> Bryan
>>
>>
>> On Thursday 05 November 2009 21:01:39 Bob Drach wrote:
>>> OK, I'll discuss it with Dean and Karl, and come up with some ideas.
>>> Thanks,
>>>
>>> Bob
>>>
>>> On Nov 5, 2009, at 11:59 AM, Luca Cinquini wrote:
>>>
>>>> Hi Bob,
>>>> I think we can do pretty much anything; we just need to be clear on
>>>> what the requirements are. I agree that 9 clicks might be too many,
>>>> but maybe 3 or 4 would be a good compromise between speed and
>>>> overwhelming results. A matrix is possible too; for an example, see
>>>> here:
>>>>
>>>> http://esg.ucar.edu/browse/viewProject.htm?projectId=ff3949c8-2008-45c8-8e27-5834f54be50f
>>>>
>>>> (where, for now, the folders may all be empty).
>>>>
>>>> Maybe, since this is mostly a CMIP5 presentation issue, you guys at
>>>> PCMDI can decide what kind of browsing/clicking you would like the
>>>> users to go through, and let us know?
>>>>
>>>> thanks, Luca
>>>>
>>>>
>>>> On Nov 5, 2009, at 12:53 PM, Bob Drach wrote:
>>>>
>>>>> Hi Luca,
>>>>>
>>>>> On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:
>>>>>
>>>>>> Hi Bob,
>>>>>> How you build the dataset hierarchy really boils down to how you
>>>>>> want users to browse. I was under the impression that you wanted
>>>>>> users to browse the catalogs in a way that reflects how the data
>>>>>> is stored on disk, but maybe I was wrong.
>>>>>
>>>>> The browsing should be organized for user convenience - as few
>>>>> clicks as necessary. If the browsing hierarchy is decoupled from
>>>>> the organization on disk, then the disk hierarchy can be arranged
>>>>> for convenience of publication as well. This is particularly useful
>>>>> for publication of legacy data, where you don't necessarily have
>>>>> control over the disk organization. So no, I don't think the
>>>>> gateway browsing hierarchy should necessarily mirror the disk
>>>>> organization at the data node.
>>>>>
>>>>>
>>>>>> Don't you think it would be too confusing to have all datasets for
>>>>>> a single model/experiment/frequency/realm/variable/ensemble
>>>>>> contained in the very same HTML page?
>>>>>
>>>>> Well yes, I do think that might be confusing. But it would be worse
>>>>> to have to click through nine levels of hierarchy to find a dataset.
>>>>> Isn't there some intermediate representation that balances the
>>>>> depth of hierarchy with information per page?
>>>>>
>>>>> For example, the hierarchy might be presented as a table of model
>>>>> vs. experiment, with each table cell containing links to datasets
>>>>> (or at least to a shallower hierarchy). Would that be difficult to
>>>>> do?
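Bob's model-vs-experiment table can be sketched in Python by grouping datasets into cells keyed by (model, experiment). This is only an illustration, assuming each dataset is a dict with `model`, `experiment` and a hypothetical `id` field; the function names, the id format and the plain-text rendering are assumptions, and a gateway page would presumably render links in each cell rather than counts.

```python
from collections import defaultdict

def model_experiment_matrix(datasets):
    """Group dataset ids into cells keyed by (model, experiment)."""
    cells = defaultdict(list)
    for ds in datasets:
        cells[(ds["model"], ds["experiment"])].append(ds["id"])
    models = sorted({m for m, _ in cells})
    experiments = sorted({e for _, e in cells})
    return models, experiments, dict(cells)

def render_counts(models, experiments, cells):
    """Render one row per model and one column per experiment; each cell shows a dataset count."""
    rows = [["model"] + experiments]
    for m in models:
        rows.append([m] + [str(len(cells.get((m, e), []))) for e in experiments])
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows)

# Hypothetical datasets; ids follow an assumed dot-separated DRS form.
datasets = [
    {"id": "cmip5.instA.modelA.expX.mon.atmos.tas.r1.v1", "model": "modelA", "experiment": "expX"},
    {"id": "cmip5.instA.modelA.expY.mon.atmos.tas.r1.v1", "model": "modelA", "experiment": "expY"},
    {"id": "cmip5.instB.modelB.expX.mon.atmos.tas.r1.v1", "model": "modelB", "experiment": "expX"},
    {"id": "cmip5.instB.modelB.expX.mon.ocean.tos.r1.v1", "model": "modelB", "experiment": "expX"},
]
models, experiments, cells = model_experiment_matrix(datasets)
print(render_counts(models, experiments, cells))
```

Each cell could instead hold a link to a shallower sub-hierarchy, as Bob suggests; the grouping step stays the same.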
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Bob
>>>>>
>>>>>
>>>>>> I think for searching we all agree that what needs to be done is
>>>>>> simply harvest all the fields in the database/triple store and
>>>>>> then expose the corresponding facets.
>>>>>> thanks, Luca
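Exposing the harvested fields as facets amounts to counting, for each facet, the values found among the records that match the selections made so far. A minimal Python sketch, assuming records are plain dicts of DRS fields (the `facet_counts` name and the sample values are hypothetical). Because the selections are an unordered set of constraints, the result does not depend on the order in which facets were chosen.

```python
from collections import Counter

def facet_counts(records, facets, **selected):
    """Count each facet's values among the records matching all current selections."""
    matching = [r for r in records
                if all(r.get(k) == v for k, v in selected.items())]
    return {f: Counter(r[f] for r in matching if f in r) for f in facets}

# Hypothetical harvested records.
records = [
    {"model": "modelA", "experiment": "expX", "variable": "tas"},
    {"model": "modelB", "experiment": "expX", "variable": "tas"},
    {"model": "modelA", "experiment": "expY", "variable": "pr"},
]

# Select an experiment first, then inspect the remaining facets.
after_experiment = facet_counts(records, ["model", "variable"], experiment="expX")
```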
>>>>>>
>>>>>> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>>
>>>>>>> Thanks for raising the issue - I've been wondering about this too.
>>>>>>>
>>>>>>> The hierarchy of datasets as presented by the gateway - for users
>>>>>>> to browse through - shouldn't necessarily be the same as the
>>>>>>> hierarchy introduced by DRS. Users should be able to find
>>>>>>> datasets with as few clicks as possible, which is why we just
>>>>>>> went through the exercise of 'flattening' the THREDDS catalogs.
>>>>>>>
>>>>>>> The publisher already records properties corresponding to the
>>>>>>> DRS fields (model, experiment, etc.) in the catalogs, except for
>>>>>>> version numbers (which are coming in the next release). So here's
>>>>>>> a way forward:
>>>>>>>
>>>>>>> - The publisher is configured such that the categories defined
>>>>>>> for the IPCC5/CMIP5 project (activity) include the DRS fields.
>>>>>>> As I said, this is already mostly true. The categories are
>>>>>>> mandatory, i.e. they must be resolved before publication.
>>>>>>> - Each catalog corresponding to a dataset has properties that
>>>>>>> define these values. On publication the gateway ingests these
>>>>>>> values in a searchable form.
>>>>>>> - When the portal receives a DRS request, it parses the URL,
>>>>>>> searches on the resulting fields, and resolves to the
>>>>>>> corresponding dataset.
>>>>>>>
>>>>>>> The main point is that this can be independent of the dataset
>>>>>>> hierarchy as generated during publication.
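The three steps Bob lists (mandatory DRS categories at publication, ingestion of catalog properties, and resolution of a DRS request by search) can be sketched in Python as a toy model. `GatewayIndex`, its methods and the in-memory dictionary standing in for the gateway's search store are hypothetical; the field names follow the DRS URL order quoted later in this thread, and the sketch assumes version numbers are already present in the catalogs. The dataset id is deliberately unrelated to the DRS path, to show that resolution does not depend on the hierarchy generated at publication.

```python
# Hypothetical field names, in the DRS URL order (endpoint omitted).
DRS_FIELDS = ("activity", "institute", "model", "experiment", "frequency",
              "realm", "variable", "ensemble", "version")

class GatewayIndex:
    """Toy stand-in for the gateway's searchable store of catalog properties."""

    def __init__(self):
        self._by_fields = {}

    def ingest(self, dataset_id, properties):
        # Publication: every DRS category is mandatory.
        missing = [f for f in DRS_FIELDS if not properties.get(f)]
        if missing:
            raise ValueError(f"{dataset_id}: unresolved DRS fields {missing}")
        self._by_fields[tuple(properties[f] for f in DRS_FIELDS)] = dataset_id

    def resolve(self, drs_path):
        # Request: split the URL path into fields and look them up.
        parts = [p for p in drs_path.split("/") if p]
        if len(parts) != len(DRS_FIELDS):
            raise ValueError(f"expected {len(DRS_FIELDS)} path components, got {len(parts)}")
        return self._by_fields.get(tuple(parts))  # None if nothing matches

index = GatewayIndex()
index.ingest("legacy/run42/tas_monthly",  # hypothetical publisher-side id
             dict(zip(DRS_FIELDS, ["cmip5", "instA", "modelA", "expX", "mon",
                                   "atmos", "tas", "r1", "v1"])))
```

Calling `index.resolve("/cmip5/instA/modelA/expX/mon/atmos/tas/r1/v1/")` returns the legacy id, however the data node happened to lay the files out.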
>>>>>>>
>>>>>>> Bob
>>>>>>>
>>>>>>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> The purpose of this email is to start a conversation, and a plan
>>>>>>>> of action, on how to incorporate the DRS syntax into the ESG
>>>>>>>> system. As a reminder, the current DRS specification states that
>>>>>>>> a CMIP5 dataset will be uniquely identified by the following URL:
>>>>>>>>
>>>>>>>> http://<hostname>/<activity>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/[<endpoint>]
>>>>>>>>
>>>>>>>> where most of the <...> fields are drawn from controlled
>>>>>>>> vocabularies. For example:
>>>>>>>>
>>>>>>>> http://badc.nerc.ac.uk/activity/institute/model/experiment/frequency/realm/varname/r1/v1/
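A DRS URL of this form can be split into named components with Python's standard `urllib.parse.urlsplit`. The sketch below assumes the component order given in the template above, treats a single extra trailing segment as the optional `<endpoint>`, and does not check the controlled vocabularies; `parse_drs_url` and the component names are hypothetical.

```python
from urllib.parse import urlsplit

# Component names in the order given by the DRS URL template.
DRS_COMPONENTS = ("activity", "institute", "model", "experiment", "frequency",
                  "realm", "variable", "ensemble", "version")

def parse_drs_url(url):
    """Return a dict of DRS components, plus 'hostname' and an optional 'endpoint'."""
    parts = urlsplit(url)
    # Drop the empty segments produced by leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    n = len(DRS_COMPONENTS)
    if len(segments) not in (n, n + 1):
        raise ValueError(f"expected {n} or {n + 1} path segments, got {len(segments)}")
    fields = dict(zip(DRS_COMPONENTS, segments))
    fields["hostname"] = parts.hostname
    fields["endpoint"] = segments[n] if len(segments) > n else None
    return fields

example = parse_drs_url(
    "http://badc.nerc.ac.uk/activity/institute/model/experiment/"
    "frequency/realm/varname/r1/v1/")
```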
>>>>>>>>
>>>>>>>> The first question is what it means to capture the semantics of
>>>>>>>> the DRS syntax within ESG. I can see at least two answers:
>>>>>>>>
>>>>>>>> a) The user is able to browse the CMIP5 datasets hierarchically,
>>>>>>>> according to the DRS hierarchy of fields.
>>>>>>>> b) The user is able to search for data based on facets that
>>>>>>>> reflect the DRS syntax: activity, institute, experiment, etc.
>>>>>>>>
>>>>>>>> So how do we get there? A straw-man workflow could be the
>>>>>>>> following:
>>>>>>>>
>>>>>>>> o) The ESG Data Node publishing client, when building the
>>>>>>>> THREDDS catalogs, creates a hierarchy of datasets that reflects
>>>>>>>> the syntax. There is probably also a need to mark up these
>>>>>>>> catalogs as "DRS" or "CMIP5".
>>>>>>>> o) The ESG Gateway, when parsing these catalogs, invokes a
>>>>>>>> specific handler that creates the same dataset hierarchy (this
>>>>>>>> is actually automatic, I believe), and additionally associates
>>>>>>>> corresponding objects at each level of the hierarchy. For
>>>>>>>> example, at the first level the dataset will be associated with
>>>>>>>> an activity, at the second level with an institute, and so on.
>>>>>>>> An alternative would be to associate all the objects only with
>>>>>>>> the leaf-level dataset.
>>>>>>>> o) When the metadata for the leaf-node datasets is harvested
>>>>>>>> into RDF triples for searching, the dataset-object associations
>>>>>>>> must be transferred to the triple store.
>>>>>>>> o) Specific CMIP5 facets can be configured to search by DRS
>>>>>>>> fields (perhaps only on the PCMDI Gateway, or perhaps on all
>>>>>>>> gateways).
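The first three steps of the straw-man workflow can be sketched in Python: nest dataset paths level by level so that each node records which DRS component it stands for, then flatten the tree into (dataset, component, value) triples that attach every ancestor's value to the leaf, i.e. the "associate all the objects with the leaf-level dataset" alternative. Plain tuples stand in for RDF triples here; the node layout, the `build_hierarchy`/`leaf_triples` names and the sample paths are hypothetical.

```python
# Hypothetical level names, in DRS URL order.
DRS_LEVELS = ("activity", "institute", "model", "experiment", "frequency",
              "realm", "variable", "ensemble", "version")

def build_hierarchy(dataset_paths):
    """Nest dataset paths level by level; each node records its DRS component."""
    root = {"component": None, "value": None, "children": {}}
    for path in dataset_paths:
        node = root
        for component, value in zip(DRS_LEVELS, path):
            node = node["children"].setdefault(
                value, {"component": component, "value": value, "children": {}})
    return root

def leaf_triples(node, trail=()):
    """Yield (leaf_id, component, value) triples, attaching every ancestor's value to the leaf."""
    if node["component"] is not None:
        trail = trail + ((node["component"], node["value"]),)
    if not node["children"]:
        leaf_id = "/".join(value for _, value in trail)
        for component, value in trail:
            yield (leaf_id, component, value)
        return
    for child in node["children"].values():
        yield from leaf_triples(child, trail)

paths = [
    ("cmip5", "instA", "modelA", "expX", "mon", "atmos", "tas", "r1", "v1"),
    ("cmip5", "instA", "modelA", "expX", "mon", "atmos", "pr", "r1", "v1"),
]
tree = build_hierarchy(paths)
triples = list(leaf_triples(tree))
```

Keeping the per-level association instead would mean emitting one triple per interior node; the leaf-only form shown here makes each atomic dataset searchable on every DRS field without walking the tree.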
>>>>>>>>
>>>>>>>> As mentioned, this is just a start. I do believe though that
>>>>>>>> this is an extremely important issue that must be tackled as
>>>>>>>> soon as possible.
>>>>>>>>
>>>>>>>> thanks, Luca
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Bryan Lawrence
>> Director of Environmental Archival and Associated Research
>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>> STFC, Rutherford Appleton Laboratory
>> Phone +44 1235 445012; Fax ... 5848;
>> Web: home.badc.rl.ac.uk/lawrence
>
>


