[Go-essp-tech] DRS syntax into ESG

bryan.lawrence at stfc.ac.uk bryan.lawrence at stfc.ac.uk
Fri Nov 6 12:08:10 MST 2009


Hi Luca

>Certainly, the URL structure doesn't necessarily have to reflect the  
>organization on disk.
>It really boils down to how the PCMDI people would like their users to  
>browse their hierarchies,
>and they are going to let us know next week.

Not to put too fine a point on it, it's not just about PCMDI! Of course they
have the final say (someone has to), but CMIP5 is a global effort!

What I outlined before is, in fact, a CMIP5 requirement in terms of downloads.

Bob may be able to comb the logs to tell you what different folks collected
last time, which might help with priorities, and there may indeed be more
requirements, but we're pretty confident that what I outlined is a key part of
the scientific requirement. And Stephen is right: ideally one would allow a
faceted search to build up these dataset views.
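
As a rough illustration of dataset views built by faceted search, here is a
minimal sketch in Python. It assumes each atomic dataset is described by a
flat set of DRS facets; the record layout, the name select_view and all the
values are made up for illustration and are not taken from ESG.

```python
from typing import Dict, List

# Hypothetical records: each atomic dataset is described by its DRS facets.
# All values are placeholders, not real CMIP5 holdings.
ATOMIC_DATASETS: List[Dict[str, str]] = [
    {"model": "modelA", "experiment": "exp1", "realm": "atmos", "variable": "tas", "ensemble": "r1"},
    {"model": "modelA", "experiment": "exp1", "realm": "ocean", "variable": "tos", "ensemble": "r1"},
    {"model": "modelB", "experiment": "exp1", "realm": "atmos", "variable": "tas", "ensemble": "r1"},
    {"model": "modelB", "experiment": "exp2", "realm": "atmos", "variable": "tas", "ensemble": "r1"},
]


def select_view(datasets: List[Dict[str, str]], **facets: str) -> List[Dict[str, str]]:
    """Return the atomic datasets that match every given facet value.

    Facets can be supplied in any order; facets left out match anything.
    The result is a virtual collection: a list of references to atomic
    datasets, with no version of its own.
    """
    return [d for d in datasets if all(d.get(k) == v for k, v in facets.items())]


# One variable for all simulations by all models in one experiment.
tas_in_exp1 = select_view(ATOMIC_DATASETS, experiment="exp1", variable="tas")

# Everything in one realm for all models in one experiment.
atmos_in_exp1 = select_view(ATOMIC_DATASETS, realm="atmos", experiment="exp1")
```

Because a view is only a selection over atomic datasets, nothing in it needs
versioning beyond the atomic datasets themselves.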

Bryan

>
> Couldn't faceted search be extended to enable browsing through facets in
> any order?
>
Sure, it does that already.
thanks, L

> S.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
>
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu
> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
> Sent: 06 November 2009 05:47
> To: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] DRS syntax into ESG
>
>
> Hi Folks
>
> This is precisely the issue that I was bringing up last Tuesday, which
> is the action for the ESG gateway team on page 5 (thanks for getting it
> going, Luca) and which I was discussing with respect to figure 2.
>
> The key issue for me is what is the concept of a dataset in the gateway,
> and can we have multiple views (aka datasets) onto the physical
> hierarchy?
>
> What we want to be able to do, as fast as possible (i.e. in as few clicks
> as possible), is get a download script for (or view the metadata for)
> *at least*:
> - exactly the atomic dataset I want
> - datasets corresponding to all the collections going up the DRS
> hierarchy, AND
> - atomic datasets for a specific variable for all simulations carried
> out by all models in a specific experiment.
> - all atomic datasets in a specific realm for all simulations carried
> out by all models in a specific experiment.
>
> So the key is that you need to offer multiple views on the data ...
> which are essentially virtual collections.
>
> (With a view to the future, one *wouldn't* version the virtual
> collections, because at the end of the day all the virtual collections
> correspond to version-controlled atomic datasets, and so the atomic
> datasets are the only things that need versioning).
>
> Cheers
> Bryan
>
>
> On Thursday 05 November 2009 21:01:39 Bob Drach wrote:
>> OK, I'll discuss it with Dean and Karl, and come up with some ideas.
>> Thanks,
>>
>> Bob
>>
>> On Nov 5, 2009, at 11:59 AM, Luca Cinquini wrote:
>>
>>> Hi Bob,
>>> 	I think we can do pretty much anything, we just need to be clear on
>>> what the requirements are. I agree that 9 clicks might be too much,
>>> but maybe 3 or 4 can be a good compromise between speed and
>>> overwhelming results. A matrix is possible too, for example see here:
>>>
>>> http://esg.ucar.edu/browse/viewProject.htm?projectId=ff3949c8-2008-45c8-8e27-5834f54be50f
>>>
>>> (where now all folders are eventually empty).
>>>
>>> Maybe, since this is mostly a CMIP5 presentation issue, you guys at
>>> PCMDI can decide on what kind of browsing/clicking you would like
>>> the users to go through, and let us know ?
>>>
>>> thanks, Luca
>>>
>>>
>>> On Nov 5, 2009, at 12:53 PM, Bob Drach wrote:
>>>
>>>> Hi Luca,
>>>>
>>>> On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:
>>>>
>>>>> Hi Bob,
>>>>> 	how you build the dataset hierarchy really boils down to how you
>>>>> want users to browse. I was under the impression that you wanted
>>>>> users to browse the catalogs reflecting how the data was stored on
>>>>> disk, but maybe I was wrong.
>>>>
>>>> The browsing should be organized for user convenience - as few
>>>> clicks as necessary. If the browsing hierarchy is decoupled from
>>>> the organization on disk, then the disk hierarchy can be arranged
>>>> for convenience of publication as well. This is particularly useful
>>>> for publication of legacy data, where you don't necessarily have
>>>> control over the disk organization. So no, I don't think the
>>>> gateway browsing hierarchy should necessarily mirror the disk
>>>> organization at the data node.
>>>>
>>>>
>>>>> You don't think it would be too confusing to have all datasets for
>>>>> a single model/experiment/frequency/realm/variable/ensemble
>>>>> be contained in the very same HTML page?
>>>>
>>>> Well yes, I do think that might be confusing. But it would be worse
>>>> to have to click through nine levels of hierarchy to find a dataset.
>>>> Isn't there some intermediate representation that balances the
>>>> depth of hierarchy with information per page?
>>>>
>>>> For example, the hierarchy might be presented as a table of model
>>>> vs. experiment, with each table cell containing links to datasets
>>>> (or at least to a shallower hierarchy). Would that be difficult to
>>>> do?
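
A minimal sketch of such a model vs. experiment table, in Python. It only
shows the grouping step and a plain-text rendering; the dataset identifiers,
model and experiment names, and the function names are all hypothetical, and
nothing here reflects how the gateway actually renders its pages.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (model, experiment, dataset id) triples, for illustration only.
DATASETS: List[Tuple[str, str, str]] = [
    ("modelA", "exp1", "ds-001"),
    ("modelA", "exp2", "ds-002"),
    ("modelB", "exp1", "ds-003"),
    ("modelB", "exp1", "ds-004"),
]


def build_matrix(datasets: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group dataset ids into cells keyed by (model, experiment)."""
    cells: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for model, experiment, dataset_id in datasets:
        cells[(model, experiment)].append(dataset_id)
    return cells


def render_matrix(cells: Dict[Tuple[str, str], List[str]]) -> str:
    """Render the cells as tab-separated text: one row per model,
    one column per experiment, '-' for an empty cell."""
    models = sorted({m for m, _ in cells})
    experiments = sorted({e for _, e in cells})
    lines = ["\t".join(["model"] + experiments)]
    for model in models:
        row = [",".join(cells.get((model, e), [])) or "-" for e in experiments]
        lines.append("\t".join([model] + row))
    return "\n".join(lines)


matrix = build_matrix(DATASETS)
table = render_matrix(matrix)
```

Each cell could just as well hold a link to a shallower hierarchy instead of
a list of dataset ids; the grouping step stays the same.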
>>>>
>>>> Thanks,
>>>>
>>>> Bob
>>>>
>>>>
>>>>> I think for searching we all agree that what needs to be done is
>>>>> simply harvest all the fields in the database/triple store and
>>>>> then expose the corresponding facets.
>>>>> thanks, Luca
>>>>>
>>>>> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
>>>>>
>>>>>> Hi Luca,
>>>>>>
>>>>>> Thanks for raising the issue - I've been wondering about this too.
>>>>>>
>>>>>> The hierarchy of datasets as presented by the gateway - for users
>>>>>> to browse through - shouldn't necessarily be the same as the
>>>>>> hierarchy introduced by DRS. Users should be able to find
>>>>>> datasets with as few clicks as possible, which is why we just
>>>>>> went through the exercise of 'flattening' the THREDDS catalogs.
>>>>>>
>>>>>> The publisher already associates properties corresponding to the
>>>>>> DRS fields (model, experiment, etc.) into the catalogs, with the
>>>>>> exception of version numbers (which are coming in the next
>>>>>> release). So here's a way forward:
>>>>>>
>>>>>> - The publisher is configured such that the categories defined
>>>>>> for the IPCC5/CMIP5 project (activity) include the DRS fields.
>>>>>> As I said, this is already mostly true. The categories are
>>>>>> mandatory - must be resolved before publication.
>>>>>> - Each catalog corresponding to a dataset has properties that
>>>>>> define these values. On publication the gateway ingests these
>>>>>> values in searchable fashion.
>>>>>> - When the portal receives a DRS request, it parses the URL,
>>>>>> searches on the resulting fields, and resolves to the
>>>>>> corresponding dataset.
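
The last step above (parse the URL, search on the resulting fields, resolve
to a dataset) could be sketched roughly as follows in Python. The field order
follows the DRS URL template quoted in Luca's original message further down;
the short field names, the catalog layout and the functions parse_drs_url and
resolve are assumptions made for this sketch, not the portal's actual code.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit

# Field order as in the DRS URL template; the short names are assumed here.
DRS_FIELDS = [
    "activity", "institute", "model", "experiment", "frequency",
    "realm", "variable", "ensemble", "version",
]


def parse_drs_url(url: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Split a DRS URL path into its fields plus an optional endpoint."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) < len(DRS_FIELDS):
        raise ValueError(f"expected at least {len(DRS_FIELDS)} path fields, got {len(parts)}")
    facets = dict(zip(DRS_FIELDS, parts))
    # Anything after the version field is treated as the optional endpoint.
    endpoint = "/".join(parts[len(DRS_FIELDS):]) or None
    return facets, endpoint


def resolve(url: str, catalog: List[Dict[str, Dict[str, str]]]) -> List[Dict[str, Dict[str, str]]]:
    """Return the catalog entries whose properties match every parsed field.

    `catalog` is a hypothetical list of {"properties": {...}} records standing
    in for whatever the gateway ingested at publication time.
    """
    facets, _ = parse_drs_url(url)
    return [
        entry for entry in catalog
        if all(entry["properties"].get(k) == v for k, v in facets.items())
    ]


# The example URL from the thread, with placeholder field values.
example_url = (
    "http://badc.nerc.ac.uk/activity/institute/model/experiment/"
    "frequency/realm/varname/r1/v1/"
)
facets, endpoint = parse_drs_url(example_url)
```

Since the lookup works purely on properties, it does not depend on how the
dataset hierarchy was laid out during publication.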
>>>>>>
>>>>>> The main point is that this can be independent of the dataset
>>>>>> hierarchy as generated during publication.
>>>>>>
>>>>>> Bob
>>>>>>
>>>>>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> 	the purpose of this email is to start a conversation, and a
>>>>>>> plan of action, on how to incorporate the DRS syntax into the
>>>>>>> ESG system.
>>>>>>> As a reminder, the current DRS specification states that a CMIP5
>>>>>>> dataset will be uniquely identified by the following URL:
>>>>>>>
>>>>>>> http://<hostname>/<activity>/<institute>/<model>/<experiment>/
>>>>>>> <frequency>/<modeling realm>/<variable>/<ensemble member>/
>>>>>>> <version>/[<endpoint>]
>>>>>>>
>>>>>>> where most of the <...> fields are controlled vocabularies, for
>>>>>>> example:
>>>>>>>
>>>>>>> http://badc.nerc.ac.uk/activity/institute/model/experiment/
>>>>>>> frequency/realm/varname/r1/v1/
>>>>>>>
>>>>>>> The first question would be what does it mean to capture the
>>>>>>> semantics of the DRS syntax within ESG ? I can see at least two
>>>>>>> answers:
>>>>>>>
>>>>>>> a) The user is able to browse the CMIP5 datasets hierarchically
>>>>>>> according to the DRS hierarchy of fields
>>>>>>> b) The user is able to search for data based on facets that
>>>>>>> reflect the DRS syntax: activity, institute, experiment, etc.
>>>>>>>
>>>>>>> So how do we get there ? A straw-man workflow could be the
>>>>>>> following:
>>>>>>>
>>>>>>> o) The ESG Data Node publishing client, when building the
>>>>>>> THREDDS catalogs, creates a hierarchy of datasets that reflects
>>>>>>> the syntax.
>>>>>>> There is probably also a need to mark up these catalogs as "DRS"
>>>>>>> or "CMIP5".
>>>>>>> o) The ESG Gateway, when parsing these catalogs, invokes a
>>>>>>> specific handler that creates the same datasets hierarchy (this
>>>>>>> is actually automatic, I believe), and additionally associates
>>>>>>> corresponding objects at each level of the hierarchy. For
>>>>>>> example, at first level the dataset will be associated with an
>>>>>>> activity, at second level with an institute, and so on. An
>>>>>>> alternative way would be to associate all the objects only to
>>>>>>> the leaf level dataset.
>>>>>>> o) When the metadata for the leaf-node datasets is harvested
>>>>>>> into RDF triples for searching, the dataset - object
>>>>>>> associations must be transferred to the triple store.
>>>>>>> o) Specific CMIP5 facets can be configured to search by DRS
>>>>>>> fields (perhaps only on the PCMDI Gateway, or perhaps on all
>>>>>>> gateways).
>>>>>>>
>>>>>>> As mentioned, this is just a start. I do believe though that
>>>>>>> this is an extremely important issue that must be tackled as
>>>>>>> soon as possible.
>>>>>>>
>>>>>>> thanks, Luca
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> GO-ESSP-TECH mailing list
>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research (NCAS/British
> Atmospheric Data Centre and NCEO/NERC NEODC) STFC, Rutherford Appleton
> Laboratory Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence

