[Go-essp-tech] DRS syntax into ESG

stephen.pascoe at stfc.ac.uk stephen.pascoe at stfc.ac.uk
Fri Nov 6 07:21:29 MST 2009


I would contend that at the interface level the DRS isn't anything to do
with a hierarchy.  The DRS document doesn't mention "hierarchy" or
"tree" anywhere (although it does mention "path").  It defines a set of
components (aka properties or attributes) with controlled vocabularies
and an order in which those components should be encoded in a URL
syntax.  
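To make that concrete, here is a minimal Python sketch (mine, not from the DRS document) that reads a DRS URL as a flat set of components. The component order follows the URL syntax quoted further down this thread. The host name, the sample values and the name parse_drs_url are made up for illustration, and no controlled-vocabulary checking is attempted.

```python
from urllib.parse import urlparse

# Component order as given in the DRS URL syntax quoted in this thread.
# The short key names ("realm", "ensemble") are abbreviations chosen here.
DRS_COMPONENTS = (
    "activity", "institute", "model", "experiment", "frequency",
    "realm", "variable", "ensemble", "version",
)


def parse_drs_url(url):
    """Split a DRS-style URL into a flat dict of component -> value.

    Hypothetical helper: it only relies on the order of the components,
    not on any nesting, so the result is a set of properties, not a tree.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < len(DRS_COMPONENTS):
        raise ValueError(
            "expected at least %d components, got %d"
            % (len(DRS_COMPONENTS), len(parts))
        )
    record = dict(zip(DRS_COMPONENTS, parts))
    # Anything after <version> is treated as the optional endpoint.
    endpoint = "/".join(parts[len(DRS_COMPONENTS):])
    if endpoint:
        record["endpoint"] = endpoint
    return record


if __name__ == "__main__":
    # example.org and all the values below are placeholders.
    print(parse_drs_url(
        "http://example.org/cmip5/inst/modelA/exp1/mon/atmos/tas/r1/v1/"
    ))
```

The URL order is only used to decode the values; once decoded, nothing in the record says that "model" sits under "institute".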

Section 3.2 does talk about a filesystem layout of the DRS but I think
this is really an implementation detail.  URL schemes don't have to map
directly onto the filesystem.  However, since our current tools
(THREDDS, GridFTP and presumably the Gateway) do have the concept of a
hierarchy hard-wired into them, we do have to work out how to make them
work with the *non-hierarchical* DRS.

Couldn't faceted search be extended to enable browsing through facets in
any order?
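As a rough illustration of what I mean, here is a small Python sketch of facet browsing over flat property records. It says nothing about how the Gateway actually implements search; the records, the facet names and the functions narrow and remaining_facets are all hypothetical.

```python
from collections import Counter


def narrow(records, selected):
    """Return the records matching every (facet, value) pair in `selected`."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def remaining_facets(records, selected):
    """For the matching records, count the values of each unselected facet."""
    counts = {}
    for r in narrow(records, selected):
        for facet, value in r.items():
            if facet not in selected:
                counts.setdefault(facet, Counter())[value] += 1
    return counts


# Placeholder records; real ones would carry all the DRS components.
RECORDS = [
    {"model": "modelA", "experiment": "exp1", "variable": "tas"},
    {"model": "modelA", "experiment": "exp1", "variable": "pr"},
    {"model": "modelB", "experiment": "exp1", "variable": "tas"},
    {"model": "modelB", "experiment": "exp2", "variable": "tas"},
]

if __name__ == "__main__":
    # Start from the variable, then see which models and experiments remain.
    print(remaining_facets(RECORDS, {"variable": "tas"}))
```

Because each step is just a filter on properties, picking variable then model gives the same datasets as picking model then variable.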

S.

---
Stephen Pascoe  +44 (0)1235 445980
British Atmospheric Data Centre
Rutherford Appleton Laboratory

-----Original Message-----
From: go-essp-tech-bounces at ucar.edu
[mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
Sent: 06 November 2009 05:47
To: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] DRS syntax into ESG


Hi Folks

This is precisely the issue that I was bringing up last Tuesday, which
is the action for the ESG gateway team on page 5 (thanks for getting it
going Luca) and which I was discussing with respect to figure 2.

The key issue for me is what the concept of a dataset is in the gateway,
and whether we can have multiple views (aka datasets) onto the physical
hierarchy.

What we want is to get, in as few clicks as possible, a download script
for (or a view of the metadata for) *at least*:
 - exactly the atomic dataset I want,
 - datasets corresponding to all the collections going up the DRS
   hierarchy, AND
 - atomic datasets for a specific variable for all simulations carried
   out by all models in a specific experiment,
 - all atomic datasets in a specific realm for all simulations carried
   out by all models in a specific experiment.

So the key is that you need to offer multiple views on the data ...
which are essentially virtual collections.

(With a view to the future, one *wouldn't* version the virtual
collections, because at the end of the day, all the virtual collections
correspond to version controlled atomic datasets, and so they're the only
thing that needs versioning).
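
One way to picture this is a minimal Python sketch in which a virtual collection is nothing more than a stored set of constraints, resolved on demand against the versioned atomic datasets. The dotted identifier format, the sample datasets and the names dataset_id and resolve are assumptions made up for this illustration, not anything the gateway defines.

```python
# Subset of DRS components used to build an illustrative identifier.
ORDER = ("activity", "model", "experiment", "realm", "variable", "ensemble")

# Placeholder atomic datasets; only these carry a version.
ATOMIC = [
    {"activity": "cmip5", "model": "modelA", "experiment": "exp1",
     "realm": "atmos", "variable": "tas", "ensemble": "r1", "version": "v1"},
    {"activity": "cmip5", "model": "modelA", "experiment": "exp1",
     "realm": "atmos", "variable": "pr", "ensemble": "r1", "version": "v2"},
    {"activity": "cmip5", "model": "modelB", "experiment": "exp1",
     "realm": "atmos", "variable": "tas", "ensemble": "r1", "version": "v1"},
    {"activity": "cmip5", "model": "modelB", "experiment": "exp1",
     "realm": "ocean", "variable": "tos", "ensemble": "r1", "version": "v3"},
]


def dataset_id(d):
    """Build a hypothetical dotted identifier ending in the dataset version."""
    return ".".join(d[k] for k in ORDER) + "." + d["version"]


def resolve(view, atomic):
    """Resolve a virtual collection (a dict of constraints) to atomic ids.

    The view itself has no version; versions come from the members.
    """
    return sorted(
        dataset_id(d) for d in atomic
        if all(d[k] == v for k, v in view.items())
    )


if __name__ == "__main__":
    # One variable, all models, one experiment.
    print(resolve({"experiment": "exp1", "variable": "tas"}, ATOMIC))
    # One realm, all models, one experiment.
    print(resolve({"experiment": "exp1", "realm": "atmos"}, ATOMIC))
```

A download script for any view is then just the resolved list of versioned atomic datasets at the time of asking.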

Cheers
Bryan


On Thursday 05 November 2009 21:01:39 Bob Drach wrote:
> OK, I'll discuss it with Dean and Karl, and come up with some ideas.  
> Thanks,
> 
> Bob
> 
> On Nov 5, 2009, at 11:59 AM, Luca Cinquini wrote:
> 
> > Hi Bob,
> > 	I think we can do pretty much anything, we just need to be clear
> > on what the requirements are. I agree that 9 clicks might be too much,
> > but maybe 3 or 4 can be a good compromise between speed and
> > overwhelming results. A matrix is possible too, for example see here:
> >
> > http://*esg.ucar.edu/browse/viewProject.htm? 
> > projectId=ff3949c8-2008-45c8-8e27-5834f54be50f
> >
> > (where now all folders are eventually empty).
> >
> > Maybe, since this is mostly a CMIP5 presentation issue, you guys at 
> > PCMDI can decide on what kind of browsing/clicking you would like 
> > the users to go through, and let us know ?
> >
> > thanks, Luca
> >
> >
> > On Nov 5, 2009, at 12:53 PM, Bob Drach wrote:
> >
> >> Hi Luca,
> >>
> >> On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:
> >>
> >>> Hi Bob,
> >>> 	how you build the dataset hierarchy really boils down to how you
> >>> want users to browse. I was under the impression that you wanted
> >>> users to browse the catalogs reflecting how the data was stored on
> >>> disk, but maybe I was wrong.
> >>
> >> The browsing should be organized for user convenience - as few 
> >> clicks as necessary. If the browsing hierarchy is decoupled from 
> >> the organization on disk, then the disk hierarchy can be arranged 
> >> for convenience of publication as well. This is particularly useful
> >> for publication of legacy data, where you don't necessarily have 
> >> control over the disk organization. So no, I don't think the 
> >> gateway browsing hierarchy should necessarily mirror the disk 
> >> organization at the data node.
> >>
> >>
> >>> You don't think it would be too confusing to have all datasets for
> >>> a single model/experiment/frequency/realm/variable/ensemble
> >>> be contained in the very same HTML page ?
> >>
> >> Well yes, I do think that might be confusing. But it would be worse
> >> to have to click through nine levels of hierarchy to find a dataset.
> >> Isn't there some intermediate representation that balances the
> >> depth of hierarchy with information per page?
> >>
> >> For example, the hierarchy might be presented as a table of model 
> >> vs. experiment, with each table cell containing links to datasets 
> >> (or at least to a shallower hierarchy). Would that be difficult to 
> >> do?
> >>
> >> Thanks,
> >>
> >> Bob
> >>
> >>
> >>> I think for searching we all agree that what needs to be done is 
> >>> simply harvest all the fields in the database/triple store and 
> >>> then expose the corresponding facets.
> >>> thanks, Luca
> >>>
> >>> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
> >>>
> >>>> Hi Luca,
> >>>>
> >>>> Thanks for raising the issue - I've been wondering about this too.
> >>>>
> >>>> The hierarchy of datasets as presented by the gateway - for users
> >>>> to browse through - shouldn't necessarily be the same as the 
> >>>> hierarchy introduced by DRS. Users should be able to find 
> >>>> datasets with as few clicks as possible, which is why we just 
> >>>> went through the exercise of 'flattening' the THREDDS catalogs.
> >>>>
> >>>> The publisher already associates properties corresponding to the 
> >>>> DRS fields (model, experiment, etc.) into the catalogs, with the 
> >>>> exception of version numbers (which are coming in the next 
> >>>> release). So here's a way forward:
> >>>>
> >>>> - The publisher is configured such that the categories defined 
> >>>> for the IPCC5/CMIP5 project (activity) include the DRS fields.
> >>>> As I said, this is already mostly true. The categories are 
> >>>> mandatory - must be resolved before publication.
> >>>> - Each catalog corresponding to a dataset has properties that 
> >>>> define these values. On publication the gateway ingests these 
> >>>> values in searchable fashion.
> >>>> - When the portal receives a DRS request, it parses the URL, 
> >>>> searches on the resulting fields, and resolves to the 
> >>>> corresponding dataset.
> >>>>
> >>>> The main point is that this can be independent of the dataset 
> >>>> hierarchy as generated during publication.
> >>>>
> >>>> Bob
> >>>>
> >>>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
> >>>>
> >>>>> Hi,
> >>>>> 	the purpose of this email is to start a conversation, and a
> >>>>> plan of action, on how to incorporate the DRS syntax into the
> >>>>> ESG system.
> >>>>> As a reminder, the current DRS specification states that a CMIP5
> >>>>> dataset will be uniquely identified by the following URL:
> >>>>>
> >>>>> http://***<hostname>/<activity>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/[<endpoint>]
> >>>>>
> >>>>> where most of the <...> fields are controlled vocabularies, for
> >>>>> example:
> >>>>>
> >>>>> http://***badc.nerc.ac.uk/activity/institute/model/experiment/frequency/realm/varname/r1/v1/
> >>>>>
> >>>>> The first question would be what does it mean to capture the 
> >>>>> semantics of the DRS syntax within ESG ? I can see at least two 
> >>>>> answers:
> >>>>>
> >>>>> a) The user is able to browse the CMIP5 datasets hierarchically 
> >>>>> according to the DRS hierarchy of fields
> >>>>> b) The user is able to search for data based on facets that
> >>>>> reflect the DRS syntax: activity, institute, experiment, etc.
> >>>>>
> >>>>> So how do we get there ? A straw-man workflow could be the
> >>>>> following:
> >>>>>
> >>>>> o) The ESG Data Node publishing client, when building the 
> >>>>> THREDDS catalogs, creates a hierarchy of datasets that reflects 
> >>>>> the syntax.
> >>>>> There is probably also a need to mark up these catalogs as "DRS"
> >>>>> or "CMIP5".
> >>>>> o) The ESG Gateway, when parsing these catalogs, invokes a 
> >>>>> specific handler that creates the same datasets hierarchy (this 
> >>>>> is actually automatic, I believe), and additionally associates 
> >>>>> corresponding objects at each level of the hierarchy. For 
> >>>>> example, at first level the dataset will be associated with an 
> >>>>> activity, at second level with an institute, and so on. An 
> >>>>> alternative way would be to associate all the objects only to 
> >>>>> the leaf level dataset.
> >>>>> o) When the metadata for the leaf node datasets is harvested
> >>>>> into RDF triples for searching, the dataset - object
> >>>>> associations must be transferred to the triple store.
> >>>>> o) Specific CMIP5 facets can be configured to search by DRS
> >>>>> fields (perhaps only on the PCMDI Gateway, or perhaps on all
> >>>>> gateways).
> >>>>>
> >>>>> As mentioned, this is just a start. I do believe though that 
> >>>>> this is an extremely important issue that must be tackled as 
> >>>>> soon as possible.
> >>>>>
> >>>>> thanks, Luca
> >>>>>
> >>>>> _______________________________________________
> >>>>> GO-ESSP-TECH mailing list
> >>>>> GO-ESSP-TECH at ucar.edu
> >>>>> http://***mailman.ucar.edu/mailman/listinfo/go-essp-tech
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
> 
> 



--
Bryan Lawrence
Director of Environmental Archival and Associated Research (NCAS/British
Atmospheric Data Centre and NCEO/NERC NEODC) STFC, Rutherford Appleton
Laboratory Phone +44 1235 445012; Fax ... 5848;
Web: home.badc.rl.ac.uk/lawrence

