[Go-essp-tech] DRS syntax into ESG
Bob Drach
drach at llnl.gov
Thu Nov 5 14:01:39 MST 2009
OK, I'll discuss it with Dean and Karl, and come up with some ideas.
Thanks,
Bob
On Nov 5, 2009, at 11:59 AM, Luca Cinquini wrote:
> Hi Bob,
> I think we can do pretty much anything, we just need to be clear
> on what the requirements are. I agree that 9 clicks might be too
> much, but maybe 3 or 4 can be a good compromise between speed and
> overwhelming results. A matrix is possible too, for example see here:
>
> http://esg.ucar.edu/browse/viewProject.htm?projectId=ff3949c8-2008-45c8-8e27-5834f54be50f
>
> (where, for now, all the folders turn out to be empty).
>
> Maybe, since this is mostly a CMIP5 presentation issue, you guys at
> PCMDI can decide on what kind of browsing/clicking you would like
> the users to go through, and let us know?
>
> thanks, Luca
>
>
> On Nov 5, 2009, at 12:53 PM, Bob Drach wrote:
>
>> Hi Luca,
>>
>> On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:
>>
>>> Hi Bob,
>>> how you build the dataset hierarchy really boils down to how you
>>> want users to browse. I was under the impression that you wanted
>>> users to browse the catalogs reflecting how the data was stored
>>> on disk, but maybe I was wrong.
>>
>> The browsing should be organized for user convenience - as few
>> clicks as necessary. If the browsing hierarchy is decoupled from
>> the organization on disk, then the disk hierarchy can be arranged
>> for convenience of publication as well. This is particularly
>> useful for publication of legacy data, where you don't necessarily
>> have control over the disk organization. So no, I don't think the
>> gateway browsing hierarchy should necessarily mirror the disk
>> organization at the data node.
>>
>>
>>> You don't think it would be too confusing to have all datasets
>>> for a single model/experiment/frequency/realm/variable/ensemble
>>> be contained in the very same HTML page?
>>
>> Well yes, I do think that might be confusing. But it would be
>> worse to have to click through nine levels of hierarchy to find a
>> dataset. Isn't there some intermediate representation that
>> balances the depth of hierarchy with information per page?
>>
>> For example, the hierarchy might be presented as a table of model
>> vs. experiment, with each table cell containing links to datasets
>> (or at least to a shallower hierarchy). Would that be difficult to
>> do?
>>
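The model-vs-experiment table suggested above can be sketched roughly as follows. This is only an illustration, not gateway code: the dataset records, the ids and the function names (`build_matrix`, `render_table`) are all made up for the example, and a real page would render links rather than tab-separated text.

```python
from collections import defaultdict

# Hypothetical dataset records. The field names follow the DRS fields
# discussed in the thread; the ids and values are invented.
DATASETS = [
    {"id": "ds1", "model": "modelA", "experiment": "exp1"},
    {"id": "ds2", "model": "modelA", "experiment": "exp2"},
    {"id": "ds3", "model": "modelB", "experiment": "exp1"},
    {"id": "ds4", "model": "modelA", "experiment": "exp1"},
]


def build_matrix(datasets):
    """Group dataset ids into cells keyed by (model, experiment)."""
    cells = defaultdict(list)
    for ds in datasets:
        cells[(ds["model"], ds["experiment"])].append(ds["id"])
    return cells


def render_table(datasets):
    """Render one row per model and one column per experiment."""
    cells = build_matrix(datasets)
    models = sorted({model for model, _ in cells})
    experiments = sorted({exp for _, exp in cells})
    lines = ["\t".join(["model"] + experiments)]
    for model in models:
        row = [model]
        for exp in experiments:
            # An empty cell is shown as "-" so the grid stays aligned.
            row.append(",".join(cells.get((model, exp), [])) or "-")
        lines.append("\t".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(DATASETS))
```

Each cell holds the datasets for one model/experiment pair, so a user reaches a dataset list in one click from the table instead of walking the full hierarchy.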
>> Thanks,
>>
>> Bob
>>
>>
>>> I think for searching we all agree that what needs to be done is
>>> simply harvest all the fields in the database/triple store and
>>> then expose the corresponding facets.
>>> thanks, Luca
>>>
>>> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
>>>
>>>> Hi Luca,
>>>>
>>>> Thanks for raising the issue - I've been wondering about this too.
>>>>
>>>> The hierarchy of datasets as presented by the gateway - for
>>>> users to browse through - shouldn't necessarily be the same as
>>>> the hierarchy introduced by DRS. Users should be able to find
>>>> datasets with as few clicks as possible, which is why we just
>>>> went through the exercise of 'flattening' the THREDDS catalogs.
>>>>
>>>> The publisher already writes properties corresponding to the
>>>> DRS fields (model, experiment, etc.) into the catalogs, with the
>>>> exception of version numbers (which are coming in the next
>>>> release). So here's a way forward:
>>>>
>>>> - The publisher is configured such that the categories defined
>>>> for the IPCC5/CMIP5 project (activity) include the DRS fields.
>>>> As I said, this is already mostly true. The categories are
>>>> mandatory: they must be resolved before publication.
>>>> - Each catalog corresponding to a dataset has properties that
>>>> define these values. On publication, the gateway ingests these
>>>> values in a searchable form.
>>>> - When the portal receives a DRS request, it parses the URL,
>>>> searches on the resulting fields, and resolves to the
>>>> corresponding dataset.
>>>>
>>>> The main point is that this can be independent of the dataset
>>>> hierarchy as generated during publication.
>>>>
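The last step above (parse the DRS URL, search on the resulting fields, resolve to a dataset) might look something like the sketch below. It assumes the field order of the DRS URL template quoted later in this thread. The in-memory `CATALOG`, the `dataset_id` key and the function names are hypothetical stand-ins for whatever searchable store the gateway actually uses.

```python
from urllib.parse import urlparse

# Field order taken from the DRS URL template quoted in this thread.
DRS_FIELDS = ("activity", "institute", "model", "experiment", "frequency",
              "realm", "variable", "ensemble", "version")

# Hypothetical stand-in for the gateway's searchable store: one record
# per published dataset, carrying the properties from its catalog.
CATALOG = [
    {"dataset_id": "dataset-001", "activity": "activity",
     "institute": "institute", "model": "model", "experiment": "experiment",
     "frequency": "frequency", "realm": "realm", "variable": "varname",
     "ensemble": "r1", "version": "v1"},
    {"dataset_id": "dataset-002", "activity": "activity",
     "institute": "institute", "model": "model", "experiment": "experiment",
     "frequency": "frequency", "realm": "realm", "variable": "varname",
     "ensemble": "r2", "version": "v1"},
]


def parse_drs_url(url):
    """Split a DRS URL into a field dict and an optional endpoint."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < len(DRS_FIELDS):
        raise ValueError("expected at least %d path segments, got %d"
                         % (len(DRS_FIELDS), len(parts)))
    fields = dict(zip(DRS_FIELDS, parts))
    # Anything after the nine DRS fields is treated as the endpoint.
    endpoint = "/".join(parts[len(DRS_FIELDS):]) or None
    return fields, endpoint


def resolve(url, catalog=CATALOG):
    """Return ids of datasets whose properties match every DRS field."""
    fields, _endpoint = parse_drs_url(url)
    return [rec["dataset_id"] for rec in catalog
            if all(rec.get(name) == value for name, value in fields.items())]
```

Nothing here depends on how the datasets are nested in the catalogs: the lookup uses only the per-dataset properties.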
>>>> Bob
>>>>
>>>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
>>>>
>>>>> Hi,
>>>>> the purpose of this email is to start a conversation, and a plan of
>>>>> action, on how to incorporate the DRS syntax into the ESG system.
>>>>> As a reminder, the current DRS specification states that a CMIP5
>>>>> dataset will be uniquely identified by the following URL:
>>>>>
>>>>> http://<hostname>/<activity>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<variable>/<ensemble member>/<version>/[<endpoint>]
>>>>>
>>>>> where most of the <...> fields take values from controlled
>>>>> vocabularies, for example:
>>>>>
>>>>> http://badc.nerc.ac.uk/activity/institute/model/experiment/frequency/realm/varname/r1/v1/
>>>>>
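As a rough illustration of the template above, the sketch below composes a DRS URL from named fields. The short field names (`realm`, `ensemble`, and so on) and the function `build_drs_url` are assumptions made for this example; only the order of the fields comes from the template itself.

```python
from urllib.parse import quote

# Field order as given in the DRS URL template above. The short names
# are this sketch's own choice, not part of the specification.
DRS_FIELDS = ("activity", "institute", "model", "experiment", "frequency",
              "realm", "variable", "ensemble", "version")


def build_drs_url(hostname, fields, endpoint=None):
    """Compose a DRS URL; every field is required, the endpoint is optional."""
    missing = [name for name in DRS_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing DRS fields: " + ", ".join(missing))
    # Percent-encode each value so a stray "/" cannot add a path level.
    segments = [quote(fields[name], safe="") for name in DRS_FIELDS]
    url = "http://" + hostname + "/" + "/".join(segments) + "/"
    if endpoint:
        url += endpoint
    return url


EXAMPLE_FIELDS = {
    "activity": "activity", "institute": "institute", "model": "model",
    "experiment": "experiment", "frequency": "frequency", "realm": "realm",
    "variable": "varname", "ensemble": "r1", "version": "v1",
}

if __name__ == "__main__":
    print(build_drs_url("badc.nerc.ac.uk", EXAMPLE_FIELDS))
```

With the placeholder values from the example above, this reproduces the badc.nerc.ac.uk URL shown there.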
>>>>> The first question would be: what does it mean to capture the semantics
>>>>> of the DRS syntax within ESG? I can see at least two answers:
>>>>>
>>>>> a) The user is able to browse the CMIP5 datasets hierarchically
>>>>> according to the DRS hierarchy of fields
>>>>> b) The user is able to search for data based on facets that reflect
>>>>> the DRS syntax: activity, institute, experiment, etc.
>>>>>
>>>>> So how do we get there? A straw-man workflow could be the following:
>>>>>
>>>>> o) The ESG Data Node publishing client, when building the THREDDS
>>>>> catalogs, creates a hierarchy of datasets that reflects the DRS syntax.
>>>>> There is probably also a need to mark up these catalogs as "DRS" or
>>>>> "CMIP5".
>>>>> o) The ESG Gateway, when parsing these catalogs, invokes a specific
>>>>> handler that creates the same dataset hierarchy (this is actually
>>>>> automatic, I believe), and additionally associates corresponding
>>>>> objects at each level of the hierarchy. For example, at the first
>>>>> level the dataset will be associated with an activity, at the second
>>>>> level with an institute, and so on. An alternative would be to
>>>>> associate all the objects only with the leaf-level dataset.
>>>>> o) When the metadata for the leaf-node datasets is harvested into RDF
>>>>> triples for searching, the dataset-object associations must be
>>>>> transferred to the triple store.
>>>>> o) Specific CMIP5 facets can be configured to search by DRS fields
>>>>> (perhaps only on the PCMDI Gateway, or perhaps on all gateways).
>>>>>
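The last two steps of the workflow above (harvest leaf-level associations into triples, then search by DRS facets) can be sketched in plain Python. This is a toy model under stated assumptions: triples are bare tuples rather than real RDF, the `drs:` predicate prefix is invented for the example, and `LEAVES` is made-up data.

```python
from collections import Counter

# Hypothetical leaf datasets with their DRS associations.
LEAVES = {
    "ds1": {"model": "modelA", "experiment": "exp1"},
    "ds2": {"model": "modelA", "experiment": "exp2"},
    "ds3": {"model": "modelB", "experiment": "exp1"},
}


def to_triples(dataset_id, properties):
    """Turn one leaf dataset's associations into (subject, predicate, object)."""
    # The "drs:" prefix is an assumption of this sketch, not an ESG vocabulary.
    return [(dataset_id, "drs:" + name, value)
            for name, value in sorted(properties.items())]


def facet_counts(triples, facet):
    """Count datasets per value of one facet, as a search page might show."""
    predicate = "drs:" + facet
    return Counter(obj for _subj, pred, obj in triples if pred == predicate)


def search(triples, **constraints):
    """Return sorted ids of datasets that satisfy every facet constraint."""
    wanted = {("drs:" + name, value) for name, value in constraints.items()}
    facts_by_subject = {}
    for subj, pred, obj in triples:
        facts_by_subject.setdefault(subj, set()).add((pred, obj))
    return sorted(subj for subj, facts in facts_by_subject.items()
                  if wanted <= facts)


# A flat list stands in for the triple store.
TRIPLES = [t for ds_id, props in LEAVES.items()
           for t in to_triples(ds_id, props)]
```

Associating everything with the leaf dataset, as in the alternative mentioned above, keeps the search independent of how deep the browsing hierarchy is.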
>>>>> As mentioned, this is just a start. I do believe though that this is
>>>>> an extremely important issue that must be tackled as soon as possible.
>>>>>
>>>>> thanks, Luca
>>>>>
>>>>> _______________________________________________
>>>>> GO-ESSP-TECH mailing list
>>>>> GO-ESSP-TECH at ucar.edu
>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>
>>>>
>>>
>>>
>>
>
>