[Go-essp-tech] DRS syntax into ESG
Luca Cinquini
luca at ucar.edu
Thu Nov 5 12:59:29 MST 2009
Hi Bob,
I think we can do pretty much anything; we just need to be clear on
what the requirements are. I agree that 9 clicks might be too much,
but maybe 3 or 4 can be a good compromise between speed and
overwhelming results. A matrix is possible too, for example see here:
http://esg.ucar.edu/browse/viewProject.htm?projectId=ff3949c8-2008-45c8-8e27-5834f54be50f
(where now all folders are eventually empty).
Maybe, since this is mostly a CMIP5 presentation issue, you guys at
PCMDI can decide on what kind of browsing/clicking you would like the
users to go through, and let us know?
thanks, Luca
On Nov 5, 2009, at 12:53 PM, Bob Drach wrote:
> Hi Luca,
>
> On Nov 5, 2009, at 10:57 AM, Luca Cinquini wrote:
>
>> Hi Bob,
>> how you build the dataset hierarchy really boils down to how you
>> want users to browse. I was under the impression that you wanted
>> users to browse the catalogs reflecting how the data was stored on
>> disk, but maybe I was wrong.
>
> The browsing should be organized for user convenience - as few
> clicks as necessary. If the browsing hierarchy is decoupled from the
> organization on disk, then the disk hierarchy can be arranged for
> convenience of publication as well. This is particularly useful for
> publication of legacy data, where you don't necessarily have control
> over the disk organization. So no, I don't think the gateway
> browsing hierarchy should necessarily mirror the disk organization
> at the data node.
>
>
>> You don't think it would be too confusing to have all datasets for
>> a single model/experiment/frequency/realm/variable/ensemble be
>> contained in the very same HTML page ?
>
> Well yes, I do think that might be confusing. But it would be worse
> to have to click through nine levels of hierarchy to find a dataset.
> Isn't there some intermediate representation that balances the depth
> of hierarchy with information per page?
>
> For example, the hierarchy might be presented as a table of model
> vs. experiment, with each table cell containing links to datasets
> (or at least to a shallower hierarchy). Would that be difficult to do?
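
The model vs. experiment table suggested above could be sketched roughly as follows. This is only an illustration, not gateway code: the record layout (dicts with `id`, `model`, and `experiment` keys) and the function names `build_matrix` and `render_text` are assumptions made for the example.

```python
from collections import defaultdict


def build_matrix(datasets):
    """Group dataset ids into cells keyed by (model, experiment)."""
    cells = defaultdict(list)
    for ds in datasets:
        cells[(ds["model"], ds["experiment"])].append(ds["id"])
    # Row and column labels are the distinct models and experiments seen.
    models = sorted({model for model, _ in cells})
    experiments = sorted({experiment for _, experiment in cells})
    return models, experiments, cells


def render_text(models, experiments, cells):
    """Render the matrix as tab-separated text, one dataset count per cell."""
    lines = ["\t".join(["model"] + experiments)]
    for model in models:
        counts = [str(len(cells.get((model, exp), []))) for exp in experiments]
        lines.append("\t".join([model] + counts))
    return "\n".join(lines)
```

In a real page each cell would hold links to the datasets (or to a shallower hierarchy) rather than a count; the count just keeps the sketch short.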
>
> Thanks,
>
> Bob
>
>
>> I think for searching we all agree that what needs to be done is
>> simply harvest all the fields in the database/triple store and then
>> expose the corresponding facets.
>> thanks, Luca
>>
>> On Nov 5, 2009, at 11:28 AM, Bob Drach wrote:
>>
>>> Hi Luca,
>>>
>>> Thanks for raising the issue - I've been wondering about this too.
>>>
>>> The hierarchy of datasets as presented by the gateway - for users
>>> to browse through - shouldn't necessarily be the same as the
>>> hierarchy introduced by DRS. Users should be able to find datasets
>>> with as few clicks as possible, which is why we just went through
>>> the exercise of 'flattening' the THREDDS catalogs.
>>>
>>> The publisher already associates properties corresponding to the
>>> DRS fields (model, experiment, etc.) into the catalogs, with the
>>> exception of version numbers (which are coming in the next
>>> release). So here's a way forward:
>>>
>>> - The publisher is configured such that the categories defined for
>>> the IPCC5/CMIP5 project (activity) include the DRS fields. As I
>>> said, this is already mostly true. The categories are mandatory:
>>> they must be resolved before publication.
>>> - Each catalog corresponding to a dataset has properties that
>>> define these values. On publication the gateway ingests these
>>> values in searchable fashion.
>>> - When the portal receives a DRS request, it parses the URL,
>>> searches on the resulting fields, and resolves to the
>>> corresponding dataset.
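
The last step above (parse the DRS URL, search on the resulting fields, resolve to a dataset) might look something like the sketch below. It assumes the field order of the DRS URL template quoted further down in this thread; the names `DRS_FIELDS`, `parse_drs_url`, and `resolve`, and the in-memory `index` dict standing in for the gateway's searchable store, are hypothetical.

```python
from urllib.parse import urlparse

# Field order assumed from the DRS URL template quoted later in this thread.
DRS_FIELDS = ["activity", "institute", "model", "experiment", "frequency",
              "realm", "variable", "ensemble", "version"]


def parse_drs_url(url):
    """Split a DRS-style URL into (hostname, field dict, endpoint or None)."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < len(DRS_FIELDS):
        raise ValueError("expected at least %d path segments, got %d"
                         % (len(DRS_FIELDS), len(segments)))
    fields = dict(zip(DRS_FIELDS, segments))
    # Anything after the version segment is treated as the optional endpoint.
    rest = segments[len(DRS_FIELDS):]
    endpoint = "/".join(rest) if rest else None
    return parts.hostname, fields, endpoint


def resolve(index, fields):
    """Return ids of datasets whose properties match every DRS field.

    `index` is a plain dict {dataset_id: {field: value}}, a stand-in for
    whatever searchable store the gateway actually uses.
    """
    return [ds_id for ds_id, props in index.items()
            if all(props.get(name) == value for name, value in fields.items())]
```

Because the lookup works only on the properties attached to each dataset, it does not depend on how the dataset hierarchy was laid out during publication.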
>>>
>>> The main point is that this can be independent of the dataset
>>> hierarchy as generated during publication.
>>>
>>> Bob
>>>
>>> On Nov 5, 2009, at 4:50 AM, Luca Cinquini wrote:
>>>
>>>> Hi,
>>>> the purpose of this email is to start a conversation, and a plan of
>>>> action, on how to incorporate the DRS syntax into the ESG system.
>>>> As a reminder, the current DRS specification states that a CMIP5
>>>> dataset will be uniquely identified by the following URL:
>>>>
>>>> http://<hostname>/<activity>/<institute>/<model>/<experiment>/
>>>> <frequency>/<modeling realm>/<variable>/<ensemble member>/
>>>> <version>/[<endpoint>]
>>>>
>>>> where most of the <...> fields are drawn from controlled
>>>> vocabularies. For example:
>>>>
>>>> http://badc.nerc.ac.uk/activity/institute/model/experiment/
>>>> frequency/realm/varname/r1/v1/
>>>>
>>>> The first question would be: what does it mean to capture the
>>>> semantics of the DRS syntax within ESG? I can see at least two
>>>> answers:
>>>>
>>>> a) The user is able to browse the CMIP5 datasets hierarchically
>>>> according to the DRS hierarchy of fields
>>>> b) The user is able to search for data based on facets that reflect
>>>> the DRS syntax: activity, institute, experiment, etc.
>>>>
>>>> So how do we get there? A straw-man workflow could be the following:
>>>>
>>>> o) The ESG Data Node publishing client, when building the THREDDS
>>>> catalogs, creates a hierarchy of datasets that reflects the syntax.
>>>> There is probably also a need to mark up these catalogs as "DRS" or
>>>> "CMIP5".
>>>> o) The ESG Gateway, when parsing these catalogs, invokes a specific
>>>> handler that creates the same dataset hierarchy (this is actually
>>>> automatic, I believe), and additionally associates corresponding
>>>> objects at each level of the hierarchy. For example, at the first
>>>> level the dataset will be associated with an activity, at the second
>>>> level with an institute, and so on. An alternative would be to
>>>> associate all the objects only with the leaf-level dataset.
>>>> o) When the metadata for the leaf-node datasets is harvested into
>>>> RDF triples for searching, the dataset - object associations must be
>>>> transferred to the triple store.
>>>> o) Specific CMIP5 facets can be configured to search by DRS fields
>>>> (perhaps only on the PCMDI Gateway, or perhaps on all gateways).
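
To make the last two steps of the workflow above concrete, here is a minimal sketch of turning dataset - object associations into triples and then searching them by facet. It uses plain Python tuples rather than a real RDF store, and the `drs:` predicate prefix and the function names `to_triples`, `facet_values`, and `search` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def to_triples(dataset_id, fields):
    """Express one dataset's DRS fields as (subject, predicate, object) triples."""
    return [(dataset_id, "drs:" + name, value)
            for name, value in sorted(fields.items())]


def facet_values(triples, predicate):
    """Count how many datasets carry each value of the given facet."""
    return Counter(obj for _, pred, obj in triples if pred == predicate)


def search(triples, **constraints):
    """Return sorted ids of datasets matching every facet constraint."""
    wanted = {("drs:" + name, value) for name, value in constraints.items()}
    by_subject = defaultdict(set)
    for subj, pred, obj in triples:
        by_subject[subj].add((pred, obj))
    # A dataset matches when all requested (predicate, value) pairs are present.
    return sorted(subj for subj, pairs in by_subject.items() if wanted <= pairs)
```

Attaching all fields to the leaf-level dataset, as in this sketch, corresponds to the alternative mentioned in the second step; associating objects at every level of the hierarchy would need one set of triples per intermediate dataset.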
>>>>
>>>> As mentioned, this is just a start. I do believe though that this is
>>>> an extremely important issue that must be tackled as soon as
>>>> possible.
>>>>
>>>> thanks, Luca
>>>>
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>
>>>
>>
>>
>