[Go-essp-tech] replicas and search
stephen.pascoe at stfc.ac.uk
Thu Dec 3 08:57:30 MST 2009
Hmmm,
If you can't handle a doubling of the size of the triple store, can you handle the number of datasets CMIP5 is likely to produce? There will be a dataset for each combination of institute/model/experiment/frequency/realm/variable/run/version.
(Just seen Don's email that asks something similar)
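
To put a rough number on that worry, here is a minimal sketch of the upper bound on atomic datasets if every facet combination occurred. The facet names come from the list above; the per-facet counts are invented placeholders, not CMIP5 figures, and the function name is hypothetical.

```python
from math import prod

# Facet names are taken from the message; the counts are made-up
# placeholders for illustration only, not real CMIP5 numbers.
ASSUMED_FACET_COUNTS = {
    "institute": 20,
    "model": 2,
    "experiment": 30,
    "frequency": 5,
    "realm": 6,
    "variable": 50,
    "run": 3,
    "version": 2,
}


def max_atomic_datasets(facet_counts):
    """Upper bound: one dataset for every combination of facet values."""
    return prod(facet_counts.values())


if __name__ == "__main__":
    print(max_atomic_datasets(ASSUMED_FACET_COUNTS))
```

In practice many combinations will never be produced, so this is a ceiling rather than an estimate.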
S.
---
Stephen Pascoe +44 (0)1235 445980
British Atmospheric Data Centre
Rutherford Appleton Laboratory
-----Original Message-----
From: Luca Cinquini [mailto:luca at ucar.edu]
Sent: 03 December 2009 15:45
To: Pascoe, Stephen (STFC,RAL,SSTD)
Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] replicas and search
On Dec 3, 2009, at 8:42 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>
> My understanding is that tracking-id is generated by CMOR for each
> file. However, I still think it is useful to be able to answer the
> question "Which dataset does this file belong to" even if it's been
> separated from the rest of the dataset. People will move these files
> about on their computers, ftp them about, email them to each other,
> etc. We can't stop them doing that.
>
> If by dataset we mean atomic datasets, rather than any point in a
> "dataset hierarchy" then there will be relatively few files per
> dataset -- maybe a few tens. In this case I can't imagine there being
> a performance issue with having a few tens of "hasTrackingId"
> triples per dataset. Do you agree?
Not really... storing a few tens more triples per dataset could double the size of the triple store. I guess we would need to try. If all the tracking ids are based on some format convention, maybe we can assign a common tracking id to the dataset...
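
A minimal sketch of the trade-off being discussed, using plain tuples rather than a real triple store. The "hasTrackingId" predicate is the one proposed in this thread; the tracking-id strings, the baseline triple count, and the function names are assumptions for illustration only.

```python
HAS_TRACKING_ID = "hasTrackingId"  # predicate name as proposed in the thread


def per_file_triples(dataset_uri, tracking_ids):
    """One (dataset, hasTrackingId, trackingId) triple per file."""
    return [(dataset_uri, HAS_TRACKING_ID, tid) for tid in tracking_ids]


def growth_factor(base_triples_per_dataset, extra_triples_per_dataset):
    """Relative size of the store after adding the extra triples."""
    total = base_triples_per_dataset + extra_triples_per_dataset
    return total / base_triples_per_dataset


def dataset_for(tracking_id, triples):
    """Answer 'which dataset does this file belong to?' by linear scan."""
    for subject, predicate, obj in triples:
        if predicate == HAS_TRACKING_ID and obj == tracking_id:
            return subject
    return None


if __name__ == "__main__":
    # Hypothetical tracking ids; the dataset URI is the example from the thread.
    triples = per_file_triples(
        "resource://ESG-NCAR/ID#test.dataset.0",
        ["tracking-id-1", "tracking-id-2"],
    )
    print(dataset_for("tracking-id-2", triples))
    # If a dataset already carries about 30 triples (an assumed figure),
    # adding 30 per-file triples doubles the store.
    print(growth_factor(30, 30))
```

With a single common tracking id per dataset, the extra cost would drop to one triple per dataset, at the price of no longer distinguishing individual files.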
L
>
> S.
>
>
> ---
> Stephen Pascoe +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
>
> -----Original Message-----
> From: Luca Cinquini [mailto:luca at ucar.edu]
> Sent: 03 December 2009 15:29
> To: Pascoe, Stephen (STFC,RAL,SSTD)
> Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] replicas and search
>
> Hi Stephen,
> I guess the fundamental question is whether the trackingId is
> file-specific or dataset-specific. It is certainly possible to have a new
> RDF triple of the form:
> (dataset, hasTrackingId, trackingId)
> but we want to avoid having to start tracking ids for each file of
> each dataset, since datasets can potentially have many, many files (I
> assume this is still true?).
> See below for your other related questions.
> thanks, Luca
>
> On Dec 3, 2009, at 4:03 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>
>> Hi Luca,
>>
>> I've taken a look at the ontology in Protégé. I see there is an
>> esg:Dataset class with some interesting properties:
>>
>> esg:hasUri
> points to the full dataset URI, for example:
> <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">resource://ESG-NCAR/ID#test.dataset.0</esg:hasUri>
>
>> esg:hasUUID
> points to the dataset UUID, which is the database identifier, but we
> don't currently use it within an RDF context.
>> esg:hasDataArchive
> this property is actually obsolete, replaced by hasGateway
>> esg:hasDataService
> used to point to data access services of any kind, through the
> DatasetAccess object. Currently the ESG gateway does not store the URL
> of a THREDDS catalog, but it looks like we will probably have to do
> it, and if so we would use this property to store it in the RDF triple
> store.
>>
>> How are these used in ESG? Do any of them map onto THREDDS URLs of
>> individual datasets?
>> Maybe we could have a property esg:containsFileWithTrackingId?
>>
>> Cheers,
>> Stephen.
>>
>>
>> ---
>> Stephen Pascoe +44 (0)1235 445980
>> British Atmospheric Data Centre
>> Rutherford Appleton Laboratory
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Luca Cinquini
>> Sent: 03 December 2009 00:59
>> To: Lawrence, Bryan (STFC,RAL,SSTD)
>> Cc: go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] replicas and search
>>
>> Hi Bryan:
>>
>> On Dec 2, 2009, at 10:20 AM, Bryan Lawrence wrote:
>>
>>> On Wednesday 02 December 2009 16:47:02 Don Middleton wrote:
>>>> A follow-on to the call, as I had to leave a bit early. We were
>>>> discussing being able to use the tracking id for various files. If
>>>> the tracking id is not in the federated metadata, how does one know
>>>> which gateway to consult for information?
>>>
>>> I was wondering the same thing ...
>>>
>>> I think I've asked this question before, but I can't remember the
>>> answer :-)
>>>
>>> Is the difference between what is in the Thredds catalog and what is
>>> in the rdf representation of it documented anywhere? (Or, where is
>>> the schema for the rdf?)
>>
>> The schema for the RDF is the owl file itself:
>>
>> http://ontologies.ucar.edu/owl/esg.owl
>>
>> or if you want to look at all the ontologies in the package:
>>
>> http://ontologies.ucar.edu/owl/esg_all.owl
>>
>> Luca
>>
>>>
>>> Cheers
>>> Bryan
>>>
>>>>
>>>> don
>>>>
>>>>
>>>> On Dec 1, 2009, at 2:30 PM, Luca Cinquini wrote:
>>>>
>>>>> Hi,
>>>>> as I mentioned today at the telecon, I think the RDF query
>>>>> services should be able to handle the concept of dataset replicas
>>>>> quite nicely. The main concepts are that the RDF records generated
>>>>> from replicas must have different RDF identifiers, so they can be
>>>>> exchanged independently among gateways, and must reference their
>>>>> original RDF record.
>>>>>
>>>>> If you are interested, I've documented some of the details here:
>>>>>
>>>>> https://wiki.ucar.edu/display/esgcet/Metadata+Search+and+Replicas
>>>>>
>>>>> I'm also including a snapshot of an ESG Gateway that shows
>>>>> multiple replicas returned as part of a DRS search (note that how
>>>>> the replicas are presented in the result page can be easily
>>>>> changed - the picture is just to show that the query can handle
>>>>> replicas).
>>>>>
>>>>> thanks, Luca
>>>>>
>>>>> <DRS with Replicas.tiff>
>>>>> _______________________________________________
>>>>> GO-ESSP-TECH mailing list
>>>>> GO-ESSP-TECH at ucar.edu
>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Bryan Lawrence
>>> Director of Environmental Archival and Associated Research
>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,
>>> Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ... 5848;
>>> Web: home.badc.rl.ac.uk/lawrence
>>
>> --
>> Scanned by iCritical.
>