[Go-essp-tech] replicas and search

stephen.pascoe at stfc.ac.uk
Thu Dec 3 08:57:30 MST 2009


Hmmm,

If you can't handle a doubling of the size of the triple store, can you handle the number of datasets CMIP5 is likely to produce?  There will be a dataset for each combination of institute/model/experiment/frequency/realm/variable/run/version.
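To get a feel for the scale, the number of atomic datasets is at most the product of the number of values each facet can take. The following is a minimal sketch. The facet values are hypothetical placeholders, not the real CMIP5 vocabularies, and the dot-separated identifier layout is assumed purely for illustration.

```python
from itertools import product
from math import prod

# Hypothetical facet values for illustration only; the real CMIP5
# controlled vocabularies are much longer.
FACETS = {
    "institute": ["INST-A", "INST-B"],
    "model": ["model-1", "model-2"],
    "experiment": ["historical", "rcp45", "piControl"],
    "frequency": ["mon", "day"],
    "realm": ["atmos", "ocean"],
    "variable": ["tas", "pr", "psl"],
    "run": ["r1i1p1"],
    "version": ["v1"],
}


def dataset_count(facets):
    """Upper bound on atomic datasets: one per combination of facet values."""
    return prod(len(values) for values in facets.values())


def dataset_ids(facets):
    """Yield a dot-separated identifier for every facet combination (assumed layout)."""
    for combo in product(*facets.values()):
        yield ".".join(combo)


print(dataset_count(FACETS))
print(next(dataset_ids(FACETS)))
```

Even these toy lists give 144 combinations. Not every combination will exist in practice, but the real facet lists are far longer, so the number of dataset records grows multiplicatively.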

(Just seen Don's email that asks something similar)

S.


---
Stephen Pascoe  +44 (0)1235 445980
British Atmospheric Data Centre
Rutherford Appleton Laboratory

-----Original Message-----
From: Luca Cinquini [mailto:luca at ucar.edu] 
Sent: 03 December 2009 15:45
To: Pascoe, Stephen (STFC,RAL,SSTD)
Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] replicas and search


On Dec 3, 2009, at 8:42 AM, <stephen.pascoe at stfc.ac.uk> wrote:

>
> My understanding is that the tracking-id is generated by CMOR for each
> file.  However, I still think it is useful to be able to answer the
> question "Which dataset does this file belong to" even if it's been 
> separated from the rest of the dataset.  People will move these files 
> about on their computers, ftp them about, email them to each other, 
> etc.  We can't stop them doing that.
>
> If by dataset we mean atomic datasets, rather than any point in a 
> "dataset hierarchy" then there will be relatively few files per 
> dataset -- maybe a few tens.  In this case I can't imagine there being 
> a performance issue with having a few tens of "hasTrackingId"
> triples per dataset.  Do you agree?

Not really: storing a few tens of extra triples per dataset could double the size of the triple store. I guess we would need to try it. If all the tracking ids follow some format convention, maybe we could assign a common tracking id to the whole dataset instead.
L
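The trade-off discussed above can be sketched with plain Python tuples standing in for the RDF triple store. This is not the ESG gateway's actual store or API. The sketch emits one (dataset, hasTrackingId, trackingId) triple per file and builds a reverse index that answers Stephen's question, "which dataset does this file belong to?". It also computes how much the store grows when those triples are added. The `esg:` prefix on hasTrackingId, the tracking-id strings, and the per-dataset triple counts are all hypothetical.

```python
from collections import defaultdict

# Property name taken from the thread; the "esg:" prefix is an assumption.
HAS_TRACKING_ID = "esg:hasTrackingId"


def tracking_id_triples(dataset_uri, file_tracking_ids):
    """Return one (dataset, hasTrackingId, trackingId) triple per file."""
    return [(dataset_uri, HAS_TRACKING_ID, tid) for tid in file_tracking_ids]


def build_reverse_index(triples):
    """Map each tracking id to the set of dataset URIs that claim it.

    A set is used because more than one dataset record (e.g. a replica
    record, under the replica scheme) might list the same file; that
    is an assumption, not something the ontology guarantees.
    """
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == HAS_TRACKING_ID:
            index[obj].add(subject)
    return index


def growth_factor(base_triples_per_dataset, files_per_dataset):
    """Store size after adding one tracking-id triple per file, relative to before."""
    return (base_triples_per_dataset + files_per_dataset) / base_triples_per_dataset


triples = tracking_id_triples(
    "resource://ESG-NCAR/ID#test.dataset.0",  # example URI from this thread
    ["tid-0001", "tid-0002"],                 # hypothetical tracking ids
)
index = build_reverse_index(triples)
print(index["tid-0002"])
print(growth_factor(30, 30))  # hypothetical: 30 existing triples, 30 files
```

With a hypothetical 30 existing triples per dataset and 30 files, `growth_factor(30, 30)` is 2.0, which is the doubling described above. The overhead shrinks as per-dataset metadata grows relative to the file count. A single dataset-level tracking id derived by convention would add only one triple per dataset.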

>
> S.
>
>
> -----Original Message-----
> From: Luca Cinquini [mailto:luca at ucar.edu]
> Sent: 03 December 2009 15:29
> To: Pascoe, Stephen (STFC,RAL,SSTD)
> Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] replicas and search
>
> Hi Stephen,
> 	I guess the fundamental question is whether the trackingId is
> file-specific or dataset-specific. It is certainly possible to have a
> new RDF triple of the form:
> (dataset, hasTrackingId, trackingId)
> but we want to avoid having to start tracking ids for each file of
> each dataset, since datasets can potentially have many, many files (I
> assume this is still true?).
> See below for your other related questions.
> thanks, Luca
>
> On Dec 3, 2009, at 4:03 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>
>> Hi Luca,
>>
>> I've taken a look at the ontology in Protégé.  I see there is an 
>> esg:Dataset class with some interesting properties:
>>
>> esg:hasUri
> points to the full dataset URI, for example:
> <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">resource://ESG-NCAR/ID#test.dataset.0</esg:hasUri>
>
>> esg:hasUUID
> points to the dataset UUID, which is the database identifier, but we 
> don't currently use it within an RDF context.
>> esg:hasDataArchive
> this property is actually obsolete, replaced by hasGateway
>> esg:hasDataService
> used to point to data access services of any kind, through the 
> DatasetAccess object. Currently the ESG gateway does not store the URL 
> of a THREDDS catalog, but it looks like we will probably have to do 
> it, and if so we would use this property to store it in the RDF triple 
> store.
>>
>> How are these used in ESG?  Do any of them map onto THREDDS URLs of 
>> individual datasets?
>> Maybe we could have a property esg:containsFileWithTrackingId?
>>
>> Cheers,
>> Stephen.
>>
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Luca Cinquini
>> Sent: 03 December 2009 00:59
>> To: Lawrence, Bryan (STFC,RAL,SSTD)
>> Cc: go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] replicas and search
>>
>> Hi Bryan:
>>
>> On Dec 2, 2009, at 10:20 AM, Bryan Lawrence wrote:
>>
>>> On Wednesday 02 December 2009 16:47:02 Don Middleton wrote:
>>>> A follow-on to the call, as I had to leave a bit early. We were 
>>>> discussing being able to use the tracking id for various files. If 
>>>> the tracking id is not in the federated metadata, how does one know 
>>>> which gateway to consult for information?
>>>
>>> I was wondering the same thing ...
>>>
>>> I think I've asked this question before, but I can't remember the 
>>> answer :-)
>>>
>>> Is the difference between what is in the Thredds catalog and what is 
>>> in the rdf representation of it documented anywhere?  (Or, where is 
>>> the schema for the rdf?)
>>
>> The schema for the RDF is the owl file itself:
>>
>> http://ontologies.ucar.edu/owl/esg.owl
>>
>> or if you want to look at all the ontologies in the package:
>>
>> http://ontologies.ucar.edu/owl/esg_all.owl
>>
>> Luca
>>
>>>
>>> Cheers
>>> Bryan
>>>
>>>>
>>>> don
>>>>
>>>>
>>>> On Dec 1, 2009, at 2:30 PM, Luca Cinquini wrote:
>>>>
>>>>> Hi,
>>>>> 	as I mentioned today at the telecon, I think the RDF query 
>>>>> services should be able to handle the concept of dataset replicas 
>>>>> quite nicely. The main concepts are that the RDF records generated 
>>>>> from replicas must have different RDF identifiers, so they can be 
>>>>> exchanged independently among gateways, and must reference their 
>>>>> original RDF record.
>>>>>
>>>>> If you are interested, I've documented some of the details here:
>>>>>
>>>>> https://wiki.ucar.edu/display/esgcet/Metadata+Search+and+Replicas
>>>>>
>>>>> I'm also including a snapshot of an ESG Gateway that shows 
>>>>> multiple replicas returned as part of a DRS search (note that how 
>>>>> the replicas are presented in the result page can be easily 
>>>>> changed - the picture is just to show that the query can handle 
>>>>> replicas).
>>>>>
>>>>> thanks, Luca
>>>>>
>>>>> <DRS with Replicas.tiff>
>>>>> _______________________________________________
>>>>> GO-ESSP-TECH mailing list
>>>>> GO-ESSP-TECH at ucar.edu
>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>
>>>
>>>
>>>
>>> --
>>> Bryan Lawrence
>>> Director of Environmental Archival and Associated Research
>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>>> STFC, Rutherford Appleton Laboratory
>>> Phone +44 1235 445012; Fax ... 5848;
>>> Web: home.badc.rl.ac.uk/lawrence
>>
>> --
>> Scanned by iCritical.
>


