[Go-essp-tech] replicas and search

Don Middleton don at ucar.edu
Thu Dec 3 08:53:39 MST 2009


I expect we'll have to deal with both datasets of history files
(thousands) and CMIP/DRS datasets. Has anyone estimated
how many atomic datasets and files the latter will comprise? I know
Gary Strand has done a lot of estimating on the CCSM side of
the question - any estimates of dataset/file quantities for that,
Gary? Offhand, I'd guess the ratio of history-nfiles/drs-nfiles is on
the order of 10**2 for a given experiment. The DRS side is far more
important for a good user experience, of course, but we've traditionally
had to deal with both (an assumption that needs to be questioned? ;-).

don

On Dec 3, 2009, at 8:42 AM, <stephen.pascoe at stfc.ac.uk> wrote:

>
> My understanding is that tracking-id is generated by CMOR for each  
> file.  However, I still think it is useful to be able to answer the  
> question "Which dataset does this file belong to" even if it's been  
> separated from the rest of the dataset.  People will move these  
> files about on their computers, ftp them about, email them to each  
> other, etc.  We can't stop them doing that.
>
> If by dataset we mean atomic datasets, rather than any point in a  
> "dataset hierarchy" then there will be relatively few files per  
> dataset -- maybe a few tens.  In this case I can't imagine there  
> being a performance issue with having a few tens of "hasTrackingId"  
> triples per dataset.  Do you agree?
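
A minimal sketch of the idea above, assuming the triples are held as plain (dataset, predicate, trackingId) tuples. The predicate name "esg:hasTrackingId" is only the property proposed in this thread, and the dataset and tracking ids are made up for illustration; nothing here is taken from the actual ESG triple store. It shows how a few tens of triples per atomic dataset could answer "which dataset does this file belong to?":

```python
from collections import defaultdict

# Hypothetical predicate name, as proposed in the thread (not in the ontology).
HAS_TRACKING_ID = "esg:hasTrackingId"

# Hypothetical triples: one per file of an atomic dataset.
triples = {
    ("test.dataset.0", HAS_TRACKING_ID, "tid-0001"),
    ("test.dataset.0", HAS_TRACKING_ID, "tid-0002"),
    ("test.dataset.1", HAS_TRACKING_ID, "tid-0003"),
}


def build_reverse_index(triples):
    """Map each tracking id to the set of datasets that reference it."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == HAS_TRACKING_ID:
            index[obj].add(subject)
    return index


def dataset_for_file(index, tracking_id):
    """Return the dataset owning tracking_id, or None if it is unknown."""
    owners = index.get(tracking_id, set())
    if len(owners) > 1:
        raise ValueError(f"tracking id {tracking_id!r} maps to several datasets")
    return next(iter(owners), None)


index = build_reverse_index(triples)
print(dataset_for_file(index, "tid-0002"))
```

The reverse index grows by one entry per file, so the cost depends on the total file count rather than on how the files are grouped into datasets.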
>
> S.
>
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
>
> -----Original Message-----
> From: Luca Cinquini [mailto:luca at ucar.edu]
> Sent: 03 December 2009 15:29
> To: Pascoe, Stephen (STFC,RAL,SSTD)
> Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] replicas and search
>
> Hi Stephen,
> 	I guess the fundamental question is whether the trackingId is
> file-specific or dataset-specific. It is certainly possible to have a new
> RDF triple of the form:
> (dataset, hasTrackingId, trackingId)
> but we want to avoid having to start tracking ids for each file of  
> each dataset, since datasets can potentially have many, many files  
> (I assume this is still true?).
> See below for your other related questions.
> thanks, Luca
>
> On Dec 3, 2009, at 4:03 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>
>> Hi Luca,
>>
>> I've taken a look at the ontology in Protégé.  I see there is an
>> esg:Dataset class with some interesting properties:
>>
>> esg:hasUri
> points to the full dataset URI, for example:
> <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">resource://ESG-NCAR/ID#test.dataset.0</esg:hasUri>
>
>> esg:hasUUID
> points to the dataset UUID, which is the database identifier, but we  
> don't currently use it within an RDF context.
>> esg:hasDataArchive
> this property is actually obsolete, replaced by hasGateway
>> esg:hasDataService
> used to point to data access services of any kind, through the  
> DatasetAccess object. Currently the ESG gateway does not store the  
> URL of a THREDDS catalog, but it looks like we will probably have to  
> do it, and if so we would use this property to store it in the RDF  
> triple store.
>>
>> How are these used in ESG?  Do any of them map onto THREDDS URLs of
>> individual datasets?
>> Maybe we could have a property esg:containsFileWithTrackingId?
>>
>> Cheers,
>> Stephen.
>>
>>
>> ---
>> Stephen Pascoe  +44 (0)1235 445980
>> British Atmospheric Data Centre
>> Rutherford Appleton Laboratory
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Luca Cinquini
>> Sent: 03 December 2009 00:59
>> To: Lawrence, Bryan (STFC,RAL,SSTD)
>> Cc: go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] replicas and search
>>
>> Hi Bryan:
>>
>> On Dec 2, 2009, at 10:20 AM, Bryan Lawrence wrote:
>>
>>> On Wednesday 02 December 2009 16:47:02 Don Middleton wrote:
>>>> A follow-on to the call, as I had to leave a bit early. We were
>>>> discussing being able to use the tracking id for various files. If
>>>> the tracking id is not in the federated metadata, how does one know
>>>> which gateway to consult for information?
>>>
>>> I was wondering the same thing ...
>>>
>>> I think I've asked this question before, but I can't remember the
>>> answer :-)
>>>
>>> Is the difference between what is in the Thredds catalog and what is
>>> in the rdf representation of it documented anywhere?  (Or, where is
>>> the schema for the rdf?)
>>
>> The schema for the RDF is the owl file itself:
>>
>> http://ontologies.ucar.edu/owl/esg.owl
>>
>> or if you want to look at all the ontologies in the package:
>>
>> http://ontologies.ucar.edu/owl/esg_all.owl
>>
>> Luca
>>
>>>
>>> Cheers
>>> Bryan
>>>
>>>>
>>>> don
>>>>
>>>>
>>>> On Dec 1, 2009, at 2:30 PM, Luca Cinquini wrote:
>>>>
>>>>> Hi,
>>>>> 	as I mentioned today at the telecon, I think the RDF query
>>>>> services should be able to handle the concept of dataset replicas
>>>>> quite nicely. The main concepts are that the RDF records generated
>>>>> from replicas must have different RDF identifiers, so they can be
>>>>> exchanged independently among gateways, and must reference their
>>>>> original RDF record.
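
A minimal sketch of the two concepts above, under stated assumptions: each record carries its own RDF identifier, and a replica additionally references the identifier of its original. The class, field names, identifiers and gateway names are hypothetical and chosen only for illustration; they are not the ESG ontology or the gateway implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRecord:
    rdf_id: str                       # unique per record, replicas included
    gateway: str                      # gateway publishing this record
    replica_of: Optional[str] = None  # rdf_id of the original, None for originals


def group_by_original(records):
    """Group search results so replicas are listed with their original."""
    groups = {}
    for rec in records:
        original_id = rec.replica_of or rec.rdf_id
        groups.setdefault(original_id, []).append(rec)
    return groups


# Hypothetical search results: one original and two replicas.
records = [
    DatasetRecord("ncar:test.dataset.0", "ESG-NCAR"),
    DatasetRecord("badc:test.dataset.0", "ESG-BADC", replica_of="ncar:test.dataset.0"),
    DatasetRecord("pcmdi:test.dataset.0", "ESG-PCMDI", replica_of="ncar:test.dataset.0"),
]

for original_id, group in group_by_original(records).items():
    print(original_id, [rec.gateway for rec in group])
```

Because every record has a distinct identifier, each one can be exchanged between gateways on its own, while the back-reference lets a result page collapse or expand replicas as needed.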
>>>>>
>>>>> If you are interested, I've documented some of the details here:
>>>>>
>>>>> https://wiki.ucar.edu/display/esgcet/Metadata+Search+and+Replicas
>>>>>
>>>>> I'm also including a snapshot of an ESG Gateway that shows multiple
>>>>> replicas returned as part of a DRS search (note that how the
>>>>> replicas are presented in the result page can be easily changed -
>>>>> the picture is just to show that the query can handle replicas).
>>>>>
>>>>> thanks, Luca
>>>>>
>>>>> <DRS with Replicas.tiff>
>>>>> _______________________________________________
>>>>> GO-ESSP-TECH mailing list
>>>>> GO-ESSP-TECH at ucar.edu
>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>
>>>
>>>
>>>
>>> --
>>> Bryan Lawrence
>>> Director of Environmental Archival and Associated Research
>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,
>>> Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ... 5848;
>>> Web: home.badc.rl.ac.uk/lawrence
>>
>