[Go-essp-tech] replicas and search

Gary Strand strandwg at ucar.edu
Thu Dec 3 08:49:44 MST 2009


I've been playing around with CMOR2 and it looks to me like the  
tracking ID assigned to a given netCDF file changes with every write  
of the file, even if the underlying data remains unchanged.
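Both tracking_id values in the diff below parse as RFC 4122 version-4 UUIDs, the randomly generated kind. That is consistent with a fresh ID being drawn on every write. The sketch below only inspects the version and variant bits with Python's standard uuid module. It is an inference from the two values, not a statement about how CMOR generates them:

```python
import uuid
from typing import Optional, Tuple

# tracking_id values copied from the two CMOR writes of the same test data
TRACKING_IDS = [
    "37228c28-6f2d-427a-a9c4-e46545c2873e",
    "eb942273-e0c6-42b8-b276-e3cc57ed0c0a",
]


def describe(tracking_id: str) -> Tuple[Optional[int], bool]:
    """Return (UUID version, True if the RFC 4122 variant bits are set)."""
    u = uuid.UUID(tracking_id)
    # .version is only defined (non-None) for RFC 4122 variant UUIDs
    return u.version, u.variant == uuid.RFC_4122


for tid in TRACKING_IDS:
    version, rfc4122 = describe(tid)
    print(f"{tid}: version={version}, RFC 4122 variant={rfc4122}")
```

Both lines report version=4. Apart from the fixed version and variant bits, the two IDs share no structure that could tie them to the same underlying data.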

Two writes by CMOR of the same test data, doing an 'ncdump' on both  
and then diffing the ncdumps shows:

64c64
<               cl:history = "2009-12-02T23:05:40Z altered by CMOR: replaced missing value flag (1e+28) with standard missing value (1e+20)." ;
---
>               cl:history = "2009-12-02T23:06:29Z altered by CMOR: replaced missing value flag (1e+28) with standard missing value (1e+20)." ;
77c77
<               :history = "Output from archive/giccm_03_std_2xCO2_2256. 2009-12-02T23:05:40Z CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements." ;
---
>               :history = "Output from archive/giccm_03_std_2xCO2_2256. 2009-12-02T23:06:29Z CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements." ;
83c83
<               :creation_date = "2009-12-02T23:05:40Z" ;
---
>               :creation_date = "2009-12-02T23:06:29Z" ;
91c91
<               :tracking_id = "37228c28-6f2d-427a-a9c4-e46545c2873e" ;
---
>               :tracking_id = "eb942273-e0c6-42b8-b276-e3cc57ed0c0a" ;
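Only history, creation_date and tracking_id differ. One way to confirm that two writes carry the same data is to drop those attributes from each ncdump output before diffing. Below is a minimal sketch. It assumes those three names, taken from this one diff, are the only volatile attributes. It works on the CDL text that ncdump prints, not on the netCDF files directly:

```python
import difflib
import re

# Attributes that changed between the two writes in the diff above; treating
# them as the only "volatile" ones is an assumption based on that one example.
VOLATILE = {"history", "creation_date", "tracking_id"}

# ncdump prints variable attributes as "\t\tcl:history = ..." and global
# attributes as "\t\t:history = ..."; group 2 captures the attribute name.
ATTR_START = re.compile(r"^\s*(\w*):(\w+)\s*=")


def strip_volatile(cdl: str) -> list:
    """Return the lines of ncdump (CDL) text minus volatile attributes."""
    kept = []
    skipping = False
    for line in cdl.splitlines():
        match = ATTR_START.match(line)
        if match:
            # a new attribute statement: skip it if its name is volatile
            skipping = match.group(2) in VOLATILE
        if not skipping:
            kept.append(line)
        elif line.rstrip().endswith(";"):
            # the skipped statement ends here (values can span lines)
            skipping = False
    return kept


def same_apart_from_volatile(cdl_a: str, cdl_b: str) -> bool:
    """Print a unified diff of the filtered CDL; True if nothing differs."""
    a, b = strip_volatile(cdl_a), strip_volatile(cdl_b)
    for line in difflib.unified_diff(a, b, "first", "second", lineterm=""):
        print(line)
    return a == b
```

To use it, save the output of ncdump for each write to a text file (for example first.cdl and second.cdl, placeholder names). Then pass the two file contents to same_apart_from_volatile. A True result means the files differ only in the three attributes listed above.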

On Thu Dec 03, 2009, at 8:45 AM, Luca Cinquini wrote:

>
> On Dec 3, 2009, at 8:42 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>
>>
>> My understanding is that tracking-id is generated by CMOR for each
>> file.  However, I still think it is useful to be able to answer the
>> question "Which dataset does this file belong to" even if it's been
>> separated from the rest of the dataset.  People will move these
>> files about on their computers, ftp them about, email them to each
>> other, etc.  We can't stop them doing that.
>>
>> If by dataset we mean atomic datasets, rather than any point in a
>> "dataset hierarchy" then there will be relatively few files per
>> dataset -- maybe a few tens.  In this case I can't imagine there
>> being a performance issue with having a few tens of "hasTrackingId"
>> triples per dataset.  Do you agree?
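With per-file triples of the form discussed here, Stephen's question ("which dataset does this file belong to?") becomes a reverse lookup from tracking ID to dataset. Below is a minimal in-memory sketch. The esg:hasTrackingId predicate is only a proposal in this thread, with the esg: prefix assumed, and the dataset URIs and tracking IDs are made up. A gateway would run the equivalent query against its triple store instead:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples: the predicate name follows the proposal in this
# thread; the dataset URIs and tracking IDs are invented for illustration.
TRIPLES = [
    ("resource://ESG-NCAR/ID#test.dataset.0", "esg:hasTrackingId", "tid-0001"),
    ("resource://ESG-NCAR/ID#test.dataset.0", "esg:hasTrackingId", "tid-0002"),
    ("resource://ESG-NCAR/ID#test.dataset.1", "esg:hasTrackingId", "tid-0003"),
]


def index_by_tracking_id(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Map each tracking ID to every dataset that lists it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "esg:hasTrackingId":
            index[obj].add(subject)
    return index


index = index_by_tracking_id(TRIPLES)
print(sorted(index["tid-0002"]))  # -> ['resource://ESG-NCAR/ID#test.dataset.0']
```

The values are sets because a replica published under its own identifier could list the same tracking IDs as its original.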
>
> Not really... storing a few tens more triples per dataset could double
> the size of the triple store. I guess we would need to try. If all the
> tracking ids are based on some format convention, maybe we can assign
> a common tracking id to the dataset...
> L
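The "could double" estimate is a simple proportion. Suppose a dataset is currently described by T triples and contains F files. Adding one hasTrackingId triple per file then scales the store by (T + F) / T, so it doubles when F is about equal to T. Below is a trivial helper. The example inputs are illustrative only, since the thread gives "a few tens" of files per atomic dataset but no per-dataset triple count:

```python
def growth_factor(triples_per_dataset: int, files_per_dataset: int) -> float:
    """Store size after adding one tracking-ID triple per file, relative to
    the current size, assuming every dataset has the same shape."""
    if triples_per_dataset <= 0:
        raise ValueError("triples_per_dataset must be positive")
    return (triples_per_dataset + files_per_dataset) / triples_per_dataset


# illustrative inputs only, not measurements
print(growth_factor(30, 30))   # 2.0: as many files as existing triples
print(growth_factor(120, 30))  # 1.25
```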
>
>>
>> S.
>>
>>
>> ---
>> Stephen Pascoe  +44 (0)1235 445980
>> British Atmospheric Data Centre
>> Rutherford Appleton Laboratory
>>
>> -----Original Message-----
>> From: Luca Cinquini [mailto:luca at ucar.edu]
>> Sent: 03 December 2009 15:29
>> To: Pascoe, Stephen (STFC,RAL,SSTD)
>> Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] replicas and search
>>
>> Hi Stephen,
>> 	I guess the fundamental question is whether the trackingId is
>> file-specific or dataset-specific. It is certainly possible to have a new
>> RDF triple of the form:
>> (dataset, hasTrackingId, trackingId)
>> but we want to avoid having to start tracking ids for each file of
>> each dataset, since datasets can potentially have many, many files
>> (I assume this is still true?).
>> See below for your other related questions.
>> thanks, Luca
>>
>> On Dec 3, 2009, at 4:03 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>>
>>> Hi Luca,
>>>
>>> I've taken a look at the ontology in Protégé.  I see there is an
>>> esg:Dataset class with some interesting properties:
>>>
>>> esg:hasUri
>> points to the full dataset URI, for example:
>> <esg:hasUri rdf:datatype="http://www.w3.org/2001/
>> XMLSchema#string">resource://ESG-NCAR/ID#test.dataset.0</esg:hasUri>
>>
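The hasUri value is an ordinary URI string, so a standard URI parser can split it into its parts. This sketch reads the authority (ESG-NCAR here) as the name of the publishing gateway. That reading is an assumption based on this single example. If it holds, the authority is one place a client could look when asking which gateway to consult. The sketch uses Python's urllib.parse:

```python
from urllib.parse import urlsplit

# Example value of esg:hasUri quoted above
DATASET_URI = "resource://ESG-NCAR/ID#test.dataset.0"


def split_dataset_uri(uri: str) -> dict:
    """Split an ESG resource URI into its generic URI components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # "resource"
        "authority": parts.netloc,   # assumed to name the publishing gateway
        "path": parts.path,          # "/ID"
        "local_id": parts.fragment,  # dataset identifier after '#'
    }


print(split_dataset_uri(DATASET_URI))
```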
>>> esg:hasUUID
>> points to the dataset UUID, which is the database identifier, but we
>> don't currently use it within an RDF context.
>>> esg:hasDataArchive
>> this property is actually obsolete, replaced by hasGateway
>>> esg:hasDataService
>> used to point to data access services of any kind, through the
>> DatasetAccess object. Currently the ESG gateway does not store the
>> URL of a THREDDS catalog, but it looks like we will probably have to
>> do it, and if so we would use this property to store it in the RDF
>> triple store.
>>>
>>> How are these used in ESG?  Do any of them map onto THREDDS URLs of
>>> individual datasets?
>>> Maybe we could have a property esg:containsFileWithTrackingId?
>>>
>>> Cheers,
>>> Stephen.
>>>
>>>
>>> ---
>>> Stephen Pascoe  +44 (0)1235 445980
>>> British Atmospheric Data Centre
>>> Rutherford Appleton Laboratory
>>>
>>> -----Original Message-----
>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Luca Cinquini
>>> Sent: 03 December 2009 00:59
>>> To: Lawrence, Bryan (STFC,RAL,SSTD)
>>> Cc: go-essp-tech at ucar.edu
>>> Subject: Re: [Go-essp-tech] replicas and search
>>>
>>> Hi Bryan:
>>>
>>> On Dec 2, 2009, at 10:20 AM, Bryan Lawrence wrote:
>>>
>>>> On Wednesday 02 December 2009 16:47:02 Don Middleton wrote:
>>>>> A follow-on to the call, as I had to leave a bit early. We were
>>>>> discussing being able to use the tracking id for various files. If
>>>>> the tracking id is not in the federated metadata, how does one know
>>>>> which gateway to consult for information?
>>>>
>>>> I was wondering the same thing ...
>>>>
>>>> I think I've asked this question before, but I can't remember the
>>>> answer :-)
>>>>
>>>> Is the difference between what is in the Thredds catalog and what is
>>>> in the rdf representation of it documented anywhere?  (Or, where is
>>>> the schema for the rdf?)
>>>
>>> The schema for the RDF is the owl file itself:
>>>
>>> http://ontologies.ucar.edu/owl/esg.owl
>>>
>>> or if you want to look at all the ontologies in the package:
>>>
>>> http://ontologies.ucar.edu/owl/esg_all.owl
>>>
>>> Luca
>>>
>>>>
>>>> Cheers
>>>> Bryan
>>>>
>>>>>
>>>>> don
>>>>>
>>>>>
>>>>> On Dec 1, 2009, at 2:30 PM, Luca Cinquini wrote:
>>>>>
>>>>>> Hi,
>>>>>> 	as I mentioned today at the telecon, I think the RDF query
>>>>>> services should be able to handle the concept of dataset replicas
>>>>>> quite nicely. The main concepts are that the RDF records generated
>>>>>> from replicas must have different RDF identifiers, so they can be
>>>>>> exchanged independently among gateways, and must reference their
>>>>>> original RDF record.
>>>>>>
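Luca describes two rules: each replica record has its own identifier, and each one points back to the original. Those two rules are enough for a search client to fold replicas under a single result. Below is a minimal sketch. The record fields, the replica_of name and all URIs and gateway names are hypothetical. The actual RDF properties are the ones documented on the wiki page linked below:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    uri: str                          # this record's own RDF identifier
    gateway: str                      # gateway publishing this copy
    replica_of: Optional[str] = None  # original's URI; None for originals


def group_by_original(hits: List[DatasetRecord]) -> Dict[str, List[DatasetRecord]]:
    """Group search hits so each original is listed with its replicas."""
    groups: Dict[str, List[DatasetRecord]] = {}
    for rec in hits:
        key = rec.replica_of or rec.uri
        groups.setdefault(key, []).append(rec)
    return groups


# hypothetical search result: one original plus two replicas
ORIGINAL = "resource://ESG-NCAR/ID#test.dataset.0"
HITS = [
    DatasetRecord(ORIGINAL, "ESG-NCAR"),
    DatasetRecord("resource://GATEWAY-B/ID#test.dataset.0", "GATEWAY-B", ORIGINAL),
    DatasetRecord("resource://GATEWAY-C/ID#test.dataset.0", "GATEWAY-C", ORIGINAL),
]

for original, records in group_by_original(HITS).items():
    print(original, "->", [r.gateway for r in records])
```

Each record keeps a distinct uri, so gateways can still exchange the records independently. Grouping happens only at presentation time. A replica whose original is missing from the result set still lands under the original's URI.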
>>>>>> If you are interested, I've documented some of the details here:
>>>>>>
>>>>>> https://wiki.ucar.edu/display/esgcet/Metadata+Search+and+Replicas
>>>>>>
>>>>>> I'm also including a snapshot of an ESG Gateway that shows multiple
>>>>>> replicas returned as part of a DRS search (note that how the
>>>>>> replicas are presented in the result page can be easily changed -
>>>>>> the picture is just to show that the query can handle replicas).
>>>>>>
>>>>>> thanks, Luca
>>>>>>
>>>>>> <DRS with Replicas.tiff>
>>>>>> _______________________________________________
>>>>>> GO-ESSP-TECH mailing list
>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Bryan Lawrence
>>>> Director of Environmental Archival and Associated Research
>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,
>>>> Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ... 5848;
>>>> Web: home.badc.rl.ac.uk/lawrence
>>>
>>> --
>>> Scanned by iCritical.
>>
>

Gary Strand
strandwg at ucar.edu




