[Go-essp-tech] replicas and search

Dean N. Williams williams13 at llnl.gov
Thu Dec 3 09:09:51 MST 2009


Yes, it is by design.

The idea behind this is that if you have to rewrite the file, then  
it's a new version so a new tracking id is needed.
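
A minimal sketch of that behaviour in Python, assuming the tracking id is a random (version 4) UUID, which is what the two ids in Gary's ncdump output below look like. The `write_file` function is a hypothetical stand-in for illustration only, not CMOR's API.

```python
import uuid


def write_file(data, path):
    """Hypothetical stand-in for a write: returns the metadata stamped on the file."""
    # A fresh random (version 4) UUID is drawn on every write,
    # whether or not `data` differs from the previous write.
    return {"path": path, "tracking_id": str(uuid.uuid4())}


# Two writes of identical data to the same (made-up) path.
first = write_file([1.0, 2.0], "cl.nc")
second = write_file([1.0, 2.0], "cl.nc")

# The ids differ even though the data did not change.
print(first["tracking_id"] != second["tracking_id"])
```

Under this assumption the id identifies a particular write of a file, not its content.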

-Dean

On Dec 3, 2009, at 7:49 AM, Gary Strand wrote:

>
> I've been playing around with CMOR2 and it looks to me like the
> tracking ID assigned to a given netCDF file changes with every write
> of the file, even if the underlying data remains unchanged.
>
> Two writes by CMOR of the same test data, doing an 'ncdump' on both
> and then diffing the ncdumps shows:
>
> 64c64
> <               cl:history = "2009-12-02T23:05:40Z altered by CMOR: replaced missing value flag (1e+28) with standard missing value (1e+20)." ;
> ---
> >               cl:history = "2009-12-02T23:06:29Z altered by CMOR: replaced missing value flag (1e+28) with standard missing value (1e+20)." ;
> 77c77
> <               :history = "Output from archive/giccm_03_std_2xCO2_2256. 2009-12-02T23:05:40Z CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements." ;
> ---
> >               :history = "Output from archive/giccm_03_std_2xCO2_2256. 2009-12-02T23:06:29Z CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements." ;
> 83c83
> <               :creation_date = "2009-12-02T23:05:40Z" ;
> ---
> >               :creation_date = "2009-12-02T23:06:29Z" ;
> 91c91
> <               :tracking_id = "37228c28-6f2d-427a-a9c4-e46545c2873e" ;
> ---
> >               :tracking_id = "eb942273-e0c6-42b8-b276-e3cc57ed0c0a" ;
>
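
To check that two writes differ only in the per-write stamps shown in the diff above, one option is to drop those attributes before comparing. This is a rough sketch, assuming each attribute sits on a single line of the ncdump text; the function names and the `illustrative_attr` line are made up for the example.

```python
import re

# Attributes that changed between the two writes in the diff above.
VOLATILE = re.compile(r"^\s*\w*:(history|creation_date|tracking_id)\s*=")


def stable_lines(ncdump_text):
    """Return the ncdump lines with the per-write attributes dropped."""
    return [line for line in ncdump_text.splitlines() if not VOLATILE.match(line)]


def same_apart_from_write_stamps(dump_a, dump_b):
    """True if the two dumps match once the per-write attributes are ignored."""
    return stable_lines(dump_a) == stable_lines(dump_b)


dump_a = "\n".join([
    '\t\t:creation_date = "2009-12-02T23:05:40Z" ;',
    '\t\t:tracking_id = "37228c28-6f2d-427a-a9c4-e46545c2873e" ;',
    '\t\t:illustrative_attr = "unchanged" ;',  # made-up line standing in for the rest
])
dump_b = "\n".join([
    '\t\t:creation_date = "2009-12-02T23:06:29Z" ;',
    '\t\t:tracking_id = "eb942273-e0c6-42b8-b276-e3cc57ed0c0a" ;',
    '\t\t:illustrative_attr = "unchanged" ;',
])

print(same_apart_from_write_stamps(dump_a, dump_b))
```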
> On Thu Dec 03, 2009, at 8:45 AM, Luca Cinquini wrote:
>
>>
>> On Dec 3, 2009, at 8:42 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>>
>>>
>>> My understanding is that tracking-id is generated by CMOR for each
>>> file.  However, I still think it is useful to be able to answer the
>>> question "Which dataset does this file belong to" even if it's been
>>> separated from the rest of the dataset.  People will move these
>>> files about on their computers, ftp them about, email them to each
>>> other, etc.  We can't stop them doing that.
>>>
>>> If by dataset we mean atomic datasets, rather than any point in a
>>> "dataset hierarchy" then there will be relatively few files per
>>> dataset -- maybe a few tens.  In this case I can't imagine there
>>> being a performance issue with having a few tens of "hasTrackingId"
>>> triples per dataset.  Do you agree?
>>
>> Not really... storing a few tens more triples per dataset could double
>> the size of the triple store. I guess we would need to try. If all the
>> tracking ids are based on some format convention, maybe we can assign
>> a common tracking id to the dataset...
>> L
>>
>>>
>>> S.
>>>
>>>
>>> ---
>>> Stephen Pascoe  +44 (0)1235 445980
>>> British Atmospheric Data Centre
>>> Rutherford Appleton Laboratory
>>>
>>> -----Original Message-----
>>> From: Luca Cinquini [mailto:luca at ucar.edu]
>>> Sent: 03 December 2009 15:29
>>> To: Pascoe, Stephen (STFC,RAL,SSTD)
>>> Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
>>> Subject: Re: [Go-essp-tech] replicas and search
>>>
>>> Hi Stephen,
>>> 	I guess the fundamental question is whether the trackingId is
>>> file-specific or dataset-specific. It is certainly possible to have a new
>>> RDF triple of the form:
>>> (dataset, hasTrackingId, trackingId)
>>> but we want to avoid having to start tracking ids for each file of
>>> each dataset, since datasets can potentially have many, many files
>>> (I assume this is still true?).
>>> See below for your other related questions.
>>> thanks, Luca
>>>
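
The (dataset, hasTrackingId, trackingId) form can be sketched with plain tuples to show how a stray file's tracking id would lead back to its dataset. This is only an illustration: it uses an in-memory dictionary in place of a real triple store, and it pairs the dataset id and tracking ids quoted elsewhere in this thread arbitrarily.

```python
from collections import defaultdict

HAS_TRACKING_ID = "hasTrackingId"  # predicate name as proposed in this thread


def build_index(triples):
    """Map each tracking id back to the dataset(s) that claim it."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == HAS_TRACKING_ID:
            index[obj].add(subject)
    return index


# Illustrative pairing only: one dataset with two files.
triples = [
    ("test.dataset.0", HAS_TRACKING_ID, "37228c28-6f2d-427a-a9c4-e46545c2873e"),
    ("test.dataset.0", HAS_TRACKING_ID, "eb942273-e0c6-42b8-b276-e3cc57ed0c0a"),
]

index = build_index(triples)
print(index["eb942273-e0c6-42b8-b276-e3cc57ed0c0a"])  # {'test.dataset.0'}
```

The cost Luca raises is visible here: the index grows by one entry per file, not per dataset.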
>>> On Dec 3, 2009, at 4:03 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>>>
>>>> Hi Luca,
>>>>
>>>> I've taken a look at the ontology in Protégé.  I see there is an
>>>> esg:Dataset class with some interesting properties:
>>>>
>>>> esg:hasUri
>>> points to the full dataset URI, for example:
>>> <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">resource://ESG-NCAR/ID#test.dataset.0</esg:hasUri>
>>>
>>>> esg:hasUUID
>>> points to the dataset UUID, which is the database identifier, but we
>>> don't currently use it within an RDF context.
>>>> esg:hasDataArchive
>>> this property is actually obsolete, replaced by hasGateway
>>>> esg:hasDataService
>>> used to point to data access services of any kind, through the
>>> DatasetAccess object. Currently the ESG gateway does not store the
>>> URL of a THREDDS catalog, but it looks like we will probably have to
>>> do it, and if so we would use this property to store it in the RDF
>>> triple store.
>>>>
>>>> How are these used in ESG?  Do any of them map onto THREDDS URLs of
>>>> individual datasets?
>>>> Maybe we could have a property esg:containsFileWithTrackingId?
>>>>
>>>> Cheers,
>>>> Stephen.
>>>>
>>>>
>>>> ---
>>>> Stephen Pascoe  +44 (0)1235 445980
>>>> British Atmospheric Data Centre
>>>> Rutherford Appleton Laboratory
>>>>
>>>> -----Original Message-----
>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Luca Cinquini
>>>> Sent: 03 December 2009 00:59
>>>> To: Lawrence, Bryan (STFC,RAL,SSTD)
>>>> Cc: go-essp-tech at ucar.edu
>>>> Subject: Re: [Go-essp-tech] replicas and search
>>>>
>>>> Hi Bryan:
>>>>
>>>> On Dec 2, 2009, at 10:20 AM, Bryan Lawrence wrote:
>>>>
>>>>> On Wednesday 02 December 2009 16:47:02 Don Middleton wrote:
>>>>>> A follow-on to the call, as I had to leave a bit early. We were
>>>>>> discussing being able to use the tracking id for various files. If
>>>>>> the tracking id is not in the federated metadata, how does one know
>>>>>> which gateway to consult for information?
>>>>>
>>>>> I was wondering the same thing ...
>>>>>
>>>>> I think I've asked this question before, but I can't remember the
>>>>> answer :-)
>>>>>
>>>>> Is the difference between what is in the Thredds catalog and what is
>>>>> in the rdf representation of it documented anywhere?  (Or, where is
>>>>> the schema for the rdf?)
>>>>
>>>> The schema for the RDF is the owl file itself:
>>>>
>>>> http://ontologies.ucar.edu/owl/esg.owl
>>>>
>>>> or if you want to look at all the ontologies in the package:
>>>>
>>>> http://ontologies.ucar.edu/owl/esg_all.owl
>>>>
>>>> Luca
>>>>
>>>>>
>>>>> Cheers
>>>>> Bryan
>>>>>
>>>>>>
>>>>>> don
>>>>>>
>>>>>>
>>>>>> On Dec 1, 2009, at 2:30 PM, Luca Cinquini wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> 	as I mentioned today at the telecon, I think the RDF query
>>>>>>> services should be able to handle the concept of dataset replicas
>>>>>>> quite nicely. The main concepts are that the RDF records generated
>>>>>>> from replicas must have different RDF identifiers, so they can be
>>>>>>> exchanged independently among gateways, and must reference their
>>>>>>> original RDF record.
>>>>>>>
>>>>>>> If you are interested, I've documented some of the details here:
>>>>>>>
>>>>>>> https://wiki.ucar.edu/display/esgcet/Metadata+Search+and+Replicas
>>>>>>>
>>>>>>> I'm also including a snapshot of an ESG Gateway that shows multiple
>>>>>>> replicas returned as part of a DRS search (note that how the
>>>>>>> replicas are presented in the result page can be easily changed -
>>>>>>> the picture is just to show that the query can handle replicas).
>>>>>>>
>>>>>>> thanks, Luca
>>>>>>>
>>>>>>> <DRS with Replicas.tiff>
>>>>>>> _______________________________________________
>>>>>>> GO-ESSP-TECH mailing list
>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Bryan Lawrence
>>>>> Director of Environmental Archival and Associated Research
>>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,
>>>>> Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ... 5848;
>>>>> Web: home.badc.rl.ac.uk/lawrence
>>>>
>>>
>>
>
> Gary Strand
> strandwg at ucar.edu
>
>
>
>


