[Go-essp-tech] replicas and search --> checksums

martin.juckes at stfc.ac.uk martin.juckes at stfc.ac.uk
Mon Dec 7 04:51:47 MST 2009


Hello,

Can checksums be stored alongside the tracking ids? E.g. using ov:hasChecksum?

Cheers,
Martin
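
A minimal sketch of what I mean, in Python. This is not CMOR or ESG gateway code. The function name write_record is hypothetical, the predicate esg:containsFileWithTrackingId is only the name floated further down this thread, and the choice of SHA-256 computed over the data payload alone is an assumption (a checksum of the whole file would change with every write, because the history and creation_date attributes change). Triples are modelled as plain tuples rather than with an RDF library.

```python
import hashlib
import uuid


def write_record(dataset_uri: str, payload: bytes) -> list[tuple[str, str, str]]:
    """Simulate one write of a file and return RDF-style triples for it.

    Hypothetical helper: a fresh tracking id is generated on every call,
    mirroring the behaviour reported in the quoted thread, while the
    checksum depends only on the data payload.
    """
    tracking_id = str(uuid.uuid4())                  # new id for every write
    checksum = hashlib.sha256(payload).hexdigest()   # stable for identical data
    file_node = f"urn:uuid:{tracking_id}"            # assumed way to name the file node
    return [
        (dataset_uri, "esg:containsFileWithTrackingId", tracking_id),
        (file_node, "ov:hasChecksum", checksum),
    ]


if __name__ == "__main__":
    dataset = "resource://ESG-NCAR/ID#test.dataset.0"
    data = b"same underlying data"
    first = write_record(dataset, data)
    second = write_record(dataset, data)
    print("tracking ids differ:", first[0][2] != second[0][2])
    print("checksums match:    ", first[1][2] == second[1][2])
```

With something like this, two rewrites of unchanged data would get different tracking ids but could still be recognised as holding the same data through the checksum triple.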

> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu]
> On Behalf Of Dean N. Williams
> Sent: 03 December 2009 16:10
> To: Gary Strand
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] replicas and search
> 
> Yes it is by design.
> 
> The idea behind this is that if you have to rewrite the file, then it's
> a new version so a new tracking id is needed.
> 
> -Dean
> 
> On Dec 3, 2009, at 7:49 AM, Gary Strand wrote:
> 
> >
> > I've been playing around with CMOR2 and it looks to me like the
> > tracking ID assigned to a given netCDF file changes with every write
> > of the file, even if the underlying data remains unchanged.
> >
> > Two writes by CMOR of the same test data, doing an 'ncdump' on both
> > and then diffing the ncdumps shows:
> >
> > 64c64
> > <               cl:history = "2009-12-02T23:05:40Z altered by CMOR: replaced missing value flag (1e+28) with standard missing value (1e+20)." ;
> > ---
> >>              cl:history = "2009-12-02T23:06:29Z altered by CMOR: replaced missing value flag (1e+28) with standard missing value (1e+20)." ;
> > 77c77
> > <               :history = "Output from archive/giccm_03_std_2xCO2_2256. 2009-12-02T23:05:40Z CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements." ;
> > ---
> >>              :history = "Output from archive/giccm_03_std_2xCO2_2256. 2009-12-02T23:06:29Z CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements." ;
> > 83c83
> > <               :creation_date = "2009-12-02T23:05:40Z" ;
> > ---
> >>              :creation_date = "2009-12-02T23:06:29Z" ;
> > 91c91
> > <               :tracking_id = "37228c28-6f2d-427a-a9c4-e46545c2873e" ;
> > ---
> >>              :tracking_id = "eb942273-e0c6-42b8-b276-e3cc57ed0c0a" ;
> >
> > On Thu Dec 03, 2009, at 8:45 AM, Luca Cinquini wrote:
> >
> >>
> >> On Dec 3, 2009, at 8:42 AM, <stephen.pascoe at stfc.ac.uk> wrote:
> >>
> >>>
> >>> My understanding is that tracking-id is generated by CMOR for each
> >>> file.  However, I still think it is useful to be able to answer the
> >>> question "Which dataset does this file belong to" even if it's been
> >>> separated from the rest of the dataset.  People will move these
> >>> files about on their computers, ftp them about, email them to each
> >>> other, etc.  We can't stop them doing that.
> >>>
> >>> If by dataset we mean atomic datasets, rather than any point in a
> >>> "dataset hierarchy" then there will be relatively few files per
> >>> dataset -- maybe a few tens.  In this case I can't imagine there
> >>> being a performance issue with having a few tens of "hasTrackingId"
> >>> triples per dataset.  Do you agree?
> >>
> >> Not really... storing a few tens more triples per dataset could
> >> double the size of the triple store. I guess we would need to try.
> >> If all the tracking ids are based on some format convention, maybe we
> >> can assign a common tracking id to the dataset...
> >> L
> >>
> >>>
> >>> S.
> >>>
> >>>
> >>> ---
> >>> Stephen Pascoe  +44 (0)1235 445980
> >>> British Atmospheric Data Centre
> >>> Rutherford Appleton Laboratory
> >>>
> >>> -----Original Message-----
> >>> From: Luca Cinquini [mailto:luca at ucar.edu]
> >>> Sent: 03 December 2009 15:29
> >>> To: Pascoe, Stephen (STFC,RAL,SSTD)
> >>> Cc: Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> >>> Subject: Re: [Go-essp-tech] replicas and search
> >>>
> >>> Hi Stephen,
> >>> 	I guess the fundamental question is whether the trackingId is
> >>> file-specific or dataset-specific. It is certainly possible to have a
> >>> new RDF triple of the form:
> >>> (dataset, hasTrackingId, trackingId) but we want to avoid having to
> >>> start tracking ids for each file of each dataset, since datasets can
> >>> potentially have many, many files (I assume this is still true?).
> >>> See below for your other related questions.
> >>> thanks, Luca
> >>>
> >>> On Dec 3, 2009, at 4:03 AM, <stephen.pascoe at stfc.ac.uk>
> >>> <stephen.pascoe at stfc.ac.uk
> >>>> wrote:
> >>>
> >>>> Hi Luca,
> >>>>
> >>>> I've taken a look at the ontology in Protégé.  I see there is an
> >>>> esg:Dataset class with some interesting properties:
> >>>>
> >>>> esg:hasUri
> >>> points to the full dataset URI, for example:
> >>> <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">resource://ESG-NCAR/ID#test.dataset.0</esg:hasUri>
> >>>
> >>>> esg:hasUUID
> >>> points to the dataset UUID, which is the database identifier, but we
> >>> don't currently use it within an RDF context.
> >>>> esg:hasDataArchive
> >>> this property is actually obsolete, replaced by hasGateway
> >>>> esg:hasDataService
> >>> used to point to data access services of any kind, through the
> >>> DatasetAccess object. Currently the ESG gateway does not store the
> >>> URL of a THREDDS catalog, but it looks like we will probably have to
> >>> do it, and if so we would use this property to store it in the RDF
> >>> triple store.
> >>>>
> >>>> How are these used in ESG?  Do any of them map onto THREDDS URLs of
> >>>> individual datasets?
> >>>> Maybe we could have a property esg:containsFileWithTrackingId?
> >>>>
> >>>> Cheers,
> >>>> Stephen.
> >>>>
> >>>>
> >>>> ---
> >>>> Stephen Pascoe  +44 (0)1235 445980
> >>>> British Atmospheric Data Centre
> >>>> Rutherford Appleton Laboratory
> >>>>
> >>>> -----Original Message-----
> >>>> From: go-essp-tech-bounces at ucar.edu
> >>>> [mailto:go-essp-tech-bounces at ucar.edu
> >>>> ] On Behalf Of Luca Cinquini
> >>>> Sent: 03 December 2009 00:59
> >>>> To: Lawrence, Bryan (STFC,RAL,SSTD)
> >>>> Cc: go-essp-tech at ucar.edu
> >>>> Subject: Re: [Go-essp-tech] replicas and search
> >>>>
> >>>> Hi Bryan:
> >>>>
> >>>> On Dec 2, 2009, at 10:20 AM, Bryan Lawrence wrote:
> >>>>
> >>>>> On Wednesday 02 December 2009 16:47:02 Don Middleton wrote:
> >>>>>> A follow-on to the call, as I had to leave a bit early. We were
> >>>>>> discussing being able to use the tracking id for various files.
> >>>>>> If the tracking id is not in the federated metadata, how does one
> >>>>>> know which gateway to consult for information?
> >>>>>
> >>>>> I was wondering the same thing ...
> >>>>>
> >>>>> I think I've asked this question before, but I can't remember the
> >>>>> answer :-)
> >>>>>
> >>>>> Is the difference between what is in the Thredds catalog and what
> >>>>> is in the rdf representation of it documented anywhere?  (Or,
> >>>>> where is the schema for the rdf?)
> >>>>
> >>>> The schema for the RDF is the owl file itself:
> >>>>
> >>>> http://ontologies.ucar.edu/owl/esg.owl
> >>>>
> >>>> or if you want to look at all the ontologies in the package:
> >>>>
> >>>> http://ontologies.ucar.edu/owl/esg_all.owl
> >>>>
> >>>> Luca
> >>>>
> >>>>>
> >>>>> Cheers
> >>>>> Bryan
> >>>>>
> >>>>>>
> >>>>>> don
> >>>>>>
> >>>>>>
> >>>>>> On Dec 1, 2009, at 2:30 PM, Luca Cinquini wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>> 	as I mentioned today at the telecon, I think the RDF query
> >>>>>>> services should be able to handle the concept of dataset
> >>>>>>> replicas quite nicely. The main concepts are that the RDF
> >>>>>>> records generated from replicas must have different RDF
> >>>>>>> identifiers, so they can be exchanged independently among
> >>>>>>> gateways, and must reference their original RDF record.
> >>>>>>>
> >>>>>>> If you are interested, I've documented some of the details here:
> >>>>>>>
> >>>>>>> https://wiki.ucar.edu/display/esgcet/Metadata+Search+and+Replicas
> >>>>>>>
> >>>>>>> I'm also including a snapshot of an ESG Gateway that shows
> >>>>>>> multiple replicas returned as part of a DRS search (note that
> >>>>>>> how the replicas are presented in the result page can be easily
> >>>>>>> changed - the picture is just to show that the query can handle
> >>>>>>> replicas).
> >>>>>>>
> >>>>>>> thanks, Luca
> >>>>>>>
> >>>>>>> <DRS with Replicas.tiff>
> >>>>>>> _______________________________________________
> >>>>>>> GO-ESSP-TECH mailing list
> >>>>>>> GO-ESSP-TECH at ucar.edu
> >>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Bryan Lawrence
> >>>>> Director of Environmental Archival and Associated Research
> >>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,
> >>>>> Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ...
> >>>>> 5848;
> >>>>> Web: home.badc.rl.ac.uk/lawrence
> >>>>
> >>>> --
> >>>> Scanned by iCritical.
> >>>
> >>
> >
> > Gary Strand
> > strandwg at ucar.edu
> >
> >
> >
> >


