[Go-essp-tech] versions, checksums and the TDS

Karl Taylor taylor13 at llnl.gov
Sun Sep 25 11:25:39 MDT 2011


Hi Estani,

I'm not advocating using the tracking_id to test whether two files are
identical.  I'm suggesting that most users will be able to use it to
determine whether they have the latest version of a particular file,
as opposed to some earlier version.  It's true that you can modify a
file without changing the tracking_id, but I'm pretty sure all but a
tiny number of users will download the files and never modify them.
Whether or not a user alters files, new files available from the CMIP5 
archive will have tracking_ids that the user doesn't have locally, so if 
they are interested, they can download the new files.
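To make the idea concrete, here is a minimal Python sketch. It assumes the user has already collected the tracking_id global attribute of each local file and has a listing of file names and tracking_ids for the latest published version (for example, taken from the THREDDS catalog). The function name, the dict/set inputs, the file names and the short placeholder ids are all hypothetical and do not come from any ESGF tool.

```python
def files_to_download(local_ids, latest_ids):
    """Return the files in the latest version whose tracking_id is not held locally.

    local_ids:  set of tracking_id strings read from the user's local files
    latest_ids: dict mapping file name -> tracking_id for the latest version
    (both inputs are hypothetical; how they are gathered is left open here)
    """
    return sorted(name for name, tid in latest_ids.items() if tid not in local_ids)


# Placeholder ids; real tracking_ids are much longer unique strings.
local = {"a1b2", "c3d4"}
latest = {
    "tas_example.nc": "a1b2",  # same tracking_id as a local file: already current
    "pr_example.nc": "e5f6",   # tracking_id not held locally: new data to fetch
}
print(files_to_download(local, latest))
# prints ['pr_example.nc']
```

This only compares identifiers, so, as Estani points out, a file altered locally without a new tracking_id would still be reported as current. A checksum comparison is what would catch that case.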

The above assumes that data *providers* take care to generate a new 
tracking_id when they generate a file containing new data.  Is this a 
risky assumption?  Couldn't the CMIP5 QA procedure check whether a file 
has the same tracking_id as any other file in the system?
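A QA check along those lines might look roughly like the Python sketch below. It assumes a hypothetical crawler has produced (path, tracking_id, checksum) records for the archive. Comparing checksums as well is my own refinement, not something the QA procedure is known to do. The reason for it is that a replica, or a file republished unchanged in a later version, legitimately carries the same tracking_id, so only an id attached to more than one distinct content is flagged. The checksum strings are placeholders.

```python
from collections import defaultdict


def conflicting_tracking_ids(records):
    """Report tracking_ids that are attached to more than one distinct file content.

    records: iterable of (path, tracking_id, checksum) tuples, e.g. gathered by
    a hypothetical QA crawler over the archive.
    An id seen with a single checksum (replicas, or a file unchanged between
    versions) is not reported; an id seen with several checksums is.
    """
    checksums_by_id = defaultdict(set)
    paths_by_id = defaultdict(list)
    for path, tracking_id, checksum in records:
        checksums_by_id[tracking_id].add(checksum)
        paths_by_id[tracking_id].append(path)
    return {
        tid: sorted(paths_by_id[tid])
        for tid, sums in checksums_by_id.items()
        if len(sums) > 1
    }


records = [
    ("v1/tas_example.nc", "a1b2", "md5-111"),
    ("v2/tas_example.nc", "a1b2", "md5-111"),  # same file republished: fine
    ("v1/pr_example.nc", "e5f6", "md5-222"),
    ("v2/pr_example.nc", "e5f6", "md5-333"),  # new data, reused tracking_id: flagged
]
print(conflicting_tracking_ids(records))
# prints {'e5f6': ['v1/pr_example.nc', 'v2/pr_example.nc']}
```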

best regards,
Karl

On 9/25/11 2:49 AM, Estanislao Gonzalez wrote:
> I recall a problem where the tracking id wasn't automatically changed
> when a file was altered with some tools (cdo?).
> Are we sure now that the same tracking id always points to the same file?
> Or is that earlier problem no longer an issue?
>
> Thanks,
> Estani
> Am 24.09.2011 18:17, schrieb Karl Taylor:
>> Dear all,
>>
>> Concerning:
>>
>> On 9/24/11 4:35 AM, Estanislao Gonzalez wrote:
>>> 2) checksums
>>> They are the only indication a data node gives to the outside world of
>>> the changes a file has undergone from one version to another; e.g., for
>>> replication we use that information to retrieve only the files that
>>> changed between versions. The same principle could be applied in
>>> tools designed for end users.
>>
>> Without disputing that checksums should be mandatory, I want to point
>> out that a user who has lost the checksum associated with a file he
>> has downloaded shouldn't have to recompute the checksum to determine
>> whether his file is a copy of a file residing at the data node.
>> Recall that each netCDF file records a unique tracking_id,
>> which I'm almost positive is also in the THREDDS catalog.  It will
>> certainly be quicker for the user to read the tracking_id and then
>> check whether it matches the latest version.  I think we want to
>> maintain tracking_id as an option for checking whether new files
>> exist in a new version.
>>
>> best regards,
>> Karl
>
>
> -- 
> Estanislao Gonzalez
>
> Max-Planck-Institut für Meteorologie (MPI-M)
> Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
> Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>
> Phone:   +49 (40) 46 00 94-126
> E-Mail:  gonzalez at dkrz.de