[Go-essp-tech] versions, checksums and the TDS
Karl Taylor
taylor13 at llnl.gov
Sun Sep 25 11:25:39 MDT 2011
Hi Estani,
I'm not advocating using the tracking_id to test whether two files are
identical. I'm suggesting that most users will be able to use it to
determine whether they have the latest version of a particular file, as
opposed to some earlier version. It's true that you can modify a file
without changing the tracking_id, but I'm pretty sure all but a tiny
number of users will download the files and never modify them.
Whether or not a user alters files, new files available from the CMIP5
archive will have tracking_ids that the user doesn't have locally, so if
they are interested, they can download the new files.
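As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes the tracking_ids have already been read from the user's local files and from the catalog; the function name and the dict layout of the catalog are hypothetical, chosen only for this example.

```python
def files_to_download(local_ids, catalog):
    """Return catalog file names whose tracking_id is not held locally.

    local_ids: iterable of tracking_id strings read from the user's files.
    catalog:   dict mapping file name -> tracking_id for the latest version
               (hypothetical structure, e.g. built from a catalog listing).
    """
    have = set(local_ids)
    # A file is "new" to the user if its tracking_id is not in the local set.
    return sorted(name for name, tid in catalog.items() if tid not in have)


if __name__ == "__main__":
    # Made-up ids and file names, for illustration only.
    latest = {"tas_a.nc": "id-1", "tas_b.nc": "id-3"}
    print(files_to_download(["id-1", "id-2"], latest))
```

Note that this only reports which archive files the user lacks; it says nothing about whether a local file has been modified, which is what checksums are for.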
The above assumes that data *providers* take care to generate a new
tracking_id when they generate a file containing new data. Is this a
risky assumption? Couldn't the CMIP5 QA procedure check whether a file
has the same tracking_id as any other file in the system?
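The kind of QA check asked about here could look something like the following Python sketch. It is only an illustration under assumptions: the function name is hypothetical, the input is assumed to be a list of (file path, tracking_id) pairs already gathered from the system, and each path is treated as a distinct file, so replicas of the same file would need to be filtered out first.

```python
from collections import defaultdict


def duplicate_tracking_ids(records):
    """Find tracking_ids shared by more than one file.

    records: iterable of (file_path, tracking_id) pairs (hypothetical input).
    Returns a dict mapping each duplicated tracking_id to the sorted list
    of file paths that carry it.
    """
    paths_by_id = defaultdict(set)
    for path, tid in records:
        paths_by_id[tid].add(path)
    # Keep only ids that appear on two or more distinct paths.
    return {tid: sorted(paths)
            for tid, paths in paths_by_id.items()
            if len(paths) > 1}


if __name__ == "__main__":
    # Made-up paths and ids, for illustration only.
    sample = [("v1/tas.nc", "id-1"), ("v2/tas.nc", "id-1"), ("v1/pr.nc", "id-2")]
    print(duplicate_tracking_ids(sample))
```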
best regards,
Karl
On 9/25/11 2:49 AM, Estanislao Gonzalez wrote:
> I recall a problem where altering a file with some tools (cdo?)
> didn't automatically change the tracking id.
> Are we sure that the same tracking id points to the same file now?
> Is that no longer a problem?
>
> Thanks,
> Estani
> Am 24.09.2011 18:17, schrieb Karl Taylor:
>> Dear all,
>>
>> Concerning:
>>
>> On 9/24/11 4:35 AM, Estanislao Gonzalez wrote:
>>> 2) checksums
>>> They are the only reference a data node gives to the outside about the
>>> changes a file underwent from one version to another, i.e. for
>>> replication we use that information to retrieve only the files that
>>> changed from one version to another. The same principle could be
>>> applied to tools designed for end users.
>>
>> Without disputing that checksums should be mandatory, I want to point
>> out that a user who has lost the checksum associated with a file he
>> has downloaded shouldn't have to recompute the checksum to determine
>> whether his file is a copy of a file residing at the datanode.
>> Recall that recorded in each netCDF file is a unique tracking_id,
>> which I'm almost positive is also in the THREDDS catalog. It will
>> certainly be quicker for the user to read the tracking_id and then
>> check whether it matches the latest version. I think we want to
>> maintain tracking_id as an option for checking whether new files
>> exist in a new version.
>>
>> best regards,
>> Karl
>
>
> --
> Estanislao Gonzalez
>
> Max-Planck-Institut für Meteorologie (MPI-M)
> Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
> Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>
> Phone: +49 (40) 46 00 94-126
> E-Mail: gonzalez at dkrz.de