[Go-essp-tech] versions, checksums and the TDS

Estanislao Gonzalez gonzalez at dkrz.de
Sun Sep 25 12:25:16 MDT 2011


I did indeed mean the data providers. AFAIK some typical post-processing 
corrections do not generate new tracking_ids, but I might be wrong.
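
To make the concern concrete, here is a minimal Python sketch. It is only an 
illustration under my own assumptions: make_file, read_tracking_id and 
checksum are hypothetical helpers, and the "file" is a plain byte string 
standing in for a netCDF file, not the real format. It shows how a corrected 
file that keeps the old tracking_id looks unchanged by id but differs by 
checksum.

```python
import hashlib
import uuid


def make_file(tracking_id: str, payload: bytes) -> bytes:
    """Hypothetical stand-in for a netCDF file: an id header line plus data."""
    return b"tracking_id=" + tracking_id.encode() + b"\n" + payload


def read_tracking_id(blob: bytes) -> str:
    """Read the id back from the header line of the stand-in file."""
    header, _, _ = blob.partition(b"\n")
    return header.split(b"=", 1)[1].decode()


def checksum(blob: bytes) -> str:
    """Checksum over the whole file content."""
    return hashlib.sha256(blob).hexdigest()


original_id = str(uuid.uuid4())
published = make_file(original_id, b"tas: 280.1 281.4 279.9")

# Assumed scenario: a post-processing fix rewrites the data but
# carries the old tracking_id along unchanged.
corrected = make_file(original_id, b"tas: 280.1 281.4 280.0")

same_id = read_tracking_id(published) == read_tracking_id(corrected)
same_checksum = checksum(published) == checksum(corrected)
print(same_id, same_checksum)  # True False
```

In this scenario a comparison by tracking_id alone would report "nothing 
new", while the checksum comparison flags the change.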

I think the best place to check this would be the publisher itself. I 
assume the tracking_id is a column in a DB table. If that's the case, it 
might already have a unique constraint, or one could be added easily, but 
that is something Bob certainly knows better.
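
As a rough sketch of what I mean, assuming a table layout that I am making 
up here (the table name file_version, its columns, and the register helper 
are hypothetical; I do not know the publisher's actual schema or database), 
a unique constraint would make the DB itself reject a second file with an 
already-known tracking_id:

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_version ("
    " id INTEGER PRIMARY KEY,"
    " path TEXT NOT NULL,"
    " tracking_id TEXT NOT NULL UNIQUE)"  # the constraint under discussion
)


def register(path: str, tracking_id: str) -> bool:
    """Return True if the row was stored, False if the tracking_id exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO file_version (path, tracking_id) VALUES (?, ?)",
                (path, tracking_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by the UNIQUE constraint on a duplicate tracking_id.
        return False


first = register("tas_v1.nc", "id-0001")
duplicate = register("tas_v2.nc", "id-0001")
print(first, duplicate)  # True False
```

With something like this, publishing a new file version that reuses an old 
tracking_id would fail at publication time instead of going unnoticed.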

Thanks,
Estani
On 25.09.2011 19:25, Karl Taylor wrote:
> Hi Estani,
>
> I'm not advocating using the tracking_id to test whether two files are 
> identical.  I'm suggesting that for most users, they will be able to 
> use it to determine whether they have the latest version of a 
> particular file, as opposed to some earlier version.  It's true that 
> you can modify a file without changing the tracking_id, but I'm pretty 
> sure all but a tiny number of users will download the files and never 
> modify them.  Whether or not a user alters files, new files available 
> from the CMIP5 archive will have tracking_ids that the user doesn't 
> have locally, so if they are interested, they can download the new files.
>
> The above assumes that data *providers* take care to generate a new 
> tracking_id when they generate a file containing new data.  Is this a 
> risky assumption?  Couldn't the CMIP5 QA procedure check whether a 
> file has the same tracking_id as any other file in the system?
>
> best regards,
> Karl
>
> On 9/25/11 2:49 AM, Estanislao Gonzalez wrote:
>> I recall a problem where altering a file with some tools (cdo?) 
>> did not automatically change the tracking id.
>> Are we sure that the same tracking id now always points to the same file?
>> Is that earlier problem no longer an issue?
>>
>> Thanks,
>> Estani
>> On 24.09.2011 18:17, Karl Taylor wrote:
>>> Dear all,
>>>
>>> Concerning:
>>>
>>> On 9/24/11 4:35 AM, Estanislao Gonzalez wrote:
>>>> 2) checksums
>>>> They are the only outward indication a data node gives of the
>>>> changes a file underwent from one version to another. For example,
>>>> for replication we use that information to retrieve only the files
>>>> that changed between versions. The same principle could be applied
>>>> to tools designed for end users.
>>>
>>> without disputing that checksums should be mandatory, I want to 
>>> point out that a user who has lost the checksum associated with a 
>>> file he has downloaded shouldn't have to recompute the checksum to 
>>> determine whether his file is a copy of a file residing at the 
>>> datanode.  Recall that recorded in each netCDF file is a unique 
>>> tracking_id, which I'm almost positive is also in the thredds 
>>> catalog.  It will certainly be quicker for the user to read the 
>>> tracking_id and then check whether it matches the latest version.  I 
>>> think we want to maintain tracking_id as an option for checking 
>>> whether new files exist in a new version.
>>>
>>> best regards,
>>> Karl
>>
>>
>> -- 
>> Estanislao Gonzalez
>>
>> Max-Planck-Institut für Meteorologie (MPI-M)
>> Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
>> Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>
>> Phone:   +49 (40) 46 00 94-126
>> E-Mail:  gonzalez at dkrz.de

