<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.19120"></HEAD>
<BODY bgColor=#ffffff text=#000000>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial>Hello,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial>Sorry, I haven't had time to digest everything yet.
</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial>In response to the tracking id issue: I think there is a
significant chance that a data provider might accidentally provide
different files with the same tracking id. The most likely case involves
corrections to attributes such as 'forcings' and 'branch_date', the NetCDF
attributes that are left to the data provider to manage. I think it's easy
to make a slip with these (anecdotally, I've heard there are already examples
in the CMIP5 repository) and then correct it later (I'm not sure whether there
are plans to correct the examples already seen). A correction made with a
simple ncatted call will not update the tracking id.</FONT></SPAN></DIV>
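<DIV dir=ltr align=left><FONT size=2 face=Arial>The failure mode described above can be modelled in a few lines. This is a minimal Python sketch that represents a file's global attributes as a plain dict (a deliberate simplification; no netCDF library is involved). The helper <CODE>correct_attribute</CODE> and the attribute values are hypothetical. Minting the replacement id with <CODE>uuid.uuid4()</CODE> is also an assumption about how tracking ids are generated. The sketch contrasts a bare in-place edit, which leaves the old tracking id in place, with a correction that issues a fresh one.</FONT></DIV>

```python
import uuid


def correct_attribute(attrs, name, value, new_tracking_id=True):
    """Return a corrected copy of one file's global attributes.

    Hypothetical helper: 'attrs' stands in for the NetCDF global
    attributes of a single file. With new_tracking_id=False it mimics a
    bare in-place edit, which leaves tracking_id untouched so the old
    and new files share one id. With new_tracking_id=True a fresh UUID
    is minted alongside the correction.
    """
    corrected = dict(attrs)  # copy, so the original attributes are unchanged
    corrected[name] = value
    if new_tracking_id:
        corrected["tracking_id"] = str(uuid.uuid4())
    return corrected


# Illustrative values only.
original = {"tracking_id": str(uuid.uuid4()), "branch_date": "18500101"}

# Bare edit: contents differ, but the tracking_id is shared.
silent = correct_attribute(original, "branch_date", "18600101", new_tracking_id=False)

# Safer correction: contents differ and so does the tracking_id.
safe = correct_attribute(original, "branch_date", "18600101")

print(silent["tracking_id"] == original["tracking_id"])  # True
print(safe["tracking_id"] == original["tracking_id"])    # False
```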
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial>I don't know whether this case (different files with the same
tracking id) has already happened; I guess someone could find out by
trawling the catalogues...</FONT></SPAN></DIV>
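<DIV dir=ltr align=left><FONT size=2 face=Arial>Trawling the catalogues for this case comes down to grouping file records by tracking id and flagging any id that is attached to more than one distinct checksum. The sketch below assumes each catalogue entry can be reduced to a (tracking_id, checksum, path) tuple. That flattening, the helper name <CODE>find_reused_tracking_ids</CODE> and the sample records are all hypothetical. The check can only catch files whose checksums are actually recorded. Replicas of one file share both id and checksum, so they are not flagged.</FONT></DIV>

```python
from collections import defaultdict


def find_reused_tracking_ids(records):
    """Return {tracking_id: sorted paths} for ids seen with more than one checksum.

    'records' is an iterable of (tracking_id, checksum, path) tuples, a
    hypothetical flattening of catalogue entries; how they are harvested
    from the catalogues is left open here.
    """
    checksums = defaultdict(set)
    paths = defaultdict(list)
    for tracking_id, checksum, path in records:
        checksums[tracking_id].add(checksum)
        paths[tracking_id].append(path)
    return {
        tid: sorted(paths[tid])
        for tid, sums in checksums.items()
        if len(sums) > 1
    }


# Illustrative entries only; ids, checksums and paths are made up.
records = [
    ("id-A", "c1", "v1/tas_A.nc"),
    ("id-A", "c1", "mirror/tas_A.nc"),  # same file replicated: not flagged
    ("id-B", "c2", "v1/pr_B.nc"),
    ("id-B", "c3", "v2/pr_B.nc"),       # contents changed, id reused: flagged
]
print(find_reused_tracking_ids(records))
# {'id-B': ['v1/pr_B.nc', 'v2/pr_B.nc']}
```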
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial>Rather than worrying about whether the tracking id is reliable, I
think it's better to invest effort in getting checksums into the
system for all data. *But* I don't control any effort on this, so weigh my
opinions with that in mind...</FONT></SPAN></DIV>
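<DIV dir=ltr align=left><FONT size=2 face=Arial>A checksum is robust in exactly the case where the tracking id fails: any edit to the file's bytes, including a one-attribute ncatted fix, produces a different digest. The following is a minimal sketch of computing one with Python's standard <CODE>hashlib</CODE>, reading the file in chunks so large files need not fit in memory. The helper name <CODE>file_checksum</CODE> is hypothetical, and SHA-256 is an arbitrary default; the algorithm the system actually adopts may differ. The demo writes placeholder bytes rather than a real NetCDF file.</FONT></DIV>

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large file never has to fit in memory."""
    digest = hashlib.new(algorithm)  # any name hashlib supports, e.g. "md5"
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder bytes standing in for a file before and after a one-attribute fix.
    path = os.path.join(tmp, "example.nc")
    with open(path, "wb") as f:
        f.write(b"branch_date=18500101")
    before = file_checksum(path)
    with open(path, "wb") as f:
        f.write(b"branch_date=18600101")
    after = file_checksum(path)

print(before != after)  # True: the changed byte gives a different digest
```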
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial>(All this was written in quite a rush; I hope I haven't said anything
too stupid, or more stupid than usual.)</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff
size=2 face=Arial>Jamie</FONT></SPAN></DIV><BR>
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"
dir=ltr>
<DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> go-essp-tech-bounces@ucar.edu
[mailto:go-essp-tech-bounces@ucar.edu] <B>On Behalf Of </B>Estanislao
Gonzalez<BR><B>Sent:</B> 25 September 2011 19:25<BR><B>To:</B> Karl
Taylor<BR><B>Cc:</B> go-essp-tech@ucar.edu<BR><B>Subject:</B> Re:
[Go-essp-tech] versions, checksums and the TDS<BR></FONT><BR></DIV>
<DIV></DIV>I did mean the data providers: AFAIK some typical
post-processing corrections do not generate new tracking_ids. But I might
be wrong.<BR><BR>I think the best place to check this would be in the
publisher itself. I assume the tracking id is stored as a column in a DB
table. If that's the case, it might already have a unique constraint, or one
could be added easily, but that is something Bob certainly knows
better.<BR><BR>Thanks,<BR>Estani<BR>On 25.09.2011 19:25, Karl Taylor wrote:
<BLOCKQUOTE cite=mid:4E7F6413.1010203@llnl.gov type="cite"><FONT
face="Times New Roman">Hi Estani,<BR><BR>I'm not advocating using the
tracking_id to test whether two files are identical. I'm suggesting
that most users will be able to use it to determine whether they
have the latest version of a particular file, as opposed to some earlier
version. It's true that you can modify a file without changing the
tracking_id, but I'm pretty sure all but a tiny number of users will
download the files and never modify them. Whether or not a user alters
files, new files available from the CMIP5 archive will have tracking_ids
that the user doesn't have locally, so if they are interested, they can
download the new files.<BR><BR>The above assumes that data *providers* take
care to generate a new tracking_id when they generate a file containing new
data. Is this a risky assumption? Couldn't the CMIP5 QA
procedure check whether a file has the same tracking_id as any other file in
the system?<BR><BR>best regards,<BR>Karl<BR><BR></FONT>On 9/25/11 2:49 AM,
Estanislao Gonzalez wrote:
<BLOCKQUOTE cite=mid:4E7EF93E.8000808@dkrz.de type="cite">I recall a
problem where, when a file was altered with some tools (cdo?), the tracking id
wasn't automatically changed.<BR>Are we sure that the same tracking id
points to the same file now? <BR>Is that no longer a
problem?<BR><BR>Thanks,<BR>Estani<BR>On 24.09.2011 18:17, Karl Taylor
wrote:
<BLOCKQUOTE cite=mid:4E7E027D.50405@llnl.gov type="cite"><FONT
face="Times New Roman">Dear all,<BR><BR>Concerning:<BR></FONT><BR>On
9/24/11 4:35 AM, Estanislao Gonzalez wrote:
<BLOCKQUOTE cite=mid:4E7DC078.5000700@dkrz.de type="cite"><PRE wrap="">2) checksums
They are the only indication a data node gives to the outside world of
the changes a file has undergone from one version to another. For
example, for replication we use that information to retrieve only the
files that changed from one version to another. The same principle
could be applied to tools designed for end users.
</PRE></BLOCKQUOTE><BR>without disputing that checksums should be
mandatory, I want to point out that a user who has lost the checksum
associated with a file he has downloaded shouldn't have to recompute the
checksum to determine whether his file is a copy of a file residing at
the data node. Recall that each netCDF file records a unique
tracking_id, which I'm almost positive is also in the THREDDS
catalog. It will certainly be quicker for the user to read the
tracking_id and then check whether it matches the latest version.
I think we want to maintain tracking_id as an option for checking
whether new files exist in a new version.<BR><BR>best
regards,<BR>Karl<BR></BLOCKQUOTE><BR><BR><PRE class=moz-signature cols="72">--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: <A class=moz-txt-link-abbreviated href="mailto:gonzalez@dkrz.de" moz-do-not-send="true">gonzalez@dkrz.de</A> </PRE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>