<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>

<META name=GENERATOR content="MSHTML 8.00.6001.19120"></HEAD>

<BODY bgColor=#ffffff text=#000000>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial>Hello,</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial>Sorry, I haven't had time to digest everything.&nbsp; 

</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial>In response to the tracking id&nbsp;issue: I think there is a 

significant&nbsp;chance that a data provider might accidentally provide 

different files with the same tracking id.&nbsp; The most likely case is around 

correction of things like 'forcings' and 'branch_date' -&nbsp;those&nbsp;NetCDF 

attributes that are left to the data provider to manage.&nbsp; I think it's easy 

to make a slip with these (anecdotally I've heard there&nbsp;are already examples 

in the CMIP5 repository), then correct later (not sure whether there are plans 

to correct the examples already seen).&nbsp; Correction with a simple ncatted 

will not update the tracking id.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial>I don't know if this&nbsp;case of&nbsp;different files, same 

tracking id, has&nbsp;happened already - I guess&nbsp;someone could find out by 

trawling the catalogues...</FONT></SPAN></DIV>
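<DIV dir=ltr align=left>A minimal sketch of what such a trawl could check, assuming the catalogue entries have already been extracted into (path, tracking id, checksum) records. The function name and the record layout are hypothetical and not part of the publisher or any catalogue API:</DIV>

<PRE>
```python
from collections import defaultdict


def find_tracking_id_clashes(records):
    """Return tracking ids that are attached to more than one distinct checksum.

    records: iterable of (path, tracking_id, checksum) tuples
    (a hypothetical layout, assumed for illustration only).
    """
    checksums_by_id = defaultdict(set)
    paths_by_id = defaultdict(list)
    for path, tracking_id, checksum in records:
        checksums_by_id[tracking_id].add(checksum)
        paths_by_id[tracking_id].append(path)
    # Keep only the ids whose files do not all share a single checksum.
    return {
        tid: sorted(paths_by_id[tid])
        for tid, sums in checksums_by_id.items()
        if len(sums) != 1
    }
```
</PRE>

<DIV dir=ltr align=left>Identical copies (same tracking id, same checksum) are not reported; only ids shared by files with differing content are.</DIV>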

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial>Rather than worry about whether the tracking id is reliable,&nbsp;I 

think&nbsp;it's better&nbsp;to invest effort in getting the checksum in the 

system for all data.&nbsp; *But* I don't control any effort on this, so weigh my 

opinions with that in mind...</FONT></SPAN></DIV>
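<DIV dir=ltr align=left>For illustration, a checksum can be computed with nothing but the Python standard library. This is only a sketch: the function name is hypothetical, and the default algorithm (SHA-256) is an assumption, not necessarily what the system records:</DIV>

<PRE>
```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```
</PRE>

<DIV dir=ltr align=left>Because the digest depends on every byte of the file, an attribute correction changes it even when the tracking id is left untouched.</DIV>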

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial>(All this was in quite a rush - hope I haven't said anything 

too stupid/more stupid than usual)</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=606400916-26092011><FONT color=#0000ff 

size=2 face=Arial>Jamie</FONT></SPAN></DIV><BR>

<BLOCKQUOTE 

style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px" 

dir=ltr>

  <DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>

  <HR tabIndex=-1>

  <FONT size=2 face=Tahoma><B>From:</B> go-essp-tech-bounces@ucar.edu 

  [mailto:go-essp-tech-bounces@ucar.edu] <B>On Behalf Of </B>Estanislao 

  Gonzalez<BR><B>Sent:</B> 25 September 2011 19:25<BR><B>To:</B> Karl 

  Taylor<BR><B>Cc:</B> go-essp-tech@ucar.edu<BR><B>Subject:</B> Re: 

  [Go-essp-tech] versions, checksums and the TDS<BR></FONT><BR></DIV>

  <DIV></DIV>I did indeed mean the data providers; AFAIK some typical 

  post-processing corrections are not generating new tracking_ids. But I might 

  be wrong.<BR><BR>I think the best place for this to be checked would be in the 

  publisher itself. I assume the tracking id is a column attribute in a DB 

  table. If that's the case, it might already have a unique constraint or it 

  could be added easily, but that is something Bob certainly knows 

  better.<BR><BR>Thanks,<BR>Estani<BR>On 25.09.2011 19:25, Karl Taylor wrote: 

  <BLOCKQUOTE cite=mid:4E7F6413.1010203@llnl.gov type="cite"><FONT 

    face="Times New Roman">Hi Estani,<BR><BR>I'm not advocating using the 

    tracking_id to test whether two files are identical.&nbsp; I'm suggesting 

    that most users will be able to use it to determine whether they 

    have the latest version of a particular file, as opposed to some earlier 

    version.&nbsp; It's true that you can modify a file without changing the 

    tracking_id, but I'm pretty sure all but a tiny number of users will 

    download the files and never modify them.&nbsp; Whether or not a user alters 

    files, new files available from the CMIP5 archive will have tracking_ids 

    that the user doesn't have locally, so if they are interested, they can 

    download the new files.<BR><BR>The above assumes that data *providers* take 

    care to generate a new tracking_id when they generate a file containing new 

    data.&nbsp; Is this a risky assumption?&nbsp; Couldn't the CMIP5 QA 

    procedure check whether a file has the same tracking_id as any other file in 

    the system?<BR><BR>best regards,<BR>Karl<BR><BR></FONT>On 9/25/11 2:49 AM, 

    Estanislao Gonzalez wrote: 

    <BLOCKQUOTE cite=mid:4E7EF93E.8000808@dkrz.de type="cite">I recall a 

      problem that when altering the file with some tools (cdo?) the tracking id 

      wasn't automatically changed.<BR>Are we sure that the same tracking id 

      points to the same file now? <BR>Is the previous issue not a problem 

      anymore?<BR><BR>Thanks,<BR>Estani<BR>On 24.09.2011 18:17, Karl 

      Taylor wrote: 

      <BLOCKQUOTE cite=mid:4E7E027D.50405@llnl.gov type="cite"><FONT 

        face="Times New Roman">Dear all,<BR><BR>Concerning:<BR></FONT><BR>On 

        9/24/11 4:35 AM, Estanislao Gonzalez wrote: 

        <BLOCKQUOTE cite=mid:4E7DC078.5000700@dkrz.de type="cite"><PRE wrap="">2) checksums

They are the only reference to the outside that a data node gives of the 

changes a file suffered from one version to another, i.e. for 

replication we use that information to retrieve only files that change 

from one version to another. The same principle could be applied for 

tools designed for end users.

</PRE></BLOCKQUOTE><BR>Without disputing that checksums should be 

        mandatory, I want to point out that a user who has lost the checksum 

        associated with a file he has downloaded shouldn't have to recompute the 

        checksum to determine whether his file is a copy of a file residing at 

        the datanode.&nbsp; Recall that recorded in each netCDF file is a unique 

        tracking_id, which I'm almost positive is also in the THREDDS 

        catalog.&nbsp; It will certainly be quicker for the user to read the 

        tracking_id and then check whether it matches the latest version.&nbsp; 

        I think we want to maintain tracking_id as an option for checking 

        whether new files exist in a new version.<BR><BR>best 

        regards,<BR>Karl<BR></BLOCKQUOTE><BR><BR><PRE class=moz-signature cols="72">-- 

Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)

Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre

Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126

E-Mail:  <A class=moz-txt-link-abbreviated href="mailto:gonzalez@dkrz.de" moz-do-not-send="true">gonzalez@dkrz.de</A> </PRE></BLOCKQUOTE></BLOCKQUOTE><BR><BR><PRE class=moz-signature cols="72">-- 

Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)

Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre

Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126

E-Mail:  <A class=moz-txt-link-abbreviated href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</A> </PRE></BLOCKQUOTE></BODY></HTML>