<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content="text/html; charset=us-ascii" http-equiv=Content-Type>

<META name=GENERATOR content="MSHTML 8.00.6001.19120"></HEAD>

<BODY bgColor=#ffffff text=#000000>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>Hello Karl,</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>Thanks for responding on this and making the user view much 
more explicit.&nbsp; And thanks for the note on the checksum; it's good to know 
this is close to being 'required'.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>I also agree that the risk to CMIP5 (and the model 
contribution to IPCC&nbsp;AR5)&nbsp;due to data access problems is sufficiently 
high that simple (non-general) solutions that can be delivered quickly are 
needed.&nbsp; I think that many of the early CMIP5/working group 1 users are 
happy to take on themselves some of the responsibility for filtering which data 
they need, quality control, etc.&nbsp; So this user base can 
tolerate&nbsp;simpler solutions.&nbsp; This may not apply to working groups 2 and 
3; I don't know.&nbsp; Later studies by working group 1 may need richer model 
metadata.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>Some questions/comments&nbsp;on your 
proposal:</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>1. I think the list files are derivable from the THREDDS 
catalogue entries for the&nbsp;publication-version dataset (if they all 
contained the checksums); I think you suspect this.&nbsp; In a sense (I 
think) they are a reformatting of the THREDDS catalogues into a form more 
easily parsed by users. If it can be achieved in time, then I think it's safer to 
get the checksums into the THREDDS catalogues and derive any other format from 
there.</FONT></SPAN></DIV>
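<DIV dir=ltr align=left><FONT size=2 face=Arial>To illustrate point 1, here is a minimal Python sketch of deriving listing-file rows from a publication-level THREDDS catalogue. It assumes, hypothetically, that file-level dataset elements are the ones carrying a urlPath attribute and that their tracking_id and checksum are held in child property elements with those names; listing_rows is a made-up name, not part of drslib or the ESG publisher.</FONT></DIV>
<PRE>
```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"


def listing_rows(catalog_xml: str) -> list:
    """Return (name, urlPath, tracking_id, checksum) for each file entry.

    Hypothetical assumption: file-level datasets are those with a urlPath
    attribute, and their metadata sits in child property elements.
    A missing property is returned as an empty string.
    """
    root = ET.fromstring(catalog_xml)
    rows = []
    for ds in root.iter(THREDDS_NS + "dataset"):
        url_path = ds.get("urlPath")
        if url_path is None:
            continue  # container datasets have no file of their own
        props = {p.get("name"): p.get("value") or ""
                 for p in ds.findall(THREDDS_NS + "property")}
        rows.append((ds.get("name", ""), url_path,
                     props.get("tracking_id", ""), props.get("checksum", "")))
    return rows
```
</PRE>
<DIV dir=ltr align=left><FONT size=2 face=Arial>Written out as text, these rows would give a listing file that is consistent with the catalogue by construction, provided the catalogue actually contains the checksums.</FONT></DIV>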

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>2. Do you think you would expose these&nbsp;list files through 
HTTP?&nbsp; You mention gridFTP, but how soon do you think gridFTP will be 
available to the users who need it?</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>&nbsp;&nbsp; </FONT></SPAN><SPAN 

class=604260509-23092011><FONT color=#0000ff size=2 face=Arial>a. I'm not sure 
how many nodes have data available through gridFTP (<A 
href="http://esgf.org/wiki/Cmip5Status/ArchiveView">http://esgf.org/wiki/Cmip5Status/ArchiveView</A>&nbsp;suggests 
not many?)</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>&nbsp;&nbsp;&nbsp;b. I'm not sure how many users will have 
gridFTP clients (or maybe they can use a standard FTP 
client?)</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>3. Do you need to capture this idea of 'latest' in the user 

view, or can the user work this out based on the version 

number?</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>&nbsp;&nbsp; a. Including 'latest' makes it easier for users, 
as it takes one bit of responsibility away from them.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>&nbsp;&nbsp; b. But you may be introducing an inconsistency 
between the THREDDS interface (which doesn't really expose this idea of 'latest') 
and the more file-based interface.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>&nbsp;&nbsp; c. This exposure of 'latest' may be a minor 
point (but it's ringing alarm&nbsp;bells with me).</FONT></SPAN></DIV>
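<DIV dir=ltr align=left><FONT size=2 face=Arial>On point 3: a user can in principle work out 'latest' from the version number alone. A minimal sketch, assuming the version is the last dot-separated field of the publication-level dataset ID (either vNNN or a bare integer) and that all versions of one dataset use the same numbering scheme; the helper names are hypothetical.</FONT></DIV>
<PRE>
```python
def version_of(dataset_id: str) -> int:
    """Numeric version from the last ID field, e.g. '...r1i1p1.v20110901' gives 20110901.

    Assumes the field is 'vNNN' or a bare integer."""
    last = dataset_id.rsplit(".", 1)[-1]
    return int(last[1:] if last.startswith("v") else last)


def latest(dataset_ids: list) -> str:
    """The published ID with the highest version number."""
    return max(dataset_ids, key=version_of)


def is_stale(local_id: str, published_ids: list) -> bool:
    """True if a newer version than the one held locally has been published."""
    return version_of(latest(published_ids)) > version_of(local_id)
```
</PRE>
<DIV dir=ltr align=left><FONT size=2 face=Arial>Computing 'latest' on the client this way keeps the file-based view consistent with THREDDS, which exposes only numbered versions.</FONT></DIV>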

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>4. I don't think you need the time samples, do you?&nbsp; Isn't 
that information in the file name?</FONT></SPAN></DIV>
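<DIV dir=ltr align=left><FONT size=2 face=Arial>On point 4: CMIP5 DRS file names can end with a temporal-subset field (for example 200512-203011), so a client could select time segments without extra columns in the listing. This sketch assumes the field has the form N1-N2 with each date starting with a four-digit year; files with no time range (fixed fields) return None. The function names are hypothetical.</FONT></DIV>
<PRE>
```python
import re

# A temporal-subset field such as "200512-203011" or "1850-2005".
_TIME_RANGE = re.compile(r"^(\d{4})\d*-(\d{4})\d*$")


def year_range(filename: str):
    """(first_year, last_year) parsed from a CMIP5-style name, or None."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    # Search from the end: any geographical suffix follows the time field.
    for field in reversed(stem.split("_")):
        m = _TIME_RANGE.match(field)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


def files_overlapping(filenames, start_year: int, end_year: int) -> list:
    """Keep only files whose years overlap [start_year, end_year]."""
    selected = []
    for name in filenames:
        rng = year_range(name)
        if rng is not None and rng[0] <= end_year and rng[1] >= start_year:
            selected.append(name)
    return selected
```
</PRE>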

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>5. What is the full path to the file: the one visible through 
gridFTP, through the THREDDS file server, or something else?</FONT></SPAN></DIV>
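<DIV dir=ltr align=left><FONT size=2 face=Arial>Part of point 5 depends on how DRS components map onto directories. As an illustration only, this sketch assumes each of the ten dot-separated components of the publication-level dataset ID becomes one directory level, followed by the variable name; the root prefix and function name are hypothetical, and the layout drslib or a gridFTP server actually exposes may differ.</FONT></DIV>
<PRE>
```python
import posixpath


def drs_path(root: str, dataset_id: str, variable: str, filename: str = "") -> str:
    """Directory (or file) path for one variable of a publication-level dataset.

    Hypothetical layout: root/activity/.../ensemble/version/variable/[filename]."""
    parts = dataset_id.split(".")
    if len(parts) != 10:
        raise ValueError("expected 10 dot-separated components, got %d" % len(parts))
    return posixpath.join(root, *parts, variable, filename)
```
</PRE>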

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>6. An addition, but in the same vein of simplicity: can we 
have an easy-to-parse list of servers that hold CMIP5 data available via 
HTTP?&nbsp; In the first instance this could be populated by hand.&nbsp; It 
could be as simple as a CSV file with the columns server,pki_status.</FONT></SPAN></DIV>
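<DIV dir=ltr align=left><FONT size=2 face=Arial>A parser for the hand-maintained server list in point 6 could be as small as the sketch below. The two columns come from the proposal above; comment lines starting with #, the example values and the function name are assumptions.</FONT></DIV>
<PRE>
```python
import csv
import io


def read_server_list(text: str) -> dict:
    """Parse 'server,pki_status' lines into {server: pki_status}."""
    servers = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comment lines and rows without both columns.
        if len(row) < 2 or row[0].strip().startswith("#"):
            continue
        servers[row[0].strip()] = row[1].strip()
    return servers
```
</PRE>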

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>I'm afraid I haven't had time to think about all the issues 
around hard links, soft links and tape storage, and there may be bigger 
issues there.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=604260509-23092011><FONT color=#0000ff 

size=2 face=Arial>Jamie</FONT></SPAN></DIV><BR>

<BLOCKQUOTE 

style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px" 

dir=ltr>

  <DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>

  <HR tabIndex=-1>

  <FONT size=2 face=Tahoma><B>From:</B> Karl Taylor [mailto:taylor13@llnl.gov] 

  <BR><B>Sent:</B> 21 September 2011 23:19<BR><B>To:</B> 

  stephen.pascoe@stfc.ac.uk<BR><B>Cc:</B> gavin@llnl.gov; Kettleborough, Jamie; 

  go-essp-tech@ucar.edu; esg-node-dev@lists.llnl.gov<BR><B>Subject:</B> Re: 

  [Go-essp-tech] Reasoning for the use of symbolic links in 

  drslib<BR></FONT><BR></DIV>

  <DIV></DIV><FONT face="Times New Roman">Hi Stephen and all,<BR><BR>I would add 

  another requirement (or is this part of 4?):<BR><BR>5.&nbsp; A user (as 

  opposed to a data provider or a "replicator" or a data center data manager) 

  should be able to determine (through an automated scripted process) whether a 

  file previously downloaded is in the current (i.e., "latest")&nbsp; version of 

  a dataset, or has been withdrawn or replaced.<BR><BR>To meet all the 

  requirements in a practical way in the next few weeks, I'll suggest an 

  alternative approach:&nbsp; We could use drsLib to create the DRS directory 

  structure, but populate the lowest level (where the files would normally be 

  found) with a single text file&nbsp; (referred to subsequently as the "listing 

  file") containing the following information:<BR><BR>the publication-level 

  dataset version THREDDS id, which is: 

  &lt;activity&gt;.&lt;product&gt;.&lt;institute&gt;.&lt;model&gt;.&lt;experiment&gt;.&lt;frequency&gt;.&lt;modeling 

  realm&gt;.&lt;MIP table&gt;.&lt;ensemble member&gt;.&lt;version 

  number&gt;<BR>plus the &lt;variable name&gt;<BR>followed by a table 
  with:</FONT>
  <TABLE border=1 cellSpacing=0 cellPadding=3>
    <TR><TH align=left>filename</TH><TH align=left>time units</TH><TH align=left>time of 1st time sample</TH><TH align=left>time of last time sample</TH><TH align=left>full path to file</TH><TH align=left>tracking_id</TH><TH align=left>checksum</TH></TR>
    <TR><TD>file1</TD></TR>
    <TR><TD>file2</TD></TR>
    <TR><TD>...</TD></TR>
    <TR><TD>fileN</TD></TR>
  </TABLE><FONT face="Times New Roman"><BR>The "listing 

  file" would be stored twice for the latest version of each dataset:&nbsp; once 

  under the numbered version subdirectory and *also* under the generically 

  labeled "latest" directory.&nbsp; [This is so a user interested in the latest 

  version can find it without knowing its actual number.]&nbsp; By the way, the 

  time information included in the list might not be absolutely essential, but 

  it could be helpful for those only wanting to download specific time-segments 

  of an integration.<BR><BR>I realize this is not a particularly elegant 

  approach, but if users were given access to the drs directory structure (say, 

  through gridftp), they could run a script that navigated directly to a 

  variable of interest (based on the DRS directory structure specifications) and 

  download the "listing file" stored there.&nbsp; Then, the "latest" listing 

  file could be compared to the older "listing file" (previously downloaded by 

  the user) to determine whether a new version was available (by simply 

  comparing the &lt;version numbers&gt; stored in the THREDDS ID).&nbsp; If the 

  user didn't have the most recent version, he could then compare the two 

  "listing files" (old and new) to determine which files were new and which (if 

  any) had been eliminated.<BR><BR>At that point, the user could generate a 

  local copy of the latest version by moving/deleting files not found in the 

  latest "listing file" and by downloading (using, for example, gridftp) only 

  the new files.<BR><BR>I bet that in a single day Stephen could 
  enhance drslib to produce these list files, rather than 

  creating the symbolic links to the actual file locations as it currently 

  does.&nbsp; Note that if the actual files were moved into new directories 

  sometime in the future, a utility would have to be written to modify all the 

  "list files" to point to the new file locations (but that's also true of the 

  symbolic links, I think).<BR><BR>Also note that creation of a new version would 

  *not* require changing any of the existing "list files" (except that the list file 

  in the "latest" directory would be removed).&nbsp; A new version subdirectory 

  would have to be created and for each variable in the dataset, the new "list 

  file" for that version would have to be generated (and copied also to 

  "latest").<BR><BR>I'll be interested in your response to this idea and trust 

  that any time spent thinking about it is warranted (i.e., that this is not a 

  completely stupid suggestion).&nbsp; Will it meet all of Stephen's 

  needs?&nbsp; Are there any other solutions to the data users' troubles in 

  obtaining data that we can implement in the next few weeks (since that 
  should be our goal here)?<BR><BR>My primary interest is in making CMIP5 data 

  easily obtainable by users (which appears not to be the case at present), and 

  to allow users to write scripts to troll for new data they are interested in 

  and discover any new versions of data that should replace the old.&nbsp; This 

  is not meant to be a general solution to all of the possible ESG 

  applications.&nbsp; Also, I'm guessing that a similar approach could be 

  followed where instead of reading the "list files", one read the catalogs, but 

  I doubt that this would be as easy for the typical user to do.<BR><BR>Best 

  regards,<BR>Karl<BR><BR></FONT><FONT face="Times New Roman">P.S. To weigh in 
  on another issue: I think it *will* be essential to require, as part of ESG 
  publication, that the checksum be recorded (in the THREDDS catalog, if I'm 
  not mistaken).&nbsp; We haven't asked groups to republish data 
  conforming to this new requirement because I want to make sure that any other 
  required alterations in the configuration of the publisher are also 
  communicated, so we only have to ask groups to republish once.</FONT>&nbsp; 
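  <DIV><FONT face="Times New Roman">If checksums were calculated by drslib, or by a user after download, the computation might look like the minimal sketch below. It follows the ordering V. Balaji suggests elsewhere in this thread: a cheap file-size comparison first, with the checksum as the gold standard. MD5 is only an assumed default here; the algorithm recorded by a given publisher may differ, and the function names are hypothetical.</FONT></DIV>
<PRE>
```python
import hashlib
import os


def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks (1 MiB by default) so large NetCDF files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_checksum: str,
                    expected_size=None, algorithm: str = "md5") -> bool:
    """Fast rejection on file size first; the checksum is the definitive test."""
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    return file_checksum(path, algorithm) == expected_checksum.strip().lower()
```
</PRE>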

  Note also that if my "alternative" approach outlined above is adopted, the 
  checksums could either be obtained from the catalog (if they were computed and 
  stored there) or be calculated by drslib itself; there would be no need to 
  republish data to ESG.<BR><BR><BR>On 

  9/20/11 2:35 PM, <A class=moz-txt-link-abbreviated 

  href="mailto:stephen.pascoe@stfc.ac.uk">stephen.pascoe@stfc.ac.uk</A> wrote: 

  <BLOCKQUOTE 

  cite=mid:4C353E6E4A08AE4792B350DAA392B5210EC79DDE@EXCHMBX01.fed.cclrc.ac.uk 

  type="cite">


    <STYLE>@font-face {

        font-family: Cambria Math;

}

@font-face {

        font-family: Calibri;

}

@font-face {

        font-family: Tahoma;

}

@font-face {

        font-family: Consolas;

}

@page WordSection1 {size: 612.0pt 792.0pt; margin: 72.0pt 72.0pt 72.0pt 72.0pt; }

P.MsoNormal {

        MARGIN: 0cm 0cm 0pt; FONT-FAMILY: "Times New Roman","serif"; COLOR: black; FONT-SIZE: 12pt

}

LI.MsoNormal {

        MARGIN: 0cm 0cm 0pt; FONT-FAMILY: "Times New Roman","serif"; COLOR: black; FONT-SIZE: 12pt

}

DIV.MsoNormal {

        MARGIN: 0cm 0cm 0pt; FONT-FAMILY: "Times New Roman","serif"; COLOR: black; FONT-SIZE: 12pt

}

A:link {

        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.MsoHyperlink {

        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99

}

A:visited {

        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.MsoHyperlinkFollowed {

        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99

}

PRE {

        MARGIN: 0cm 0cm 0pt; FONT-FAMILY: "Courier New"; COLOR: black; FONT-SIZE: 10pt; mso-style-priority: 99; mso-style-link: "HTML Preformatted Char"

}

SPAN.HTMLPreformattedChar {

        FONT-FAMILY: Consolas; COLOR: black; mso-style-priority: 99; mso-style-link: "HTML Preformatted"; mso-style-name: "HTML Preformatted Char"

}

SPAN.EmailStyle19 {

        FONT-FAMILY: "Calibri","sans-serif"; COLOR: #1f497d; mso-style-type: personal-reply

}

.MsoChpDefault {

        FONT-SIZE: 10pt; mso-style-type: export-only

}

DIV.WordSection1 {

        page: WordSection1

}

</STYLE>


    <DIV class=WordSection1>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Hi 

    All,<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Lots 
    of good discussion here, and sorry I've been keeping quiet.&nbsp; I want to 
    remind us all of the requirements I laid out on the wiki 
    page:<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">1. 

    It should allow data from multiple versions to be kept on disk 

    simultaneously.<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">2. 

    It should avoid storing multiple copies of files that are present in more 

    than one version.<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">3. 

    It should be straightforward to copy dataset changes (i.e. differences 

    between versions) between nodes to allow efficient 

    replication.<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">4. 

    It should rely only on the filesystem so that generic tools like FTP could 

    be used to expose the structure if necessary.<O:P></O:P></SPAN></P>
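    <P class=MsoNormal>Requirement #3, like Karl's listing-file proposal, comes down to comparing two versions of a dataset. A minimal sketch, assuming each version's listing has already been read into a dict mapping filename to checksum (checksums rather than tracking_ids, since the thread notes tracking_ids may be shared); VersionDiff and diff_listings are hypothetical names.</P>
<PRE>
```python
from typing import NamedTuple


class VersionDiff(NamedTuple):
    added: list      # names only in the newer version: fetch
    changed: list    # same name, different checksum: fetch again
    removed: list    # names withdrawn in the newer version: delete or archive
    unchanged: list  # identical content: reuse the existing copy


def diff_listings(old: dict, new: dict) -> VersionDiff:
    """Compare two listings, each mapping filename -> checksum."""
    common = set(old).intersection(new)
    return VersionDiff(
        added=sorted(set(new) - set(old)),
        changed=sorted(n for n in common if old[n] != new[n]),
        removed=sorted(set(old) - set(new)),
        unchanged=sorted(n for n in common if old[n] == new[n]),
    )
```
</PRE>
    <P class=MsoNormal>Only added and changed files need to be transferred between nodes; unchanged files can be shared between version directories.</P>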

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">In 

    my view we should address these directly.&nbsp; Are they needed?&nbsp; Which 

    are the most important?<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Gavin 

    said about catalogs:<O:P></O:P></SPAN></P>

    <P class=MsoNormal>&gt; you can quickly and easily inspect catalog_v1 and 

    catalog_v2 to find what the changes are.<BR>&gt; This all answers the 

    question of "WHAT" (to download)... the other question of "HOW" is a 

    different, but related story.<BR>&gt; The trick is to not conflate the two 

    issues, which is what filesystem discussions do.&nbsp;<SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">But 

    THREDDS conflates the two as well!&nbsp; A THREDDS catalog contains 

    descriptions of service endpoints that are not independent of the node 

    serving the data (the "HOW").&nbsp; Maybe we should have developed a true 

    catalog format but that is not where we are now.&nbsp; The replication 

    client uses THREDDS catalogs in this way, but when I last looked it was 
    completely unaware of versions, i.e. it won't help with #3.&nbsp; 

    <O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">I 

    don't see how Gavin's point addresses any of the requirements above.&nbsp; 

    Even if we ditch #4, which I expect Gavin would argue for, it doesn't 

    directly solve the problem for #1-#3 either.<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Briefly 

    on some other points that have been made...<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Balaji, 

    some archive tools may be able to detect two paths pointing to the same 
    filesystem inode, but both Estani and I have enquired with our backup people 
    and they say hard links must be avoided.&nbsp; I am happy to include a 
    hard-linking option in drslib, though.&nbsp; I've created a bugzilla ticket for 
    it.<O:P></O:P></SPAN></P>
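    <P class=MsoNormal>For reference, detecting paths that share an inode is simple at the filesystem level; the difficulty the backup teams report is one of tool support and policy, which this does not address. A minimal sketch (hard_link_groups and has_other_links are hypothetical helpers; os.path.samefile gives a similar pairwise check):</P>
<PRE>
```python
import os


def hard_link_groups(paths) -> list:
    """Group paths by (device, inode); return only groups with more than one
    member, i.e. paths that are hard links to the same file."""
    groups = {}
    for p in paths:
        st = os.lstat(p)  # lstat: a symlink is reported as itself, not its target
        groups.setdefault((st.st_dev, st.st_ino), []).append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def has_other_links(path: str) -> bool:
    """True if the link count shows the file has other hard links somewhere."""
    return os.lstat(path).st_nlink > 1
```
</PRE>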

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Karl, 

    I think putting real files in "latest" is equivalent to putting real files 

    in the latest "vYYYYMMDD" directory.&nbsp; The directories can be renamed 

    trivially on upgrade, but you still have the same problems described on the 
    wiki page.<O:P></O:P></SPAN></P>
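    <P class=MsoNormal>On upgrade, "latest" has to move to the new version directory. One way to do that with a symbolic link, without a moment where "latest" is missing, is sketched below. It assumes a POSIX filesystem (where rename() replaces an existing link atomically) and that "latest" is a link rather than a real directory; it illustrates the idea and is not necessarily how drslib does it. point_latest_at is a hypothetical name.</P>
<PRE>
```python
import os


def point_latest_at(dataset_dir: str, version: str) -> None:
    """Make dataset_dir/latest a relative symlink to dataset_dir/version.

    The new link is created under a temporary name and renamed over the
    old one, so readers see either the old or the new target."""
    tmp = os.path.join(dataset_dir, ".latest.tmp")
    if os.path.lexists(tmp):
        os.remove(tmp)  # clear a leftover from an interrupted update
    os.symlink(version, tmp)  # relative target keeps the tree relocatable
    os.replace(tmp, os.path.join(dataset_dir, "latest"))
```
</PRE>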

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">I'm 

    sure there were other points but I've lost track.&nbsp; Checksums will have 

    to wait for another email.<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Cheers,<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt">Stephen.<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <DIV>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: Consolas; COLOR: rgb(31,73,125); FONT-SIZE: 10.5pt">---<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: Consolas; COLOR: rgb(31,73,125); FONT-SIZE: 10.5pt">Stephen 

    Pascoe&nbsp; +44 (0)1235 445980<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: Consolas; COLOR: rgb(31,73,125); FONT-SIZE: 10.5pt">Centre 

    of Environmental Data Archival<O:P></O:P></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: Consolas; COLOR: rgb(31,73,125); FONT-SIZE: 10.5pt">STFC 

    Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, 

    UK<O:P></O:P></SPAN></P></DIV>

    <P class=MsoNormal><SPAN 

    style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: rgb(31,73,125); FONT-SIZE: 11pt"><O:P></O:P></SPAN></P>

    <DIV>

    <DIV 

    style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0cm; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BORDER-TOP: rgb(181,196,223) 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">

    <P class=MsoNormal><B><SPAN 

    style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: windowtext; FONT-SIZE: 10pt" 

    lang=EN-US>From:</SPAN></B><SPAN 

    style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: windowtext; FONT-SIZE: 10pt" 

    lang=EN-US> <A class=moz-txt-link-abbreviated 

    href="mailto:go-essp-tech-bounces@ucar.edu">go-essp-tech-bounces@ucar.edu</A> 

    [<A class=moz-txt-link-freetext 

    href="mailto:go-essp-tech-bounces@ucar.edu">mailto:go-essp-tech-bounces@ucar.edu</A>] 

    <B>On Behalf Of </B>Gavin M. Bell<BR><B>Sent:</B> 20 September 2011 

    17:26<BR><B>To:</B> Kettleborough, Jamie<BR><B>Cc:</B> <A 

    class=moz-txt-link-abbreviated 

    href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</A>; <A 

    class=moz-txt-link-abbreviated 

    href="mailto:esg-node-dev@lists.llnl.gov">esg-node-dev@lists.llnl.gov</A><BR><B>Subject:</B> 

    Re: [Go-essp-tech] Reasoning for the use of symbolic links in 

    drslib<O:P></O:P></SPAN></P></DIV></DIV>

    <P class=MsoNormal><O:P></O:P></P>

    <P class=MsoNormal>Jamie and friends.<BR><BR>You've answered your own 

    questions :-)... <BR>It is the catalog where these checksums (and other 

    features) are recorded.<BR>And thus using the catalog you can see what has 

    changed.<BR>There is a new catalog for every version of a dataset. Given 

    that...<BR>you can quickly and easily inspect catalog_v1 and catalog_v2 to 

    find what the changes are.<BR>This all answers the question of "WHAT" (to 

    download)... the other question of "HOW" is a different, but related 

    story.<BR>The trick is to not conflate the two issues which is what 

    filesystem discussions do.&nbsp; When talking about filesystems you are 

    stipulating the what but implicitly conflating the HOW because you are 

    implicitly designing for tools that intrinsically use the filesystem.&nbsp; 

    It is a muddying of the waters that doesn't separate the two concerns.&nbsp; 

    We need to deal with these two concepts independently in a way that does 

    not limit the system or cause undue burden on institutions by requiring 

    a rigid structure.<BR><BR>As I mentioned... it's not the filesystem we need 

    to look at... it's the catalogs.<BR><BR>just my $0.02 - I'll stop flogging 

    this particular horse... but I hope I have done a better job expressing the 

    issues and where the solution lies (IMHO).<BR><BR>On 9/20/11 8:14 AM, 

    Kettleborough, Jamie wrote: <O:P></O:P></P><PRE>Hello Balaji,<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>I agree - getting all nodes to make the checksums available would be a<O:P></O:P></PRE><PRE>good thing.&nbsp; It gives you both the data integrity check on download, and<O:P></O:P></PRE><PRE>the ability to see what files really have changed from one publication<O:P></O:P></PRE><PRE>version to the next.<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>I don't know how hard it is to do this, particularly for data that is<O:P></O:P></PRE><PRE>already published.<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>Jamie <O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE>

    <BLOCKQUOTE style="MARGIN-TOP: 5pt; MARGIN-BOTTOM: 5pt"><PRE>-----Original Message-----<O:P></O:P></PRE><PRE>From: V. Balaji [<A href="mailto:V.Balaji@noaa.gov" moz-do-not-send="true">mailto:V.Balaji@noaa.gov</A>] <O:P></O:P></PRE><PRE>Sent: 20 September 2011 16:01<O:P></O:P></PRE><PRE>To: Kettleborough, Jamie<O:P></O:P></PRE><PRE>Cc: Karl Taylor; <A href="mailto:go-essp-tech@ucar.edu" moz-do-not-send="true">go-essp-tech@ucar.edu</A>; <A href="mailto:esg-node-dev@lists.llnl.gov" moz-do-not-send="true">esg-node-dev@lists.llnl.gov</A><O:P></O:P></PRE><PRE>Subject: Re: [Go-essp-tech] Reasoning for the use of symbolic <O:P></O:P></PRE><PRE>links in drslib<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>If nodes can currently choose to record checksums or not, I'd <O:P></O:P></PRE><PRE>strongly recommend this be a non-optional requirement. How <O:P></O:P></PRE><PRE>could anyone download any data with confidence without being <O:P></O:P></PRE><PRE>able to checksum?<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>You can of course check timestamps and file sizes and so on, <O:P></O:P></PRE><PRE>but you have to consider those optimizations... a fast option <O:P></O:P></PRE><PRE>for the less paranoid to avoid the sum computation, which has <O:P></O:P></PRE><PRE>to be the gold standard.<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>"Trust but checksum".<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>Kettleborough, Jamie writes:<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE>

      <BLOCKQUOTE style="MARGIN-TOP: 5pt; MARGIN-BOTTOM: 5pt"><PRE>Hello Karl, everyone,</PRE><PRE>&nbsp;</PRE><PRE>&nbsp;&nbsp; For replicating the latest version, I agree that your alternate</PRE><PRE>structure poses difficulties (but it seems like there must be a way to</PRE><PRE>smartly determine whether you already have a file and simply need to</PRE><PRE>move it, rather than bring it over again).</PRE><PRE>&nbsp;</PRE><PRE>Doesn't every user (not just the replication system) have this problem:</PRE><PRE>they want to know what files have changed (or not changed) at a new</PRE><PRE>publication version.&nbsp; No one wants to be using bandwidth or storage</PRE><PRE>space to fetch and store files they already have.&nbsp; How is a user</PRE><PRE>expected to know what has really changed?&nbsp; Estani mentions checksums</PRE><PRE>- OK, but I don't think all nodes expose them (is this right?).&nbsp; You</PRE><PRE>may try to infer from modification dates (not sure, I haven't looked at</PRE><PRE>them that closely).&nbsp; You may try to infer from the TRACKING_ID - but</PRE><PRE>I'm not sure how reliable this is (I can imagine scenarios where</PRE><PRE>different files share the same TRACKING_ID - e.g. if they have been</PRE><PRE>modified with an NCO tool).</PRE><PRE>&nbsp;</PRE><PRE>Is there a recommended method for users to understand what *files*</PRE><PRE>have actually changed when a new publication version appears?</PRE><PRE>&nbsp;</PRE><PRE>Thanks,</PRE><PRE>&nbsp;</PRE><PRE>Jamie</PRE><PRE>&nbsp;</PRE></BLOCKQUOTE><PRE><O:P>&nbsp;</O:P></PRE><PRE>-- <O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE>V. Balaji&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Office:&nbsp; +1-609-452-6516<O:P></O:P></PRE><PRE>Head, Modeling Systems Group, GFDL&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Home:&nbsp;&nbsp;&nbsp; +1-212-253-6662<O:P></O:P></PRE><PRE>Princeton University&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Email: <A href="mailto:v.balaji@noaa.gov" moz-do-not-send="true">v.balaji@noaa.gov</A><O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE></BLOCKQUOTE><PRE>_______________________________________________<O:P></O:P></PRE><PRE>GO-ESSP-TECH mailing list<O:P></O:P></PRE><PRE><A href="mailto:GO-ESSP-TECH@ucar.edu" moz-do-not-send="true">GO-ESSP-TECH@ucar.edu</A><O:P></O:P></PRE><PRE><A href="http://mailman.ucar.edu/mailman/listinfo/go-essp-tech" moz-do-not-send="true">http://mailman.ucar.edu/mailman/listinfo/go-essp-tech</A><O:P></O:P></PRE>

    <P class=MsoNormal><BR><BR><O:P></O:P></P><PRE>-- <O:P></O:P></PRE><PRE>Gavin M. Bell<O:P></O:P></PRE><PRE>--<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE><PRE> "Never mistake a clear view for a short distance."<O:P></O:P></PRE><PRE>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;-Paul Saffo<O:P></O:P></PRE><PRE><O:P>&nbsp;</O:P></PRE></DIV><BR>

    </BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>