<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        color:black;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="#FFFFCC" lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi All,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Lots of good discussion here and sorry I've been keeping quiet. I want to remind us of the requirements I laid out in the wiki page:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">1. It should allow data from multiple versions to be kept on disk simultaneously.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">2. It should avoid storing multiple copies of files that are present in more than one version.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">3. It should be straightforward to copy dataset changes (i.e. differences between versions) between nodes to allow efficient replication.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">4. It should rely only on the filesystem so that generic tools like FTP could be used to expose the structure if necessary.<o:p></o:p></span></p>
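<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri,sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri,sans-serif;color:#1F497D">To make these concrete, here is a minimal Python sketch of one symlink-based layout that could meet all four. It is an illustration only, not drslib's actual code: the function name publish_version, the "files" directory and the version labels are assumptions made for this example. Real files are stored once under files/VERSION and each version directory holds only symbolic links, so unchanged files are never copied and the changes for a version are exactly the contents of files/VERSION. Files deleted in a new version are not handled.<o:p></o:p></span></p>
<pre>
```python
import os
import tempfile


def publish_version(dataset_dir, version, new_files, previous=None):
    """Create a version directory of symlinks; store only new or changed files.

    new_files maps file name to bytes. Hypothetical helper for illustration.
    """
    store = os.path.join(dataset_dir, "files", version)
    vdir = os.path.join(dataset_dir, version)
    os.makedirs(store)
    os.makedirs(vdir)
    # Carry unchanged files forward by re-using the previous link target.
    # Targets are relative, and version directories are siblings, so the
    # same target string is valid in the new directory.
    if previous is not None:
        pdir = os.path.join(dataset_dir, previous)
        for name in os.listdir(pdir):
            if name not in new_files:
                target = os.readlink(os.path.join(pdir, name))
                os.symlink(target, os.path.join(vdir, name))
    # Write each new or changed file once, then link it into the version.
    for name, data in new_files.items():
        real = os.path.join(store, name)
        with open(real, "wb") as fh:
            fh.write(data)
        os.symlink(os.path.relpath(real, vdir), os.path.join(vdir, name))
    return vdir


# Example: v2 changes b.nc only, so a.nc is shared with v1.
root = tempfile.mkdtemp()
publish_version(root, "v1", {"a.nc": b"A1", "b.nc": b"B1"})
publish_version(root, "v2", {"b.nc": b"B2"}, previous="v1")
print(sorted(os.listdir(os.path.join(root, "files", "v2"))))
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri,sans-serif;color:#1F497D">In this sketch, replicating v2 to a node that already holds v1 means copying files/v2 plus the links in the v2 directory, and everything is visible to any tool that can read the filesystem.<o:p></o:p></span></p>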
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">In my view we should address these directly. Are they needed? Which are the most important?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Gavin said about catalogs<o:p></o:p></span></p>
<p class="MsoNormal">> you can quickly and easily inspect catalog_v1 and catalog_v2 to find what the changes are.<br>
> This all answers the question of "WHAT" (to download)... the other question of "HOW" is a different, but related story.<br>
> The trick is to not conflate the two issues which is what filesystem discussions do. <span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">But THREDDS conflates the two as well! A THREDDS catalog contains descriptions of service endpoints that are not independent of the node serving the data (the
"HOW"). Maybe we should have developed a true catalog format but that is not where we are now. The replication client uses THREDDS catalogs in this way but when I last looked it was completely unaware of versions -- i.e. it won't help with #3.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I don't see how Gavin's point addresses any of the requirements above. Even if we ditch #4, which I expect Gavin would argue for, it doesn't directly solve
the problem for #1-#3 either.<o:p></o:p></span></p>
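<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri,sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri,sans-serif;color:#1F497D">For reference, the catalog comparison itself is simple once checksums are recorded. A rough Python sketch, assuming each version's catalog has already been reduced to a mapping of file name to checksum (the function name, file names and checksum values are invented for the example, and no THREDDS parsing is shown). It answers the "WHAT" only and says nothing about how versions are laid out on disk.<o:p></o:p></span></p>
<pre>
```python
def diff_catalogs(old, new):
    """Compare two {file name: checksum} mappings for successive versions."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A file present in both versions has changed if its checksum differs.
    changed = sorted(
        name for name in set(old).intersection(new) if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical catalog contents for two versions of one dataset.
catalog_v1 = {"tas.nc": "aa11", "pr.nc": "bb22", "psl.nc": "cc33"}
catalog_v2 = {"tas.nc": "aa11", "pr.nc": "dd44", "ua.nc": "ee55"}
print(diff_catalogs(catalog_v1, catalog_v2))
```
</pre>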
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Briefly on some other points that have been made...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Balaji, some archive tools may be able to detect two paths pointing to the same filesystem inode but both Estani and I have enquired with our backup people and they
say hard links must be avoided. I am happy to include a hard-linking option in drslib though. I've created a bugzilla ticket for it.<o:p></o:p></span></p>
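<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri,sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Calibri,sans-serif;color:#1F497D">For anyone wanting to check what such a tool would have to do, hard-linked paths can be found by grouping files on device and inode number. A minimal Python sketch using only the standard library; the function name is made up for this example and it is not part of drslib or of any backup tool.<o:p></o:p></span></p>
<pre>
```python
import os
import stat
import tempfile
from collections import defaultdict


def find_hard_link_groups(root):
    """Return lists of paths under root that share one inode."""
    groups = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symbolic links
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files
            groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes reachable by more than one path.
    return [sorted(paths) for paths in groups.values() if len(paths) != 1]


# Example: a.nc and b.nc are two names for one inode; c.nc is a real copy.
demo = tempfile.mkdtemp()
first = os.path.join(demo, "a.nc")
with open(first, "wb") as fh:
    fh.write(b"data")
os.link(first, os.path.join(demo, "b.nc"))
with open(os.path.join(demo, "c.nc"), "wb") as fh:
    fh.write(b"data")
print(find_hard_link_groups(demo))
```
</pre>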
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Karl, I think putting real files in "latest" is equivalent to putting real files in the latest "vYYYYMMDD" directory. The directories can be renamed trivially
on upgrade but you still have the same problems as the wiki page says.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I'm sure there were other points but I've lost track. Checksums will have to wait for another email.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Stephen.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Consolas;color:#1F497D">---<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Consolas;color:#1F497D">Stephen Pascoe +44 (0)1235 445980<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Consolas;color:#1F497D">Centre of Environmental Data Archival<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Consolas;color:#1F497D">STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"> go-essp-tech-bounces@ucar.edu
[mailto:go-essp-tech-bounces@ucar.edu] <b>On Behalf Of </b>Gavin M. Bell<br>
<b>Sent:</b> 20 September 2011 17:26<br>
<b>To:</b> Kettleborough, Jamie<br>
<b>Cc:</b> go-essp-tech@ucar.edu; esg-node-dev@lists.llnl.gov<br>
<b>Subject:</b> Re: [Go-essp-tech] Reasoning for the use of symbolic links in drslib<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Jamie and friends.<br>
<br>
You've answered your own questions :-)... <br>
It is the catalog where these checksums (and other features) are recorded.<br>
And thus using the catalog you can see what has changed.<br>
There is a new catalog for every version of a dataset. Given that...<br>
you can quickly and easily inspect catalog_v1 and catalog_v2 to find what the changes are.<br>
This all answers the question of "WHAT" (to download)... the other question of "HOW" is a different, but related story.<br>
The trick is to not conflate the two issues which is what filesystem discussions do. When talking about filesystems you are stipulating the what but implicitly conflating the HOW because you are implicitly designing for tools that intrinsically use the filesystem.
It is a muddying of the waters that doesn't separate the two concerns. We need to deal with these two concepts independently in a way that does not limit the system or cause undue burden on institutions by requiring a rigid structure.<br>
<br>
As I mentioned... it's not the filesystem we need to look at... it's the catalogs.<br>
<br>
just my $0.02 - I'll stop flogging this particular horse... but I hope I have done a better job expressing the issues and where the solution lies (IMHO).<br>
<br>
On 9/20/11 8:14 AM, Kettleborough, Jamie wrote: <o:p></o:p></p>
<pre>Hello Balaji,<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>I agree - getting all nodes to make the checksums available would be a<o:p></o:p></pre>
<pre>good thing. It gives you both the data integrity check on download, and<o:p></o:p></pre>
<pre>the ability to see what files really have changed from one publication<o:p></o:p></pre>
<pre>version to the next.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>I don't know how hard it is to do this, particularly for data that is<o:p></o:p></pre>
<pre>already published.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Jamie <o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>-----Original Message-----<o:p></o:p></pre>
<pre>From: V. Balaji [<a href="mailto:V.Balaji@noaa.gov">mailto:V.Balaji@noaa.gov</a>] <o:p></o:p></pre>
<pre>Sent: 20 September 2011 16:01<o:p></o:p></pre>
<pre>To: Kettleborough, Jamie<o:p></o:p></pre>
<pre>Cc: Karl Taylor; <a href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a>; <a href="mailto:esg-node-dev@lists.llnl.gov">esg-node-dev@lists.llnl.gov</a><o:p></o:p></pre>
<pre>Subject: Re: [Go-essp-tech] Reasoning for the use of symbolic <o:p></o:p></pre>
<pre>links in drslib<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>If nodes can currently choose to record checksums or not, I'd <o:p></o:p></pre>
<pre>strongly recommend this be a non-optional requirement.. how <o:p></o:p></pre>
<pre>could anyone download any data with confidence without being <o:p></o:p></pre>
<pre>able to checksum?<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>You can of course check timestamps and filesizes and so on, <o:p></o:p></pre>
<pre>but you have to consider those optimizations... a fast option <o:p></o:p></pre>
<pre>for the less paranoid to avoid the sum computation, which has <o:p></o:p></pre>
<pre>to be the gold standard.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>"Trust but checksum".<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Kettleborough, Jamie writes:<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>Hello Karl, everyone,<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<pre> For replicating the latest version, I agree that your alternate <o:p></o:p></pre>
<pre>structure poses difficulties (but it seems like there must be a way to <o:p></o:p></pre>
<pre>smartly determine whether you already have a file and simply <o:p></o:p></pre>
<pre>need to move it, rather than bring it over again).<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Doesn't every user (not just the replication system) have this problem: <o:p></o:p></pre>
<pre>they want to know what files have changed (or not changed) at a new <o:p></o:p></pre>
<pre>publication version. No one wants to be using bandwidth or storage <o:p></o:p></pre>
<pre>space to fetch and store files they already have. How is a user <o:p></o:p></pre>
<pre>expected to know what has really changed? Estani mentions checksums <o:p></o:p></pre>
<pre>- OK, but I don't think all nodes expose them (is this right?). You <o:p></o:p></pre>
<pre>may try to infer from modification dates (not sure, I haven't looked at <o:p></o:p></pre>
<pre>them that closely). You may try to infer from the TRACKING_ID - but <o:p></o:p></pre>
<pre>I'm not sure how reliable this is (I can imagine scenarios where <o:p></o:p></pre>
<pre>different files share the same TRACKING_ID - e.g. if they have been <o:p></o:p></pre>
<pre>modified with an nco tool).<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Is there a recommended method for users to understand what *files* <o:p></o:p></pre>
<pre>have actually changed when a new publication version appears?<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Thanks,<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Jamie<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>-- <o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>V. Balaji Office: +1-609-452-6516<o:p></o:p></pre>
<pre>Head, Modeling Systems Group, GFDL Home: +1-212-253-6662<o:p></o:p></pre>
<pre>Princeton University Email: <a href="mailto:v.balaji@noaa.gov">v.balaji@noaa.gov</a><o:p></o:p></pre>
<pre><o:p> </o:p></pre>
</blockquote>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>GO-ESSP-TECH mailing list<o:p></o:p></pre>
<pre><a href="mailto:GO-ESSP-TECH@ucar.edu">GO-ESSP-TECH@ucar.edu</a><o:p></o:p></pre>
<pre><a href="http://mailman.ucar.edu/mailman/listinfo/go-essp-tech">http://mailman.ucar.edu/mailman/listinfo/go-essp-tech</a><o:p></o:p></pre>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<pre>-- <o:p></o:p></pre>
<pre>Gavin M. Bell<o:p></o:p></pre>
<pre>--<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre> "Never mistake a clear view for a short distance."<o:p></o:p></pre>
<pre> -Paul Saffo<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
</div>
<br></body>
</html>