<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffcc" text="#000000">
Hi...<br>
<br>
To keep this short... <br>
1) The software is not in place yet, but it can be put in place;
much of it has already been written.<br>
2) This solution can be made to easily work with systems beyond
those that use the HTTP protocol.<br>
3) The end-user is always 'understood' in all of the
contributions/comments/discussions about the system. That goes for me,
and I assumed it went for everyone else and for all that is said. We
are here to build a system for our community.<br>
4) It is exactly because the DRS may change and evolve that such a
resolution system is advantageous. It is also advantageous in a
host of end-user related activities, making the system more
robust and easier to use.<br>
5) This does not preclude any organization from not using it at all,
as they see fit: a testament to the inherent flexibility of the
solution.<br>
6) This ESGF system takes the best parts of the torrent architecture
and some others to meet our needs in an effective way that
provides robustness and longevity to our community - our
end-users. The ESGF architecture avoids certain known issues with
torrents. <br>
<br>
I had a much lengthier response... but decided to keep it brief.
For those wanting to discuss and debate this further, I suggest we put
together a group where we can delve into the details.<br>
The idea is clean and cogent and will help the system as a whole.<br>
<br>
"A strong tide lifts all boats"<br>
<br>
On 9/5/11 4:23 AM, <a class="moz-txt-link-abbreviated" href="mailto:stephen.pascoe@stfc.ac.uk">stephen.pascoe@stfc.ac.uk</a> wrote:
<blockquote
cite="mid:4C353E6E4A08AE4792B350DAA392B5210EC4DC72@EXCHMBX01.fed.cclrc.ac.uk"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<meta name="Generator" content="Microsoft Word 12 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        color:black;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">Dear all,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">I feel we've been through this argument many
times before. I have a few thoughts to add below ...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">* I appreciate Gavin's argument assuming all
indirection software is in place. The problem is it isn't.
We already have people using the system and expecting some
sort of uniformity of interface. The HelpDesk has been
asked several times about how to write generic web scripts,
crawlers, or anything that would help them keep up to date
on the data without continually browsing the Gateways. A
consistent URL structure is one thing we said we'd give
users. This can be done with the current stack if you put
your data in DRS structure, although much more could be done
to make data access easier to script.<o:p></o:p></span></p>
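<p class="MsoNormal">As a rough illustration of why a consistent URL structure matters for scripting, here is a minimal Python sketch that builds a predictable URL from DRS facet values. The facet order is assumed from the CMIP5 DRS directory convention and should be checked against the DRS document; the function name, the example values and the base URL are hypothetical and not part of any ESGF tool.</p>
<pre>
```python
# Facet order assumed from the CMIP5 DRS directory convention (an assumption,
# verify against the DRS document before relying on it).
DRS_FACETS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "mip_table", "ensemble", "version", "variable",
)

# Illustrative values only; not taken from a real catalogue.
EXAMPLE_FACETS = {
    "activity": "cmip5", "product": "output1", "institute": "MOHC",
    "model": "HadGEM2-ES", "experiment": "rcp45", "frequency": "mon",
    "realm": "atmos", "mip_table": "Amon", "ensemble": "r1i1p1",
    "version": "v20110905", "variable": "tas",
}


def drs_url(base_url, facets, filename):
    """Build a predictable URL from DRS facet values (hypothetical helper)."""
    missing = [name for name in DRS_FACETS if name not in facets]
    if missing:
        raise ValueError("missing DRS facets: " + ", ".join(missing))
    parts = [facets[name] for name in DRS_FACETS]
    return "/".join([base_url.rstrip("/"), *parts, filename])


# The base URL below is a placeholder host, not a real data node.
print(drs_url("http://datanode.example.org/data", EXAMPLE_FACETS, "f.nc"))
```
</pre>
<p class="MsoNormal">With a layout like this, a crawler only needs the facet values to find a file; it does not have to browse a Gateway first.</p>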
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">* The QC tool was developed assuming the DRS
directory structure on disk -- no HTTP middleware is going
to help in this case. I believe this was a communication
failure, pure and simple. Developers at DKRZ believed they
could rely on the DRS directory structure and they can't.
We need to do better communicating our policy.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">* I generally agree with Gavin technically but I
notice he doesn't mention end-user once. It is clear that
his target users are ESGF developers and deployers, not
scientists wanting data services. That's fine: Gavin's role
is thinking ahead about the architecture but, at this stage,
CMIP5 user requirements should be the priority of ESGF
development as a whole. Without that, some of us are going
to spend most of the next year fighting fires and helping
frustrated users with an unusable system.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">* I'm sure the DRS in its current form won't fit
everyone's future metadata needs and so I'm not arguing the
datanode should be forced to publish DRS structured data,
just that we publish CMIP5 data in this structure. One day
there is a much wider debate to be had about whether the
details of DRS are right.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">* I sincerely hope we can explore torrents and
torrent-like technology in ExArch.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">Stephen.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size: 10.5pt;
font-family: Consolas; color: rgb(31, 73, 125);">---<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 10.5pt;
font-family: Consolas; color: rgb(31, 73, 125);">Stephen
Pascoe +44 (0)1235 445980<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 10.5pt;
font-family: Consolas; color: rgb(31, 73, 125);">Centre of
Environmental Data Archival<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 10.5pt;
font-family: Consolas; color: rgb(31, 73, 125);">STFC
Rutherford Appleton Laboratory, Harwell Oxford, Didcot
OX11 0QX, UK<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);"><o:p> </o:p></span></p>
<div>
<div style="border-right: medium none; border-width: 1pt
medium medium; border-style: solid none none; border-color:
rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color;
padding: 3pt 0cm 0cm;">
<p class="MsoNormal"><b><span style="font-size: 10pt;
font-family:
"Tahoma","sans-serif"; color:
windowtext;" lang="EN-US">From:</span></b><span
style="font-size: 10pt; font-family:
"Tahoma","sans-serif"; color:
windowtext;" lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech-bounces@ucar.edu">go-essp-tech-bounces@ucar.edu</a>
[<a class="moz-txt-link-freetext" href="mailto:go-essp-tech-bounces@ucar.edu">mailto:go-essp-tech-bounces@ucar.edu</a>] <b>On Behalf Of
</b>Gavin M. Bell<br>
<b>Sent:</b> 02 September 2011 20:25<br>
<b>To:</b> V. Balaji<br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a>; Luca Cinquini;
<a class="moz-txt-link-abbreviated" href="mailto:esg-node-dev@lists.llnl.gov">esg-node-dev@lists.llnl.gov</a>; Laura Carriere<br>
<b>Subject:</b> Re: [Go-essp-tech] Non-DRS File
structure at data nodes<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi Balaji, <br>
<br>
Indeed you have good points. The only thing I am suggesting
is that it is an equally daunting task to *impose* anything on
a group of people. I am a big fan of the benevolent
dictatorship, but as history tells us, they don't last. I
guess I would contend that having a transparent, open, easy to
grok algorithm would be incumbent upon any institution that
decides to take advantage of such an indirection mechanism.
It is optional. It may very well be that, for a community as
disciplined as the climate community (I am quite serious: many
other communities look to the climate community as a model of
organization), having a single structure would suffice.
But from a system admin perspective filesystem requirements
may be prohibitive. As we can see just from this discussion,
many are already looking for ways to perform this
indirection. I propose that we allow this indirection in a
regimented way so we can all understand / embrace the
mechanism by which this is done. Think of it like a well
known hashing algorithm that we know how to plug into to get
out what we want. As long as the transformation machinery is
clear then the particular transform becomes a simpler, more
circumscribed task. I only propose that we allow for this at
the highest ingress level... ESGF. This ameliorates the
burden on ESGF. Let's not forget ESGF belongs to all of us,
*we* are *them*.<br>
<br>
As for torrents... well, in my mind's eye that is what we are
building to some extent, but with better security that avoids
some of the byzantine attacks torrents are prone to. ESGF
should have the best features from that community... at least
that's part of the goal of the design. Oh, and ESGF won't
preclude using torrents... as a matter of fact you could
create a back-end that is a true torrent in/egress that plugs
into ESGF... Oh... but then you will need a transformation
layer to allow that to happen, it would be nice to have one
available to plug into, right? ;-).<br>
<br>
On 9/2/11 11:53 AM, V. Balaji wrote: <o:p></o:p></p>
<pre>I've been a proponent since the beginning of having a file layout<o:p></o:p></pre>
<pre>(DRS) agreed by convention and _imposed_ (rather than recommended)<o:p></o:p></pre>
<pre>on participant nodes. While this may be old-fashioned thinking,<o:p></o:p></pre>
<pre>our finding is that predictable paths are the most useful thing for<o:p></o:p></pre>
<pre>building the software, and I continue to believe that it's not so<o:p></o:p></pre>
<pre>difficult to agree upon a file layout. I think the difficulties here<o:p></o:p></pre>
<pre>arose from a discrepancy in the way DRSlib and CMOR handled versioning<o:p></o:p></pre>
<pre>rather than people digging their heels in about a conventionally<o:p></o:p></pre>
<pre>agreed file and directory layout.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Regarding Gavin's larger point, having an indirection layer in the<o:p></o:p></pre>
<pre>middleware separating the apparent path in the query from the actual<o:p></o:p></pre>
<pre>path in the resource introduces a huge dependency on that indirection<o:p></o:p></pre>
<pre>layer: pretty much nothing can function without it. I'm not sure ESGF<o:p></o:p></pre>
<pre>should take upon itself such a huge burden.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>With DRS being an imposed convention, you could undertake many tasks<o:p></o:p></pre>
<pre>following software paths for which we aren't responsible. There are<o:p></o:p></pre>
<pre>many tasks -- e.g. data movement, replication -- which are shared by<o:p></o:p></pre>
<pre>communities much larger than ESGF and shouldn't require specialized<o:p></o:p></pre>
<pre>middleware. One of my big disappointments is that we don't use torrents<o:p></o:p></pre>
<pre>for anything:-).<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Gavin M. Bell writes:<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top: 5pt; margin-bottom: 5pt;">
<pre>Hi Estani and colleagues, :-)<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Okay, so let me jump in for a minute. There are two notions that are<o:p></o:p></pre>
<pre>being conflated in this discussion. Everyone is used to using paths and<o:p></o:p></pre>
<pre>such to find things on the filesystem. Also people are used to using<o:p></o:p></pre>
<pre>tried and true mechanisms that use the filesystem to get to information<o:p></o:p></pre>
<pre>remotely by further qualifying the filesystem path with the host. This<o:p></o:p></pre>
<pre>is all well and good for the scope of these tools.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Now we are in a distributed world as we build this ESG*F* (Federation)<o:p></o:p></pre>
<pre>that will unify and sew together disparate organizations' data into a<o:p></o:p></pre>
<pre>seamless 'dataspace'. The goal of building such a thing is to make it<o:p></o:p></pre>
<pre>easy for all interested in the data to get to data and post data and in<o:p></o:p></pre>
<pre>so doing share data in an environment that is fluid.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>ESGF is providing a mechanism/platform/infrastructure... that<o:p></o:p></pre>
<pre>simultaneously addresses the need for everyone to share data while<o:p></o:p></pre>
<pre>maintaining sovereign custody over their data assets. ESGF has already<o:p></o:p></pre>
<pre>met this challenge in many ways. However, to continue to make the<o:p></o:p></pre>
<pre>system simple and easy to use and a joy to use we should alleviate the<o:p></o:p></pre>
<pre>requirement of filesystem structure. This is a particular case where<o:p></o:p></pre>
<pre>'some' is good but 'too much' hurts.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>So now, cutting to the chase. More than anecdotal evidence (the length<o:p></o:p></pre>
<pre>of this discussion) clearly suggests that strict filesystem adherence is<o:p></o:p></pre>
<pre>not in accord with the sovereignty we would like organizations to<o:p></o:p></pre>
<pre>enjoy. It would behoove us to operate the federation such that<o:p></o:p></pre>
<pre>descriptors in the context of the federation are divorced from<o:p></o:p></pre>
<pre>filesystem structure itself. This can be achieved rather directly.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Going back to what I initially said, the two notions being conflated<o:p></o:p></pre>
<pre>here are the *query* and the *resource*. A URL, even the filesystem<o:p></o:p></pre>
<pre>path itself, is nothing more than a query to the network/operating<o:p></o:p></pre>
<pre>system to locate bits on a platter (clearly I am dating myself). We<o:p></o:p></pre>
<pre>should use the DRS as the Federation's canonical locator for resources.<o:p></o:p></pre>
<pre>The DRS is the *query* (in the same spirit as above). The ESGF system,<o:p></o:p></pre>
<pre>just like the filesystem, would resolve the query (DRS) to the<o:p></o:p></pre>
<pre>resource. This, by the way, bears fruit in quite a few places in the<o:p></o:p></pre>
<pre>system, making quite a few things more efficient.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
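<pre>A minimal Python sketch of the query/resource split described above,
under stated assumptions: the class name, the simplified dotted
identifier and the example site layout are all hypothetical, and this
is not the actual ESGF implementation. A scan of the local files
builds the index; the site supplies its own transform.</pre>
<pre>
```python
class DrsResolver:
    """Maps a federation-wide DRS identifier (the query) to a
    site-local path (the resource). Hypothetical illustration."""

    def __init__(self):
        self._index = {}

    def scan(self, local_paths, to_drs_id):
        """Index local files; to_drs_id is the site-specific transform."""
        for path in local_paths:
            self._index[to_drs_id(path)] = path

    def resolve(self, drs_id):
        """Return the local path published under drs_id."""
        try:
            return self._index[drs_id]
        except KeyError:
            raise LookupError("no resource published for " + drs_id) from None


def example_site_transform(path):
    # Hypothetical site layout: /archive/MODEL/EXPERIMENT/VARIABLE_VERSION.nc
    model, experiment, filename = path.strip("/").split("/")[1:]
    variable, version = filename[:-len(".nc")].split("_")
    # Simplified identifier for illustration, not the full DRS syntax.
    return ".".join(["cmip5", model, experiment, variable, version])


resolver = DrsResolver()
resolver.scan(["/archive/HadGEM2-ES/rcp45/tas_v20110905.nc"],
              example_site_transform)
print(resolver.resolve("cmip5.HadGEM2-ES.rcp45.tas.v20110905"))
```
</pre>
<pre>The site keeps whatever layout suits it; only the transform passed to
scan() needs to know about that layout.</pre>
<pre><o:p> </o:p></pre>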
<pre>I have thought about this particular filesystem problem and have come up<o:p></o:p></pre>
<pre>with a solution... the solution would allow us to still use tools like<o:p></o:p></pre>
<pre>wget/curl right out of the box and with a little bit of tweaking gridftp<o:p></o:p></pre>
<pre>and globus. As a matter of fact the solution would lend itself to being<o:p></o:p></pre>
<pre>used by any tool old or new. To more directly address Estani's<o:p></o:p></pre>
<pre>questions about *relying* on things.... I don't think that the tone of<o:p></o:p></pre>
<pre>that should be so pejorative. You *use* a tool because it helps you. I<o:p></o:p></pre>
<pre>feel that using the ESGF infrastructure is useful to the community and<o:p></o:p></pre>
<pre>the community's goals. I don't think that it is too much skin in the<o:p></o:p></pre>
<pre>game to ask for. If things go horribly wrong, your organization has<o:p></o:p></pre>
<pre>its own filesystem structure that fits its needs and that it can rely<o:p></o:p></pre>
<pre>on in order to make sense of things as it sees them. So, fundamentally<o:p></o:p></pre>
<pre>the act of scanning the data is what provides the cohesion between the<o:p></o:p></pre>
<pre>DRS and filesystem structure. The job of scanning is certainly not<o:p></o:p></pre>
<pre>terribly laborious. So there is quite literally very *little* cost to<o:p></o:p></pre>
<pre>"relying" on a system/infrastructure/set of tools that is ESGF,<o:p></o:p></pre>
<pre>especially compared to the benefit of what ESGF can bring to this<o:p></o:p></pre>
<pre>community. I find it hard to conjure a cogent argument against creating<o:p></o:p></pre>
<pre>a flexible system, especially given the nature of this<o:p></o:p></pre>
<pre>multi-organization, international effort. We must make it easy for<o:p></o:p></pre>
<pre>organizations to be independent and not push a myopic view (IMHO) of a<o:p></o:p></pre>
<pre>certain state of the world on everyone.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Thank you for reading this rather lengthy email... I need an in-house<o:p></o:p></pre>
<pre>editor perhaps... I tend to get garrulous but I wanted to be as clear as<o:p></o:p></pre>
<pre>I could.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>If there isn't already a working group on this I would like to propose<o:p></o:p></pre>
<pre>one, we can set it up on the ESGF wiki and talk more about this. :-)<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>P.S.<o:p></o:p></pre>
<pre>In 10 years ESGF will have morphed into something even more lovely...<o:p></o:p></pre>
<pre>because it is built by all of us and nurtured on our wisdom :-).<o:p></o:p></pre>
<pre>This will be the tool people count on and rely on as you alluded to with<o:p></o:p></pre>
<pre>ftp, et al. There is no tomorrow without today (modulo the quantum<o:p></o:p></pre>
<pre>mechanics fridge).<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>On 9/2/11 2:55 AM, Estanislao Gonzalez wrote:<o:p></o:p></pre>
<blockquote style="margin-top: 5pt; margin-bottom: 5pt;">
<pre>I know the main idea is to create a middleware layer that would make<o:p></o:p></pre>
<pre>file structures obsolete. But then, we will have to write all tools<o:p></o:p></pre>
<pre>again in order to interact with this intermediate level or at least<o:p></o:p></pre>
<pre>patch them somehow. gridFTP, as well as ftp, are only useful as<o:p></o:p></pre>
<pre>transmission protocols, you can't write your own script to use them,<o:p></o:p></pre>
<pre>you have to rely on either the gateway or the datanode to find what<o:p></o:p></pre>
<pre>you are looking for.<o:p></o:p></pre>
<pre>In my opinion, we will be relying too much on the ESG infrastructure.<o:p></o:p></pre>
<pre>What would happen if we lose the publisher database? How would we<o:p></o:p></pre>
<pre>tell apart one version from another, if this is not represented in the<o:p></o:p></pre>
<pre>directory structure?<o:p></o:p></pre>
<pre>My fear is that if we keep separating the metadata from the data<o:p></o:p></pre>
<pre>itself, we add a new weak link in the chain. Now if we lose the<o:p></o:p></pre>
<pre>metadata the data will also be useless (this would be indeed the worst<o:p></o:p></pre>
<pre>case scenario). In 10 years we will have no idea what these interfaces<o:p></o:p></pre>
<pre>were like, probably both data node and gateways will be superseded by<o:p></o:p></pre>
<pre>newer versions that can't translate our old requirements. But as I<o:p></o:p></pre>
<pre>said, that's a problem for LTAs only. In any case, we need the<o:p></o:p></pre>
<pre>middleware to provide some services and speed things up, but I don't<o:p></o:p></pre>
<pre>think we should rely blindly on it.<o:p></o:p></pre>
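<pre><o:p> </o:p></pre>
<pre>To illustrate the point about versions being represented in the
directory structure, here is a minimal Python sketch that recovers
dataset versions from paths alone, with no database. It assumes the
version directory is named "v" followed by digits; the function name
and the example paths are hypothetical.</pre>
<pre>
```python
import re
from collections import defaultdict

# Assumption: a version directory looks like "v1" or "v20110905".
VERSION_RE = re.compile(r"^v\d+$")


def versions_by_dataset(paths):
    """Group version directories by the dataset path that precedes them."""
    found = defaultdict(set)
    for path in paths:
        parts = path.strip("/").split("/")
        for i, part in enumerate(parts):
            if VERSION_RE.match(part):
                found["/".join(parts[:i])].add(part)
                break
    # Sort numerically so that "v2" comes before "v10".
    return {
        dataset: sorted(versions, key=lambda v: int(v[1:]))
        for dataset, versions in found.items()
    }


# Hypothetical example paths.
EXAMPLE_PATHS = [
    "/data/cmip5/x/v20110905/tas/a.nc",
    "/data/cmip5/x/v20110601/tas/a.nc",
    "/data/cmip5/y/v1/tas/a.nc",
]
print(versions_by_dataset(EXAMPLE_PATHS))
```
</pre>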
</blockquote>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<pre>-- <o:p></o:p></pre>
<pre>Gavin M. Bell<o:p></o:p></pre>
<pre>--<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre> "Never mistake a clear view for a short distance."<o:p></o:p></pre>
<pre> -Paul Saffo<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
</div>
<br>
<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Gavin M. Bell
--
"Never mistake a clear view for a short distance."
         -Paul Saffo
</pre>
</body>
</html>