<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffcc">
Hi Estani and colleagues, :-)<br>
<br>
Okay, so let me jump in for a minute. There are two notions that
are being conflated in this discussion. Everyone is used to using
paths and such to find things on the filesystem. Also people are
used to using tried and true mechanisms that use the filesystem to
get to information remotely by further qualifying the filesystem
path with the host. This is all well and good for the scope of
these tools.<br>
<br>
Now we are in a distributed world as we build this ESG*F*
(Federation) that will unify and sew together disparate
organizations' data into a seamless 'dataspace'. The goal of
building such a thing is to make it easy for all interested in the
data to get to data and post data and in so doing share data in an
environment that is fluid.<br>
<br>
ESGF is providing a mechanism/platform/infrastructure... that
simultaneously addresses the need for everyone to share data while
maintaining sovereign custody over their data assets. ESGF has
already met this challenge in many ways. However, to keep the
system simple and a joy to use, we should relax the requirement of
a strict filesystem structure. This is a
particular case where 'some' is good but 'too much' hurts.<br>
<br>
So now, cutting to the chase. More than anecdotal evidence (the
length of this discussion) clearly suggests that strict filesystem
adherence is not in accord with the sovereignty we would like
organizations to enjoy. It would behoove us to operate the
federation such that descriptors in the context of the federation
are divorced from filesystem structure itself. This can be achieved
rather directly.<br>
<br>
Going back to what I initially said, the two notions being conflated
here are the *query* and the *resource*. A URL, or even the
filesystem path itself, is nothing more than a query to the
network/operating system to locate bits on a platter (clearly I am
dating myself). We should use the DRS as the Federation's canonical
locator for resources. The DRS is the *query* (in the same spirit
as above). The ESGF system, just like the filesystem, would resolve
the query (DRS) to the resource. This, by the way, bears fruit in
quite a few places in the system, making quite a few things more
efficient.<br>
<br>
I have thought about this particular filesystem problem and have
come up with a solution... the solution would allow us to still use
tools like wget/curl right out of the box and with a little bit of
tweaking gridftp and globus. As a matter of fact the solution would
lend itself to being used by any tool old or new. To more directly
address Estani's questions about *relying* on things.... I don't
think that the tone of that should be so pejorative. You *use* a
tool because it helps you. I feel that using the ESGF infrastructure
is useful to the community and the community's goals. I don't think
that it is too much skin in the game to ask for. If things go
horribly wrong, your organization has its own filesystem structure
that fits its needs and that it can rely on in order to make sense
of things as it sees them. So, fundamentally the act of scanning the
data is what provides the cohesion between the DRS and filesystem
structure. The job of scanning is certainly not terribly
laborious. So there is quite literally very *little* cost to
"relying" on a system/infrastructure/set of tools that is ESGF,
especially compared to the benefit of what ESGF can bring to this
community. I find it hard to conjure a cogent argument against
creating a flexible system, especially given the nature of this
multi-organization, international effort. We must make it easy for
organizations to be independent and not push a myopic view (IMHO) of
a certain state of the world on everyone.<br>
<br>
Thank you for reading this rather lengthy email... I need an
in-house editor perhaps... I tend to get garrulous but I wanted to
be as clear as I could.<br>
<br>
If there isn't already a working group on this, I would like to
propose one. We can set it up on the ESGF wiki and talk more about
this. :-)<br>
<br>
P.S.<br>
In 10 years ESGF will have morphed into something even more
lovely... because it is built by all of us and nurtured on our
wisdom :-). It will be the tool people count on and rely on, as you
alluded to with ftp, et al. There is no tomorrow without today
(modulo the quantum mechanics fridge).<br>
<br>
On 9/2/11 2:55 AM, Estanislao Gonzalez wrote:
<blockquote cite="mid:4E60A81E.3020205@dkrz.de" type="cite">I know
the main idea is to create a middleware layer that would make file
structures obsolete. But then, we will have to write all tools
again in order to interact with this intermediate level or at
least patch them somehow. gridFTP, as well as ftp, are only useful
as transmission protocols, you can't write your own script to use
them, you have to rely on either the gateway or the datanode to
find what you are looking for.<br>
In my opinion, we will be relying too much on the ESG
infrastructure. What would happen if we lose the publisher
database? How would we tell apart one version from another, if
this is not represented in the directory structure?<br>
My fear is that if we keep separating the metadata from the data
itself, we add a new weak link in the chain. Now if we lose the
metadata the data will also be useless (this would be indeed the
worst case scenario). In 10 years we will have no idea what these
interfaces were like, probably both data node and gateways will be
superseded by newer versions that can't translate our old
requirements. But as I said, that's a problem for LTAs only. In
any case, we need the middleware to provide some services and
speed things up, but I don't think we should rely blindly on it.<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Gavin M. Bell
--
"Never mistake a clear view for a short distance."
         -Paul Saffo
</pre>
</body>
</html>