<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.19154"></HEAD>
<BODY bgColor=#ffffff text=#000000>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>Hello Estani,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>thanks for this information - very useful. One question,
then some follow-up inline.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>How does the wget generator (or search) deal with replicas?
What determines which replica the user will download, or which one gets
returned by the search? </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>Jamie</FONT></SPAN></DIV><BR>
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"
dir=ltr>
<DIV></DIV>
<DIV>Well, the P2P has /search, which returns an XML response the user can
parse, and the wget script has the -w flag (the current one has it too) that
outputs this list.<BR>The way the wget script is designed, it's pretty simple
to extract this list anyway, as the whole wget script is little more than a
"bash" decoration (holding all the intelligence) around the plain list.
That's, e.g., what the DML uses to ingest the files to be
retrieved.<BR><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial> </FONT></SPAN></DIV>
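Since the wget script is just "bash" decoration around a plain list, the list can be pulled out without running the script at all. A minimal sketch; the marker lines below are hypothetical stand-ins for whatever actually delimits the embedded file list in a generated script, so the patterns would need adapting:

```shell
#!/bin/sh
# Sketch only: "#--filelist-start--" / "#--filelist-end--" are assumed
# marker lines, not the real delimiters used by the ESGF generator.
extract_urls() {
    # print the lines between the two markers, markers themselves excluded
    sed -n '/^#--filelist-start--/,/^#--filelist-end--/p' "$1" | sed '1d;$d'
}
```

Run against the downloaded script this would dump one list entry per line, which is essentially what the -w flag does for you.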
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>this
is good to know.</FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011> </SPAN><BR>The replication system is
much more complicated, because you handle many more files at the same time, so
a simple script won't be able to manage the 200,000 URLs needed to replicate a
couple of experiments (3-4 URLs per file in our case). Furthermore, there are
many other requirements that end users don't have, including publication. But
at the very bottom there are indeed many similarities.<SPAN
class=901251417-14122011><FONT color=#0000ff size=2
face=Arial> </FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Happy to be corrected on my naive view of the replication problem -
though I still think it is useful to recognise what is common. Picking
up a couple of comments from your previous e-mail: how modular is the
replication system, and how much work would be involved in using
those modules that deal with the common stuff 'at the very bottom' to
write an 'intelligent' user client?</FONT> </SPAN><BR><BR>Regarding a
client-side wget script generator, it works the other way around now: you get
the wget script and, from it, the list of files. The script already checks for
downloaded files, so you don't need to do that yourself and create a new wget
script; it will do it for you.<SPAN class=901251417-14122011><FONT
color=#0000ff size=2 face=Arial> </FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Doesn't this mean the wget script generator has to know the
directory structure the user is using for their local replica of the archive -
and this may differ (as we know, it's not even the same from node to
node)? Your bash is way ahead of mine - so I could be wrong in what
follows - but from what *I* could tell from the sample wget script (generated
from an example on <A
href="http://www.esgf.org/wiki/ESGF_scripting">http://www.esgf.org/wiki/ESGF_scripting</A>)
it simply uses the file name, assuming the file sits in the current
directory.</FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>Have
you considered copying to the DRS directory structure as the default?
This has a nice side effect of helping users know what version they have
downloaded. (Though you'd need to get version, and the other DRS elements
missing from the filename, into your script, I think.) I know my
suggestion would force people to use the DRS locally... but I don't know that
that's a *bad* thing. It would also have to either be run from the root
of their local copy, or have a compulsory argument that is the root
of their local copy.</FONT></SPAN></DIV>
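If the generator did write the missing facets (version included) into the script, building the DRS-style target path under the local root is a one-liner. A sketch; the facet names in the usage line are illustrative values following the CMIP5 DRS ordering, not a definitive spec:

```shell
#!/bin/sh
# Join DRS facet values into a relative path. The caller (i.e. the
# generated script) is assumed to supply the real facet values.
drs_path() {
    # "$*" joins the arguments with the first character of IFS
    (IFS=/; printf '%s\n' "$*")
}

# illustrative usage, all facet values hypothetical:
# drs_path cmip5 output1 MOHC HadGEM2-ES historical day atmos day r1i1p1 v20111128 tas
```

The subshell keeps the IFS change from leaking into the rest of the script.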
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>The
wget script *could* then also get really clever and not do the remote copy of
data that has not changed from one version to the next (based on checksum),
but just put in a hard link or something like that... that's probably quite a
way towards an 'intelligent' user client?</FONT></SPAN></DIV>
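That version-to-version step could look something like the sketch below, under the assumption that the script knows the advertised checksum and both the old and new version paths (the function name, argument layout, and use of md5sum are all illustrative):

```shell
#!/bin/sh
# If the previous version's copy matches the advertised checksum,
# hard-link it into the new version's path instead of downloading again.
link_or_fetch() {
    old="$1" new="$2" url="$3" want_md5="$4"
    mkdir -p "$(dirname "$new")"
    if [ -f "$old" ] && [ "$(md5sum "$old" | cut -d' ' -f1)" = "$want_md5" ]; then
        ln "$old" "$new"           # unchanged: no remote copy needed
    else
        wget -q -O "$new" "$url"   # changed or missing: fetch it
    fi
}
```

A hard link costs no extra disk space, so unchanged files are "copied" between versions for free.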
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Having just written this, I realised it doesn't really work for
us at the MO, as our 'download server' - where we run wget - cannot see
the disk where we keep our local replicas of the CMIP5 and TAMIP archives,
because of security constraints. Thankfully the machines that do the
list/search before we fetch can see the local replica, so we can filter the
list returned. It *may* work for others though?</FONT></SPAN><SPAN
class=901251417-14122011> </SPAN><BR><BR>If a download manager or
any other "better" application communicates with the node, it should use the
XML semantics. If a simple "one-way" script is required, the wget script is a
good way of getting it: dump the list (the overhead is really minimal) and
process as usual...<BR><BR>I think these two options are just fine, but
correct me if you don't share my view.<SPAN class=901251417-14122011><FONT
color=#0000ff size=2 face=Arial> </FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>I
think your options work - though I can't see how to get the checksum
from either the XML or the wget -w option (and I'm not sure the wget
-w output ever would carry it?). *BUT* that may just be because the sample
datasets I'm using don't make the checksums available?</FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Thanks again for your
reply.</FONT></SPAN><BR><BR>Thanks,<BR>Estani<BR> <BR>On 14.12.2011
17:43, Kettleborough, Jamie wrote: </DIV>
<BLOCKQUOTE
cite=mid:E51EDFEBF10BE44BB4BDAF5FC2F024B90FB7F972@EXXMAIL02.desktop.frd.metoffice.com
type="cite">
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"
dir=ltr>
<DIV>So, the way I picture this is:<BR>1) get the list of files to be
downloaded (in the wget script or by any other means)<BR>2) filter that to
remove what is not required<BR><SPAN
class=315445615-14122011> </SPAN></DIV></BLOCKQUOTE><SPAN
class=315445615-14122011>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=315445615-14122011>This is basically what we do at the MO - we create a
list of files to download, then compare it with our local file system, and
we filter out any we already have. I think the replication system
would have to do this too, wouldn't it? For what it's worth, I
think *every* user has their own version of the replication problem - just
the set of files they are trying to replicate is different, and they might be
using a different protocol to fetch the data.</SPAN></FONT></DIV>
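The list-then-filter step described here takes only a few lines of shell. A sketch, assuming a "local_path url" per-line format for the dumped list (the real list format may differ):

```shell
#!/bin/sh
# Read "local_path url" pairs on stdin and print only the entries whose
# file does not already exist under the given local root directory.
filter_needed() {
    root="$1"
    while read -r path url; do
        [ -e "$root/$path" ] || printf '%s %s\n' "$path" "$url"
    done
}
```

The surviving entries can then be fed to wget (or any other fetcher), which is the protocol-independence point made above.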
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=315445615-14122011></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=315445615-14122011>If you accept this way of working as
valid/acceptable/encouraged, then does it have implications for the
(scriptable) interfaces to either the P2P and/or gateway 2? I think
it means there 'should' be an interface that returns a list of files (not
wrapped in a wget script), and then maybe a service (either client-side or
server-side) that will take a list of URLs and generate the wget
scripts. If you only have an interface that returns wget scripts, then
users will have to parse these in order to filter out the files they
already have copies of.</SPAN></FONT></DIV>
<DIV> </DIV>
<DIV><SPAN class=315445615-14122011><FONT color=#0000ff size=2
face=Arial>Jamie</FONT></SPAN></DIV>
<DIV><SPAN class=315445615-14122011></SPAN> </DIV>
<DIV><SPAN class=315445615-14122011><FONT color=#0000ff size=2
face=Arial>(Sebastien - I'm aware this sort of touches on a set
of unanswered questions you asked a while ago, related to what we
do at the MO... I've not forgotten that I want to answer this in
more detail; apologies for being so rubbish at answering so
far). </FONT></SPAN></DIV></SPAN></BLOCKQUOTE><BR><BR><PRE class=moz-signature cols="72">--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: <A class=moz-txt-link-abbreviated href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</A> </PRE></BLOCKQUOTE></BODY></HTML>