<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Well, the P2P node has the /search interface, which returns an XML
response that the user could parse, and the wget script (the current
one as well) has the -w flag that outputs this list.<br>
The way the wget script is designed, it's pretty simple to extract
this list anyhow, as the whole wget script is not much more than a
"bash" decoration around it (with all the intelligence). That's what
the DML uses, for example, for ingesting the files to be retrieved.<br>
<br>
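Just to illustrate the two routes (the node address, the query string
and the exact -w invocation below are placeholders of mine, not
something fixed by the script):<br>
<pre>
# Route 1: query the node's /search interface and keep the XML response
# (hostname and query parameters are made up for the example)
wget -q -O result.xml "http://NODE/search?project=CMIP5&amp;variable=tas"

# Route 2: let the generated wget script dump its embedded file list
# (assumption: -w prints one entry per line on stdout)
bash wget-script.sh -w | tee filelist.txt
</pre>
<br>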
The replication system is much more complicated, because you handle
many more files at the same time; a simple script won't be able to
manage the 200,000 URLs needed for replicating a couple of experiments
(3-4 URLs per file in our case). Furthermore, there are many other
requirements that normal users don't have, including publication. But
at the very bottom there are indeed many similarities.<br>
<br>
Basically, the wget script is not meant for moving terabytes of
data; not because of the protocol, but because managing that is much
too complicated.<br>
<br>
Regarding a client-side wget script generator: it currently works the
other way around. You get the wget script and, from it, the list of
files. The script already checks for downloaded files, so you don't
need to do that yourself and create a new wget script; it will do it
for you.<br>
<br>
If a download manager or any other "better" application
communicates with the node, it should use the XML semantics. If a
simple "one-way" script is required, the wget script is a good way of
getting it: dump the list (the overhead is really minimal) and
process as usual...<br>
<br>
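For the simple "one-way" case, a rough sketch of what I mean (the -w
usage, the one-URL-per-line list format and the local paths are my
assumptions here, for illustration only):<br>
<pre>
# dump the list, drop what is already on disk, fetch the rest
DATA_DIR=/path/to/local/archive

bash wget-script.sh -w | while read -r url; do
    [ -e "$DATA_DIR/$(basename "$url")" ] || echo "$url"
done &gt; missing_urls.txt

# hand the remaining URLs to wget (or to any other download manager)
wget -c -P "$DATA_DIR" -i missing_urls.txt
</pre>
<br>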
I think these two options are just fine, but correct me if you don't
share my view.<br>
<br>
Thanks,<br>
Estani<br>
<br>
On 14.12.2011 17:43, Kettleborough, Jamie wrote:
<blockquote
cite="mid:E51EDFEBF10BE44BB4BDAF5FC2F024B90FB7F972@EXXMAIL02.desktop.frd.metoffice.com"
type="cite">
<blockquote style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT:
5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px" dir="ltr">
<div>So, the way I picture this is:<br>
1) get the list of files to be downloaded (in the wget script
or by any other means)<br>
2) filter that to remove what is not required<br>
<span class="315445615-14122011"> </span></div>
</blockquote>
<span class="315445615-14122011">
<div dir="ltr" align="left"><font color="#0000ff" face="Arial"
size="2"><span class="315445615-14122011">This is basically
what we do MO - we create list of files to download, then
compare it with our local file system, and we filter out
any we already have. I think the replication system would
have to do this too wouldn't it? For what its worth I
think *every* user has their own version of the
replication problem - just the set of files they are
trying to replicate is different and they might be using a
different protocol to fetch the data.</span></font></div>
<div dir="ltr" align="left"><font color="#0000ff" face="Arial"
size="2"><span class="315445615-14122011"></span></font> </div>
<div dir="ltr" align="left"><font color="#0000ff" face="Arial"
size="2"><span class="315445615-14122011">If you accept this
way of working as valid/acceptable/encouraged then does it
have implications for the (scriptable) interfaces to
either P2P and or gateway 2? I think it means there
'should' be an interface that returns a list of files (not
wrapped in a wget script) and then maybe a service (either
client side or server side) that will take a list of urls
and generate the wget scripts. If you only have an
interface that returns wget scripts then users will have
to parse these to enable them to filter out the files they
already have copies of.</span></font></div>
<div> </div>
<div><span class="315445615-14122011"><font color="#0000ff"
face="Arial" size="2">Jamie</font></span></div>
<div><span class="315445615-14122011"></span> </div>
<div><span class="315445615-14122011"><font color="#0000ff"
face="Arial" size="2">(Sebastien - I'm aware this sort of
touches on a set of unanswered questions you asked a while
ago related to what we do at the MO... I've not forgotten
I want to answer this in more detail, apologies for being
so rubbish at answering so far). </font></span></div>
</span>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: <a class="moz-txt-link-abbreviated" href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</a> </pre>
</body>
</html>