<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Well, the P2P node has the /search interface, which returns an XML
response that the user could parse, and the wget script (the current
one as well) has the -w flag that outputs this list.<br>
The way the wget script is designed, it's pretty simple to extract
this list anyhow, as the whole wget script is not much more than a
"bash" decoration around it (with all the intelligence). That's what
the DML uses, for example, for ingesting the files to be retrieved.<br>
<br>
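Just to illustrate the two routes (the node address, the query string
and the exact -w invocation below are placeholders of mine, not
something fixed by the script):<br>
<pre>
# Route 1: query the node's /search interface and keep the XML response
# (hostname and query parameters are made up for the example)
wget -q -O result.xml "http://NODE/search?project=CMIP5&amp;variable=tas"

# Route 2: let the generated wget script dump its embedded file list
# (assumption: -w prints one entry per line on stdout)
bash wget-script.sh -w | tee filelist.txt
</pre>
<br>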
The replication system is much more complicated, because you handle
many more files at the same time; a simple script won't be able to
manage the 200,000 URLs needed for replicating a couple of experiments
(3-4 URLs per file in our case). Furthermore, there are many other
requirements that normal users don't have, including publication. But
at the very bottom there are indeed many similarities.<br>
<br>
Basically, the wget script is not meant for moving terabytes of
data; not because of the protocol, but because managing that is much
too complicated.<br>
<br>
Regarding a client-side wget script generator: it currently works the
other way around. You get the wget script and, from it, the list of
files. The script already checks for downloaded files, so you don't
need to do that yourself and create a new wget script; it will do it
for you.<br>
<br>
If a download manager or any other "better" application
communicates with the node, it should use the XML semantics. If a
simple "one-way" script is required, the wget script is a good way of
getting it: dump the list (the overhead is really minimal) and
process as usual...<br>
<br>
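For the simple "one-way" case, a rough sketch of what I mean (the -w
usage, the one-URL-per-line list format and the local paths are my
assumptions here, for illustration only):<br>
<pre>
# dump the list, drop what is already on disk, fetch the rest
DATA_DIR=/path/to/local/archive

bash wget-script.sh -w | while read -r url; do
    [ -e "$DATA_DIR/$(basename "$url")" ] || echo "$url"
done &gt; missing_urls.txt

# hand the remaining URLs to wget (or to any other download manager)
wget -c -P "$DATA_DIR" -i missing_urls.txt
</pre>
<br>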
I think these two options are just fine, but correct me if you don't
share my view.<br>
<br>
Thanks,<br>
Estani<br>
<br>
On 14.12.2011 17:43, Kettleborough, Jamie wrote:
<blockquote
cite="mid:E51EDFEBF10BE44BB4BDAF5FC2F024B90FB7F972@EXXMAIL02.desktop.frd.metoffice.com"
type="cite">
<blockquote style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT:
5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px" dir="ltr">
<div>So, the way I picture this is:<br>
1) get the list of files to be downloaded (in the wget script
or by any other means)<br>
2) filter that to remove what is not required<br>
<span class="315445615-14122011"> </span></div>
</blockquote>
<span class="315445615-14122011">
<div dir="ltr" align="left"><font color="#0000ff" face="Arial"
size="2"><span class="315445615-14122011">This is basically
what we do MO - we create list of files to download, then
compare it with our local file system, and we filter out
any we already have. I think the replication system would
have to do this too wouldn't it? For what its worth I
think *every* user has their own version of the
replication problem - just the set of files they are
trying to replicate is different and they might be using a
different protocol to fetch the data.</span></font></div>
<div dir="ltr" align="left"><font color="#0000ff" face="Arial"
size="2"><span class="315445615-14122011"></span></font> </div>
<div dir="ltr" align="left"><font color="#0000ff" face="Arial"
size="2"><span class="315445615-14122011">If you accept this
way of working as valid/acceptable/encouraged then does it
have implications for the (scriptable) interfaces to
either P2P and or gateway 2? I think it means there
'should' be an interface that returns a list of files (not
wrapped in a wget script) and then maybe a service (either
client side or server side) that will take a list of urls
and generate the wget scripts. If you only have an
interface that returns wget scripts then users will have
to parse these to enable them to filter out the files they
already have copies of.</span></font></div>
<div> </div>
<div><span class="315445615-14122011"><font color="#0000ff"
face="Arial" size="2">Jamie</font></span></div>
<div><span class="315445615-14122011"></span> </div>
<div><span class="315445615-14122011"><font color="#0000ff"
face="Arial" size="2">(Sebastien - I'm aware this sort of
touches on a set of unanswered questions you asked a while
ago related to what we do at the MO... I've not forgotten
I want to answer this in more detail, apologies for being
so rubbish at answering so far). </font></span></div>
</span>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: <a class="moz-txt-link-abbreviated" href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</a> </pre>
</body>
</html>