<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi Jamie,<br>

    <br>

    To be honest, replicas are not fully supported. There are too many
    caveats at the moment, and I'm waiting for Eric's green light to
    remove them completely from our end.<br>

    <br>

    But the idea so far has been to let the user decide which replica
    to download, in the same way the user decides which dataset to
    download (see the NCC datasets, which are fully replicated at our
    end).<br>

    <br>

    I honestly doubt this is what the user needs, although it might be
    what they want. I would presume a client tool is better placed to
    take care of those decisions, and even to parallelize downloads from
    multiple ends. The wget script will not handle all this complexity.<br>
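    <br>
    To make the kind of decision such a client tool could take over more
    concrete, here is a minimal Python sketch (not part of the wget script
    or of any ESGF tool) that, given several replica URLs per file, picks
    one replica per file while spreading the files across data nodes, so
    that each node's queue could be downloaded in parallel. The input shape
    (file name mapped to a list of candidate URLs), the function name
    assign_replicas and the least-loaded-host policy are all assumptions
    for illustration.<br>
<pre>
```python
from collections import defaultdict
from urllib.parse import urlparse


def assign_replicas(replicas):
    """Pick one replica URL per file, spreading files across data nodes.

    replicas: dict mapping file name -> list of candidate replica URLs
    (a hypothetical input shape). Returns a dict mapping each host to
    its download queue, a list of (file name, URL) pairs.
    """
    load = defaultdict(int)      # number of files assigned to each host so far
    queues = defaultdict(list)
    for name in sorted(replicas):
        # Choose the least-loaded host among this file's replicas;
        # ties are broken by host name so the result is deterministic.
        url = min(replicas[name],
                  key=lambda u: (load[urlparse(u).netloc], urlparse(u).netloc))
        host = urlparse(url).netloc
        load[host] += 1
        queues[host].append((name, url))
    return dict(queues)
```
</pre>
    Each per-host queue could then be handed to its own download worker
    (for example via concurrent.futures.ThreadPoolExecutor); a real client
    would also need retries and failover to another replica when a node is
    down.<br>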

    <br>

    So the user will just select something, and whatever is selected
    will be wrapped by the wget script. There will be no concept of
    replica at this stage.<br>

    <br>

    Hope this answers your question.<br>

    <br>

    Regards,<br>

    Estani<br>

    <br>

    On 14.12.2011 19:33, Kettleborough, Jamie wrote:

    <blockquote

cite="mid:E51EDFEBF10BE44BB4BDAF5FC2F024B90FB7F975@EXXMAIL02.desktop.frd.metoffice.com"

      type="cite">


      <div dir="ltr" align="left"><span class="901251417-14122011"><font

            color="#0000ff" face="Arial" size="2">Hello Estani,</font></span></div>

      <div dir="ltr" align="left"><span class="901251417-14122011"></span>&nbsp;</div>

      <div dir="ltr" align="left"><span class="901251417-14122011"><font

            color="#0000ff" face="Arial" size="2">Thanks for this
            information, very useful. One question, then some follow-up
            inline.</font></span></div>

      <div dir="ltr" align="left"><span class="901251417-14122011"></span>&nbsp;</div>

      <div dir="ltr" align="left"><span class="901251417-14122011"><font

            color="#0000ff" face="Arial" size="2">How does the wget
            generator (or search) deal with replicas? What determines
            which replica the user will download, or which one the search
            returns?</font></span></div>

      <div dir="ltr" align="left"><span class="901251417-14122011"></span>&nbsp;</div>

      <div dir="ltr" align="left"><span class="901251417-14122011"><font

            color="#0000ff" face="Arial" size="2">Jamie</font></span></div>

      <br>

      <blockquote style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT:

        5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px" dir="ltr">

        <div>Well, the P2P system has the /search service, which
          returns an XML response that the user could parse, and the wget
          script has the -w flag (the current one too) that outputs
          this list.<br>
          The way the wget script is designed, it's pretty simple to
          extract this list anyway, as the whole wget script is no more
          than a "bash" decoration (with all the intelligence) around the
          file list. That's, e.g., what the DML uses for ingesting the
          files to be retrieved.<br>

          <span class="901251417-14122011"><font color="#0000ff"

              face="Arial" size="2">&nbsp;</font></span></div>
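        <div><br>
          Since the wget script is essentially a wrapper around a file
          list, pulling that list out is a small text-processing job. Below
          is a minimal Python sketch; it assumes (hypothetically, as the
          exact layout of the script may differ between versions) that each
          file appears on its own line as single-quoted fields, with the
          file name first and the URL second, possibly followed by checksum
          fields. The function name files_from_wget_script is made up; the
          -w flag or the XML from /search remain the more robust routes.<br>
<pre>
```python
import re

# One single-quoted field, e.g. 'file.nc' (assumes no embedded quotes).
_FIELD = re.compile(r"'([^']*)'")


def files_from_wget_script(script_text):
    """Return (file name, URL) pairs listed in a wget script's text.

    Hypothetical format: one file per line as quoted fields, file name
    first and URL second, optionally followed by checksum fields.
    """
    pairs = []
    for line in script_text.splitlines():
        fields = _FIELD.findall(line)
        # Keep only lines that look like file entries: a name plus an
        # http(s) URL; everything else (the bash logic) is skipped.
        if len(fields) >= 2 and fields[1].startswith(("http://", "https://")):
            pairs.append((fields[0], fields[1]))
    return pairs
```
</pre>
        </div>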

        <div><span class="901251417-14122011"><font color="#0000ff"

              face="Arial" size="2">This is good to know.</font></span></div>

        <div><span class="901251417-14122011">&nbsp;</span><br>

          The replication system is much more complicated, because you
          handle many more files at the same time, so a simple script
          won't be able to manage 200,000 URLs for replicating a couple
          of experiments (3 to 4 URLs per file in our case). Furthermore,
          there are many other requirements that users don't have,
          including publication. But at the very bottom there are indeed
          many similarities.<span class="901251417-14122011"><font
              color="#0000ff" face="Arial" size="2">&nbsp;</font></span></div>

        <div><span class="901251417-14122011"></span>&nbsp;</div>

        <div><span class="901251417-14122011"><font color="#0000ff"
              face="Arial" size="2">Happy to be corrected on my naive
              view of the replication problem, though I still think it
              is useful to recognise what is common. Picking up a
              couple of comments from your previous e-mail: how modular
              is the replication system, and how much work would be
              involved in using the modules that deal with that common
              stuff 'at the very bottom' to write an 'intelligent' user
              client?</font>&nbsp;</span><br>

          <br>

          Regarding a client-side wget script generator, it currently
          works the other way around: you get the wget script, and from it
          the list of files. The script already checks for downloaded
          files, so you don't need to do that and create a new wget script;
          it will do it for you.<span class="901251417-14122011"><font
              color="#0000ff" face="Arial" size="2">&nbsp;</font></span></div>

        <div><span class="901251417-14122011"></span>&nbsp;</div>

        <div><span class="901251417-14122011"><font color="#0000ff"
              face="Arial" size="2">Doesn't this mean the wget script
              generator has to know the directory structure the user is
              using for their local replica of the archive? That may
              differ (hey, as we know, it's not even the same from node
              to node). Your bash is way ahead of mine, so I could be
              wrong in what follows, but from what *I* could tell from
              the sample wget script (generated from an example on <a
                moz-do-not-send="true"
                href="http://www.esgf.org/wiki/ESGF_scripting">http://www.esgf.org/wiki/ESGF_scripting</a>),
              it simply uses the file name, assuming the file is in the
              local directory.</font></span></div>

        <div><span class="901251417-14122011"></span>&nbsp;</div>

        <div><span class="901251417-14122011"><font color="#0000ff"
              face="Arial" size="2">Have you considered copying to the
              DRS directory structure by default? This has a nice
              side effect of helping users know what version they have
              downloaded. (Though you'd need to get the version, and the
              other DRS elements missing from the filename, into your
              script, I think.) I know my suggestion would force people to
              use the DRS locally... but I'm not sure that's a *bad*
              thing. The script would also have to either be run from the
              root of their local copy, or take a compulsory argument
              giving the root of their local copy.</font></span></div>
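        <div><br>
          A minimal Python sketch of building such a DRS destination path.
          The component order is assumed to follow the CMIP5 DRS directory
          layout (activity/product/institute/model/experiment/frequency/
          realm/MIP table/ensemble/version/variable; worth checking against
          the DRS specification), and the facet key names and the function
          name drs_path are hypothetical. As noted above, the version and
          several other components are not in the file name, so the caller
          has to supply them, together with the root of the local copy.<br>
<pre>
```python
from pathlib import PurePosixPath

# Assumed CMIP5 DRS directory order; verify against the DRS specification.
DRS_FACETS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "mip_table", "ensemble", "version", "variable",
)


def drs_path(root, facets, filename):
    """Return root/activity/.../variable/filename for one file.

    facets: dict with one value per name in DRS_FACETS (hypothetical keys).
    """
    missing = [name for name in DRS_FACETS if name not in facets]
    if missing:
        # The file name alone does not carry e.g. the version, so every
        # component must be supplied explicitly.
        raise ValueError("missing DRS components: " + ", ".join(missing))
    return PurePosixPath(root, *(facets[name] for name in DRS_FACETS), filename)
```
</pre>
        </div>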

        <div><span class="901251417-14122011"></span>&nbsp;</div>

        <div><span class="901251417-14122011"><font color="#0000ff"
              face="Arial" size="2">The wget script *could* then also
              get really clever and not do the remote copy of data that
              has not changed from one version to the next (based on
              checksum), but just put in a hard link or something like
              that. That's probably quite a way towards an
              'intelligent' user client?</font></span></div>
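        <div><br>
          A minimal Python sketch of the hard-link idea, assuming the file
          list carries an MD5 checksum for each file (which, as discussed
          further down, may not currently be exposed) and that the old and
          new version directories live on the same file system, since hard
          links cannot cross file systems. The function names md5sum and
          link_if_unchanged are hypothetical.<br>
<pre>
```python
import hashlib
import os


def md5sum(path, chunk_size=2**20):
    """Return the hex MD5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_if_unchanged(old_path, new_path, expected_md5):
    """Hard-link new_path to old_path if the old file has the expected MD5.

    Returns True if the link was made (nothing to download), or False if
    the new version still has to be fetched from the remote node.
    """
    if not os.path.isfile(old_path) or md5sum(old_path) != expected_md5:
        return False
    os.makedirs(os.path.dirname(new_path) or ".", exist_ok=True)
    os.link(old_path, new_path)  # same inode: no extra disk space used
    return True
```
</pre>
        </div>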

        <div><span class="901251417-14122011"></span>&nbsp;</div>

        <div><span class="901251417-14122011"><font color="#0000ff"
              face="Arial" size="2">Having just written this, I realised
              it doesn't really work for us at the MO, as our 'download
              server' (where we run wget) cannot see the disk where we
              keep our local replicas of the CMIP5 and TAMIP archives,
              because of security constraints. Thankfully, the machines
              that do the list/search before we fetch can see the local
              replica, so we can filter the returned list. It *may* work
              for others, though?</font></span><br>

          <br>

          If a download manager or any other "better" application
          communicates with the node, it should use the XML semantics. If
          a simple "one-way" script is required, the wget script is a good
          way of getting the list: dump it (the overhead is really minimal)
          and process it as usual.<br>

          <br>

          I think these two options are just fine, but correct me if you

          don't share my view.<span class="901251417-14122011"><font

              color="#0000ff" face="Arial" size="2">&nbsp;</font></span></div>

        <div><span class="901251417-14122011"></span>&nbsp;</div>

        <div><span class="901251417-14122011"><font color="#0000ff"
              face="Arial" size="2">I think your options work, though I
              can't see how to get the checksum from either the XML or
              the wget script's -w option (and I'm not sure the -w option
              ever would?). *BUT* that may just be because the sample
              datasets I'm using don't make the checksums available?</font></span></div>

        <div><span class="901251417-14122011"></span>&nbsp;</div>

        <div><span class="901251417-14122011"><font color="#0000ff"

              face="Arial" size="2">Thanks again for your reply.</font></span><br>

          <br>

          Thanks,<br>

          Estani<br>

          &nbsp;<br>

          On 14.12.2011 17:43, Kettleborough, Jamie wrote: </div>

        <blockquote

cite="mid:E51EDFEBF10BE44BB4BDAF5FC2F024B90FB7F972@EXXMAIL02.desktop.frd.metoffice.com"

          type="cite">


          <blockquote style="BORDER-LEFT: #0000ff 2px solid;

            PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"

            dir="ltr">

            <div>So, the way I picture this is:<br>

              1) get the list of files to be downloaded (from the wget
              script or by any other means)<br>
              2) filter it to remove what is not required<br>

              <span class="315445615-14122011">&nbsp;</span></div>

          </blockquote>

          <span class="315445615-14122011">

            <div dir="ltr" align="left"><font color="#0000ff"
                face="Arial" size="2"><span class="315445615-14122011">This
                  is basically what we do at the MO: we create a list of
                  files to download, then compare it with our local file
                  system, and filter out any we already have. I think the
                  replication system would have to do this too, wouldn't
                  it? For what it's worth, I think *every* user has their
                  own version of the replication problem; just the set of
                  files they are trying to replicate is different, and
                  they might be using a different protocol to fetch the
                  data.</span></font></div>
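            <div><br>
              The filtering step described here can be sketched in a few
              lines of Python. This assumes the wanted files are expressed
              as paths relative to the root of the local copy (for example
              DRS paths) and that the machine running it can see that copy,
              as the MO's list/search machines can; the function name
              files_to_fetch is hypothetical.<br>
<pre>
```python
import os


def files_to_fetch(wanted, local_root):
    """Return the entries of wanted that are not yet under local_root.

    wanted: paths relative to the root of the local copy (e.g. DRS paths),
    in the order they should be fetched; the order is preserved.
    """
    return [rel for rel in wanted
            if not os.path.isfile(os.path.join(local_root, rel))]
```
</pre>
              The resulting list is what would then be handed to the
              download server.</div>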

            <div dir="ltr" align="left"><font color="#0000ff"

                face="Arial" size="2"><span class="315445615-14122011"></span></font>&nbsp;</div>

            <div dir="ltr" align="left"><font color="#0000ff"

                face="Arial" size="2"><span class="315445615-14122011">If
                  you accept this way of working as
                  valid/acceptable/encouraged, does it have
                  implications for the (scriptable) interfaces to
                  P2P and/or gateway 2? I think it means there
                  'should' be an interface that returns a list of files
                  (not wrapped in a wget script) and then maybe a
                  service (either client side or server side) that will
                  take a list of URLs and generate the wget scripts. If
                  you only have an interface that returns wget scripts,
                  then users will have to parse these to be able to
                  filter out the files they already have copies of.</span></font></div>
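            <div><br>
              A sketch of the second half of that proposal, a client-side
              step that turns a plain list of URLs back into a wget script,
              could look like this in Python. It is hypothetical and far
              simpler than the ESGF wget script: it handles no
              authentication and no checksum checks, and relies only on
              wget's -c option to resume partially downloaded files.
              shlex.quote keeps URLs with unusual characters safe in the
              generated bash.<br>
<pre>
```python
import shlex


def make_wget_script(urls):
    """Return the text of a minimal bash script fetching each URL with wget.

    Hypothetical generator: no authentication or checksum handling.
    """
    lines = ["#!/bin/bash", "set -e"]  # stop at the first failed download
    for url in urls:
        # -c continues a partially downloaded file instead of starting over.
        lines.append("wget -c " + shlex.quote(url))
    return "\n".join(lines) + "\n"
```
</pre>
              Writing the returned text to a file and running it with bash
              fetches each file into the current directory under its
              remote name.</div>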

            <div>&nbsp;</div>

            <div><span class="315445615-14122011"><font color="#0000ff"

                  face="Arial" size="2">Jamie</font></span></div>

            <div><span class="315445615-14122011"></span>&nbsp;</div>

            <div><span class="315445615-14122011"><font color="#0000ff"
                  face="Arial" size="2">(Sebastien, I'm aware this
                  touches on a set of unanswered questions you asked
                  a while ago about what we do at the MO... I've
                  not forgotten; I want to answer them in more detail.
                  Apologies for being so rubbish at answering so far.)</font></span></div>

          </span></blockquote>

        <br>

        <br>

        <pre class="moz-signature" cols="72">-- 

Estanislao Gonzalez


Max-Planck-Institut f&uuml;r Meteorologie (MPI-M)

Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre

Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany


Phone:   +49 (40) 46 00 94-126

E-Mail:  <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</a> </pre>

      </blockquote>

    </blockquote>


  </body>

</html>