<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.19154"></HEAD>
<BODY bgColor=#ffffff text=#000000>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>Hello Estani,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>thanks for this information - very useful. One question,
then some follow-up inline.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>How does the wget generator (or search) deal with replicas?
What determines which replica the user will download, or which one gets
returned by the search? </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=901251417-14122011><FONT color=#0000ff
size=2 face=Arial>Jamie</FONT></SPAN></DIV><BR>
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"
dir=ltr>
<DIV></DIV>
<DIV>Well, the P2P has /search, which returns an XML response the user can
parse, and the wget script has the -w flag (the current one has it too) that
outputs this list.<BR>The way the wget script is designed, it's pretty simple
to extract this list anyway, as the whole wget script is little more than a
"bash" decoration (holding all the intelligence) around the plain list.
That's, e.g., what the DML uses to ingest the files to be
retrieved.<BR><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial> </FONT></SPAN></DIV>
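Since the wget script is just "bash" decoration around a plain list, the list can be pulled out without running the script at all. A minimal sketch; the marker lines below are hypothetical stand-ins for whatever actually delimits the embedded file list in a generated script, so the patterns would need adapting:

```shell
#!/bin/sh
# Sketch only: "#--filelist-start--" / "#--filelist-end--" are assumed
# marker lines, not the real delimiters used by the ESGF generator.
extract_urls() {
    # print the lines between the two markers, markers themselves excluded
    sed -n '/^#--filelist-start--/,/^#--filelist-end--/p' "$1" | sed '1d;$d'
}
```

Run against the downloaded script this would dump one list entry per line, which is essentially what the -w flag does for you.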
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>this
is good to know.</FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011> </SPAN><BR>The replication system is
much more complicated, because you handle many more files at the same time, so
a simple script won't be able to manage the 200,000 URLs needed to replicate a
couple of experiments (3-4 URLs per file in our case). Furthermore, there are
many other requirements that end users don't have, including publication. But
at the very bottom there are indeed many similarities.<SPAN
class=901251417-14122011><FONT color=#0000ff size=2
face=Arial> </FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Happy to be corrected on my naive view of the replication problem -
though I still think it is useful to recognise what is common. Picking
up a couple of comments from your previous e-mail: how modular is the
replication system, and how much work would be involved in using
those modules that deal with the common stuff 'at the very bottom' to
write an 'intelligent' user client?</FONT> </SPAN><BR><BR>Regarding a
client-side wget script generator, it works the other way around now: you get
the wget script and, from it, the list of files. The script already checks for
downloaded files, so you don't need to do that yourself and create a new wget
script; it will do it for you.<SPAN class=901251417-14122011><FONT
color=#0000ff size=2 face=Arial> </FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Doesn't this mean the wget script generator has to know the
directory structure the user is using for their local replica of the archive -
and this may differ (as we know, it's not even the same from node to
node)? Your bash is way ahead of mine - so I could be wrong in what
follows - but from what *I* could tell from the sample wget script (generated
from an example on <A
href="http://www.esgf.org/wiki/ESGF_scripting">http://www.esgf.org/wiki/ESGF_scripting</A>)
it simply uses the file name, assuming the file sits in the current
directory.</FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>Have
you considered copying to the DRS directory structure as the default?
This has a nice side effect of helping users know what version they have
downloaded. (Though you'd need to get version, and the other DRS elements
missing from the filename, into your script, I think.) I know my
suggestion would force people to use the DRS locally... but I don't know that
that's a *bad* thing. It would also have to either be run from the root
of their local copy, or have a compulsory argument that is the root
of their local copy.</FONT></SPAN></DIV>
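If the generator did write the missing facets (version included) into the script, building the DRS-style target path under the local root is a one-liner. A sketch; the facet names in the usage line are illustrative values following the CMIP5 DRS ordering, not a definitive spec:

```shell
#!/bin/sh
# Join DRS facet values into a relative path. The caller (i.e. the
# generated script) is assumed to supply the real facet values.
drs_path() {
    # "$*" joins the arguments with the first character of IFS
    (IFS=/; printf '%s\n' "$*")
}

# illustrative usage, all facet values hypothetical:
# drs_path cmip5 output1 MOHC HadGEM2-ES historical day atmos day r1i1p1 v20111128 tas
```

The subshell keeps the IFS change from leaking into the rest of the script.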
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>The
wget script *could* then also get really clever and not do the remote copy of
data that has not changed from one version to the next (based on checksum),
but just put in a hard link or something like that... that's probably quite a
way towards an 'intelligent' user client?</FONT></SPAN></DIV>
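That version-to-version step could look something like the sketch below, under the assumption that the script knows the advertised checksum and both the old and new version paths (the function name, argument layout, and use of md5sum are all illustrative):

```shell
#!/bin/sh
# If the previous version's copy matches the advertised checksum,
# hard-link it into the new version's path instead of downloading again.
link_or_fetch() {
    old="$1" new="$2" url="$3" want_md5="$4"
    mkdir -p "$(dirname "$new")"
    if [ -f "$old" ] && [ "$(md5sum "$old" | cut -d' ' -f1)" = "$want_md5" ]; then
        ln "$old" "$new"           # unchanged: no remote copy needed
    else
        wget -q -O "$new" "$url"   # changed or missing: fetch it
    fi
}
```

A hard link costs no extra disk space, so unchanged files are "copied" between versions for free.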
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Having just written this, I realised it doesn't really work for
us at the MO, as our 'download server' - where we run wget - cannot see
the disk where we keep our local replicas of the CMIP5 and TAMIP archives,
because of security constraints. Thankfully the machines that do the
list/search before we fetch can see the local replica, so we can filter the
list returned. It *may* work for others though?</FONT></SPAN><SPAN
class=901251417-14122011> </SPAN><BR><BR>If a download manager or
any other "better" application communicates with the node, it should use the
XML semantics. If a simple "one-way" script is required, the wget script is a
good way of getting it: dump the list (the overhead is really minimal) and
process as usual...<BR><BR>I think these two options are just fine, but
correct me if you don't share my view.<SPAN class=901251417-14122011><FONT
color=#0000ff size=2 face=Arial> </FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2 face=Arial>I
think your options work - though I can't see how to get the checksum
from either the XML or the wget -w option (and I'm not sure the wget
-w output ever would carry it?). *BUT* that may just be because the sample
datasets I'm using don't make the checksums available?</FONT></SPAN></DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=901251417-14122011><FONT color=#0000ff size=2
face=Arial>Thanks again for your
reply.</FONT></SPAN><BR><BR>Thanks,<BR>Estani<BR> <BR>On 14.12.2011
17:43, Kettleborough, Jamie wrote: </DIV>
<BLOCKQUOTE
cite=mid:E51EDFEBF10BE44BB4BDAF5FC2F024B90FB7F972@EXXMAIL02.desktop.frd.metoffice.com
type="cite">
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"
dir=ltr>
<DIV>So, the way I picture this is:<BR>1) get the list of files to be
downloaded (in the wget script or by any other means)<BR>2) filter that to
remove what is not required<BR><SPAN
class=315445615-14122011> </SPAN></DIV></BLOCKQUOTE><SPAN
class=315445615-14122011>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=315445615-14122011>This is basically what we do at the MO - we create a
list of files to download, then compare it with our local file system, and
we filter out any we already have. I think the replication system
would have to do this too, wouldn't it? For what it's worth, I
think *every* user has their own version of the replication problem - just
the set of files they are trying to replicate is different, and they might be
using a different protocol to fetch the data.</SPAN></FONT></DIV>
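The list-then-filter step described here takes only a few lines of shell. A sketch, assuming a "local_path url" per-line format for the dumped list (the real list format may differ):

```shell
#!/bin/sh
# Read "local_path url" pairs on stdin and print only the entries whose
# file does not already exist under the given local root directory.
filter_needed() {
    root="$1"
    while read -r path url; do
        [ -e "$root/$path" ] || printf '%s %s\n' "$path" "$url"
    done
}
```

The surviving entries can then be fed to wget (or any other fetcher), which is the protocol-independence point made above.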
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=315445615-14122011></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=315445615-14122011>If you accept this way of working as
valid/acceptable/encouraged, then does it have implications for the
(scriptable) interfaces to either the P2P and/or gateway 2? I think
it means there 'should' be an interface that returns a list of files (not
wrapped in a wget script), and then maybe a service (either client-side or
server-side) that will take a list of URLs and generate the wget
scripts. If you only have an interface that returns wget scripts, then
users will have to parse these in order to filter out the files they
already have copies of.</SPAN></FONT></DIV>
<DIV> </DIV>
<DIV><SPAN class=315445615-14122011><FONT color=#0000ff size=2
face=Arial>Jamie</FONT></SPAN></DIV>
<DIV><SPAN class=315445615-14122011></SPAN> </DIV>
<DIV><SPAN class=315445615-14122011><FONT color=#0000ff size=2
face=Arial>(Sebastien - I'm aware this sort of touches on a set
of unanswered questions you asked a while ago, related to what we
do at the MO... I've not forgotten that I want to answer this in
more detail; apologies for being so rubbish at answering so
far). </FONT></SPAN></DIV></SPAN></BLOCKQUOTE><BR><BR><PRE class=moz-signature cols="72">--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: <A class=moz-txt-link-abbreviated href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</A> </PRE></BLOCKQUOTE></BODY></HTML>