[Go-essp-tech] Status of Gateway 2.0 (another use case)

Estanislao Gonzalez gonzalez at dkrz.de
Wed Dec 14 10:01:34 MST 2011


Well, the P2P node has the /search endpoint, which returns an XML 
response the user could parse, and the wget script has the -w flag 
(the current one as well) that outputs this list.
The way the wget script is designed, it's pretty simple to extract this 
list in any case, as the whole wget script is little more than a "bash" 
decoration around it (with all the intelligence). That's e.g. what the 
DML uses for ingesting the files to be retrieved.
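
Just to make that concrete, here is a minimal sketch (in Python) of what 
"extracting the list" amounts to. The only assumption, which is 
illustrative and not guaranteed by the script format, is that each file 
entry carries one quoted http(s) URL on its own line:

    #!/usr/bin/env python
    """Sketch: pull the download URLs out of an ESGF wget script.
    Assumption (illustrative only): each file entry contains one
    quoted http(s) URL on its own line."""
    import re
    import sys

    URL_RE = re.compile(r"['\"](https?://[^'\"\s]+)['\"]")

    def extract_urls(wget_script_path):
        """Return every quoted http(s) URL found in the wget script."""
        urls = []
        with open(wget_script_path) as fh:
            for line in fh:
                match = URL_RE.search(line)
                if match:
                    urls.append(match.group(1))
        return urls

    if __name__ == "__main__":
        for url in extract_urls(sys.argv[1]):
            print(url)

(This is roughly the same list you get by dumping it with -w.)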

The replication system is much more complicated, because you handle many 
more files at the same time, so a simple script won't be able to manage 
200,000 URLs for replicating a couple of experiments (3-4 URLs per file 
in our case). Furthermore, there are many other requirements that end 
users don't have, including publication. But at the very bottom there 
are indeed many similarities.

Basically, the wget script is not meant for moving terabytes of data - 
not because of the protocol, but because managing that much data this 
way is much too complicated.

Regarding a client-side wget script generator: it's the other way around 
from how it works now. You get the wget script and from it the list of 
files. The script already checks for previously downloaded files, so you 
don't need to do that yourself and create a new wget script; it will do 
it for you.
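
To illustrate the kind of check involved (it is the same filtering step 
Jamie describes below), here is a hedged sketch. It assumes, purely for 
illustration, that the local copy of each file is stored under some 
download directory and named after the last path component of its URL:

    #!/usr/bin/env python
    """Sketch: keep only the URLs whose file is not on disk yet.
    Assumption (illustrative only): local files live in DOWNLOAD_DIR
    and are named after the last path component of their URL."""
    import os
    from urllib.parse import urlparse

    DOWNLOAD_DIR = "downloads"   # hypothetical local target directory

    def filter_missing(urls, download_dir=DOWNLOAD_DIR):
        """Drop URLs that already have a local file of the same name."""
        missing = []
        for url in urls:
            filename = os.path.basename(urlparse(url).path)
            if not os.path.exists(os.path.join(download_dir, filename)):
                missing.append(url)
        return missing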

If a download manager or any other "better" application communicates 
with the node, it should use the XML semantics. If a simple "one-way" 
script is required, the wget script is a good way of getting one: dump 
the list (the overhead is really minimal) and process it as usual...
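
For the "XML semantics" route, a rough sketch of what a client could do 
follows. The endpoint path, query parameters and response layout are 
assumptions for illustration only; all the above says is that there is a 
/search that returns XML the user can parse:

    #!/usr/bin/env python
    """Sketch: ask the node's search service for file URLs directly.
    The path /esg-search/search, the type=File constraint and the
    'url|mime|service' field layout are illustrative assumptions."""
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    NODE = "http://datanode.example.invalid"   # hypothetical node host

    def search_file_urls(**constraints):
        """Query the search endpoint, yield anything that looks like a URL."""
        query = urllib.parse.urlencode(dict(constraints, type="File"))
        url = NODE + "/esg-search/search?" + query
        with urllib.request.urlopen(url) as resp:
            tree = ET.parse(resp)
        for elem in tree.iter():
            text = (elem.text or "").strip()
            if text.startswith("http"):
                yield text.split("|")[0]   # keep only the URL part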

I think these two options are just fine, but correct me if you don't 
share my view.

Thanks,
Estani

On 14.12.2011 17:43, Kettleborough, Jamie wrote:
>
>     So, the way I picture this is:
>     1) get the list of files to be downloaded (in the wget script or
>     by any other means)
>     2) filter that to remove what is not required
>
> This is basically what we do at the MO - we create a list of files to 
> download, then compare it with our local file system, and we filter 
> out any we already have.  I think the replication system would have to 
> do this too, wouldn't it?  For what it's worth, I think *every* user 
> has their own version of the replication problem - just the set of 
> files they are trying to replicate is different, and they might be 
> using a different protocol to fetch the data.
> If you accept this way of working as valid/acceptable/encouraged, then 
> does it have implications for the (scriptable) interfaces to either the 
> P2P node and/or Gateway 2?  I think it means there 'should' be an 
> interface that returns a list of files (not wrapped in a wget script) 
> and then maybe a service (either client side or server side) that will 
> take a list of URLs and generate the wget scripts.  If you only have 
> an interface that returns wget scripts, then users will have to parse 
> these to enable them to filter out the files they already have copies of.
> Jamie
> (Sebastien - I'm aware this sort of touches on a set of unanswered 
> questions you asked a while ago related to what we do at the MO... 
> I've not forgotten that I want to answer this in more detail; 
> apologies for being so rubbish at answering so far.)


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
