[Go-essp-tech] Progress and Problems with P2P

Jennifer Adams jma at cola.iges.org
Wed Feb 1 09:08:22 MST 2012


Hi, Everyone -- 
I have been working with Luca, Gavin, and Estani on testing the P2P system. I'm happy to report that many significant parts of my workflow for downloading data can now be fully automated:
1. Search for available datasets that meet my requirements (e.g. decadal1980/atmos/mon/Amon, all models, all members, selected variables)
2. Compare search results to a list of what I've already got
3. Build a script that downloads and runs the wget scripts for the datasets I still need
All this using shell scripts and without touching a browser! 
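
Step 2 above amounts to a set difference. Here is a minimal sketch in Python, assuming both lists are plain text with one dataset ID per line. The function name and the IDs in the usage lines are hypothetical and are not taken from the P2P scripts themselves:

```python
def datasets_still_needed(search_results, already_have):
    """Return IDs found in search_results but not in already_have.

    Both arguments are iterables of strings (for example, open text files
    with one dataset ID per line). Blank lines and duplicates are skipped,
    and the order of the search results is preserved.
    """
    have = {line.strip() for line in already_have if line.strip()}
    needed = []
    seen = set()
    for line in search_results:
        dataset_id = line.strip()
        if dataset_id and dataset_id not in have and dataset_id not in seen:
            needed.append(dataset_id)
            seen.add(dataset_id)
    return needed


# Hypothetical IDs, for illustration only.
found = ["decadal1980.modelA.r1", "decadal1980.modelB.r1", "decadal1980.modelC.r1"]
local = ["decadal1980.modelB.r1"]
print(datasets_still_needed(found, local))
# prints ['decadal1980.modelA.r1', 'decadal1980.modelC.r1']
```

The resulting list is what step 3 would loop over to fetch and run the corresponding wget scripts.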

There are still a few wrinkles, however. The worst of these is that not all data nodes are authenticating certificates properly. 

These data nodes are working:
bmbf-ipcc-ar5.dkrz.de
cmip-dn.badc.rl.ac.uk
dias-esg-nd.tkl.iis.u-tokyo.ac.jp
esgdata.gfdl.noaa.gov
esg.cnrm-game-meteo.fr
norstore-trd-bio1.hpc.ntnu.no
pcmdi11.llnl.gov
vesg.ipsl.fr

These data nodes are not:
dap.cccma.uvic.ca
esg.nccs.nasa.gov
bcccsm.cma.gov.cn
tds.ucar.edu
pcmdi9.llnl.gov
esg-datanode.jpl.nasa.gov

P2P wget scripts that download data from the second set of data nodes always fail completely (for me, a "pure client" user). I know that Luca, Gavin, and Estani are working to fix these problems, but here's some encouragement to the data node administrators to help them resolve this issue. Wgets that rely on the quickly expiring authorization tokens ought to be deprecated as soon as possible -- they are the biggest, most irritating source of errors in the whole system (my opinion only).

The second problem is that the wget scripts often fail to get all the files they are configured to grab. Of the 64 scripts that I ran this morning, 55 were incomplete (that's 86%). The errors in the log files are all "ERROR 403: Forbidden." I was running all 64 scripts at the same time, but each one was grabbing its list of files sequentially -- I am not parallelizing the wgets in the way that some other users have described. When I rerun the wget scripts from the incomplete runs, some of them finish the job on the 2nd try; others take as many as 8 tries before all the files are in.
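
The rerun-until-complete routine described above could be automated along these lines. This is only a sketch under stated assumptions: `run_script` is a hypothetical callable that runs one wget script and returns True when every file arrived, and the default of 8 tries simply mirrors the worst case I saw, not any documented limit.

```python
import time

FORBIDDEN_MARKER = "ERROR 403: Forbidden"  # the error text seen in the wget logs


def log_has_forbidden(log_text):
    """Return True if the log text contains the 403 error message."""
    return FORBIDDEN_MARKER in log_text


def run_until_complete(run_script, max_tries=8, wait_seconds=0.0):
    """Call run_script() until it reports success or max_tries is used up.

    run_script is a hypothetical callable: it should run one wget script
    and return True only if the download was complete. Returns the number
    of tries used, or None if the download never completed.
    """
    for attempt in range(1, max_tries + 1):
        if run_script():
            return attempt
        if attempt < max_tries:
            time.sleep(wait_seconds)  # optional pause before the next try
    return None
```

A real `run_script` might invoke the wget script with `subprocess.run`, read the log it wrote, and return `not log_has_forbidden(log_text)`. Setting a nonzero `wait_seconds` would space out the reruns, which may help if throttling is what causes the errors.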

These are the data nodes that are not allowing me to complete the downloads (for the decadal1980 example mentioned above):
esg.cnrm-game-meteo.fr
vesg.ipsl.fr
dias-esg-nd.tkl.iis.u-tokyo.ac.jp
cmip-dn.badc.rl.ac.uk
bmbf-ipcc-ar5.dkrz.de

I suspect there are some throttling settings in place to limit the number of wgets that a data node will allow at any particular time. I think my use of the data nodes is reasonable, and these throttles are too restrictive. The dreaded "Forbidden" errors may be related to another problem, but they are a nuisance whatever the cause. YAO (Yet Another Obstacle). Please put this on the high-priority list of things to fix, right behind the certificate authentication issue.

Respectfully submitted,
Jennifer





