[Go-essp-tech] Progress and Problems with P2P

Cinquini, Luca (3880) Luca.Cinquini at jpl.nasa.gov
Thu Feb 2 06:59:55 MST 2012


Hi Stephane,
	are you saying you cannot display the XML in the browser, or that the XML is hard to read ?

If the first, both Firefox and the latest version of Safari should be able to display the XML... other browsers might too. You can always try "View Source" to show the XML.

If instead you are saying the XML is not human-friendly, well you are definitely right - but the whole point of the tutorial was to show how the whole download procedure can be scripted. In a real-world scenario, you would write a small program to parse the XML, or you would scan the XML by hand and use some of the values at the end to construct the queries that generate the wget scripts. This use case is not really meant for first-time users of the system, but for power users who want to get to the data as efficiently and scalably as possible.
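
As a rough illustration of the "small program" idea, here is a minimal Python sketch. It is not the tutorial's code: the sample XML layout (a list of <doc> elements holding <str name="..."> fields), the field names, the host esgf-node.example.org and the /esg-search/wget path are all assumptions made for the example, so adjust them to whatever your search request actually returns.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical sample response; the real search output may be laid out differently.
SAMPLE_XML = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">example.dataset.one</str>
      <str name="model">MODEL-A</str>
    </doc>
    <doc>
      <str name="id">example.dataset.two</str>
      <str name="model">MODEL-B</str>
    </doc>
  </result>
</response>
"""


def extract_field(xml_text, field):
    """Return the text of every <str name=field> element found inside a <doc>."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(f".//doc/str[@name='{field}']")]


def build_wget_url(base_url, **constraints):
    """Append the search constraints to base_url as a sorted query string."""
    return base_url + "?" + urlencode(sorted(constraints.items()))


if __name__ == "__main__":
    # Assumed endpoint, for illustration only.
    base = "http://esgf-node.example.org/esg-search/wget"
    for model in extract_field(SAMPLE_XML, "model"):
        print(build_wget_url(base, model=model, variable="tas"))
```

Each printed URL could then be handed to wget or curl to fetch the corresponding download script, which keeps the whole procedure out of the browser.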

Please let me know if this answers your question,

thanks, Luca


On Feb 2, 2012, at 6:09 AM, Stéphane Senesi wrote:

> Luca
> 
> Cinquini, Luca (3880) wrote, On 02/02/2012 00:26:
>> Hi Jennifer, Sebastian,
>> 	first of all thanks to Jennifer for trying out the p2p system, and reporting positive feedback on the search and wget generation.
>> 
>> Jennifer, your use case is so common to climate scientists that I wrote a tutorial that others can follow to gain from your experience:
>> 
>> http://www.esgf.org/wiki/ESGF_Data_Download_Strategies
>> 
> 
> This page is very interesting. However, as a standard user, I am unable 
> to get with my browser (Firefox) a proper display of the answers to the 
> first requests, which are in form of xml documents and are displayed as 
> such, without any user-friendly formatting, and with this sole warning : 
> "This XML file does not appear to have any style information associated 
> with it. The document tree is shown below."  I was not able to get any 
> other format than the default one.
> 
> I may have missed a step ?
> 
> (Actually, my remark applies first to page 
> http://esgf.org/wiki/ESGF_Search_API )
> 
> Regards
> 
> S
> 
>> Indeed we need to work with all data nodes in the coming days (weeks?) to make the wget access completely pain-free. This would involve:
>> 
>> o Move any lingering data nodes away from the token-based system and to the PKI-based system
>> 
>> o Upgrade all data nodes to the latest software, so that they can trust all ESGF certificates, and perform authorization against multiple servers
>> 
>> o Investigate any throttling issues.
>> 
>> I believe we need to tackle one data node at a time....
>> 
>> thanks again,
>> Luca
>> 
>> 
>> On Feb 1, 2012, at 2:20 PM, Sébastien Denvil wrote:
>> 
>> 
>>> Hi Jennifer, all
>>> 
>>> see below:
>>> 
>>> Le 01/02/2012 17:08, Jennifer Adams a écrit :
>>> 
>>>> Hi, Everyone --
>>>> I have been working with Luca and Gavin and Estani on testing the P2P system. I'm happy to report that many significant parts of my workflow for downloading data can now be fully automated:
>>>> 1. Search for available data sets that meet my desired requirements (e.g. decadal1980/atmos/mon/Amon, all models, all members, selected variables)
>>>> 2. Compare search results to a list of what I've already got
>>>> 3. Build script to download and run wget scripts for datasets I still need
>>>> All this using shell scripts and without touching a browser!
>>>> 
>>>> There are still a few wrinkles, however. The worst of these is that not all data nodes are authenticating certificates properly.
>>>> 
>>>> These data nodes are working:
>>>> bmbf-ipcc-ar5.dkrz.de
>>>> cmip-dn.badc.rl.ac.uk
>>>> dias-esg-nd.tkl.iis.u-tokyo.ac.jp
>>>> esgdata.gfdl.noaa.gov
>>>> esg.cnrm-game-meteo.fr
>>>> norstore-trd-bio1.hpc.ntnu.no
>>>> pcmdi11.llnl.gov
>>>> vesg.ipsl.fr
>>>> 
>>> To help you diagnose the source of your PKI issues :
>>> 
>>> 
>>>> These data nodes are not:
>>>> dap.cccma.uvic.ca
>>>> 
>>> It works for us as I write. PCMDI openid if it matters.
>>> 
>>> 
>>>> esg.nccs.nasa.gov
>>>> 
>>> We have never been able to download from there using PKI.
>>> 
>>> 
>>>> bcccsm.cma.gov.cn
>>>> 
>>> We have been able to download from there using PKI recently.
>>> 
>>> 
>>>> tds.ucar.edu
>>>> 
>>> We have been able to download from there using PKI recently.
>>> 
>>> 
>>>> pcmdi9.llnl.gov
>>>> 
>>> We have used pcmdi3 up to now (it worked), will give pcmdi9 a try.
>>> 
>>> 
>>>> esg-datanode.jpl.nasa.gov
>>>> 
>>> Have not tried yet.
>>> 
>>> 
>>>> P2P wget scripts to download data from the second set of data nodes always fail completely (for me, a "pure client" user). I know that Luca and Gavin and Estani are working to fix these problems, but here's some encouragement to the data node administrators to help them resolve this issue. Wgets that rely on the quickly-expiring authorization tokens ought to be deprecated as soon as possible -- they are the biggest, most irritating source of errors in the whole system (my opinion only).
>>>> 
>>>> The second problem is that the wget scripts don't often succeed in getting all the files they are configured to grab. Of the 64 scripts that I ran this morning, 55 were incomplete (that's 86%). The errors in log files are all "ERROR 403: Forbidden." I was running the 64 scripts at one time, but each one was grabbing its list of files in order -- I am not parallelizing the wgets in the way that some other users have described. When I rerun the wget scripts from the incomplete runs, some of them finish the job on the 2nd try, others will take as many as 8 tries before all the files are in.
>>>> 
>>>> These are the data nodes that are not allowing me to complete the download (for the decadal1980 example mentioned above).
>>>> esg.cnrm-game-meteo.fr
>>>> vesg.ipsl.fr
>>>> dias-esg-nd.tkl.iis.u-tokyo.ac.jp
>>>> cmip-dn.badc.rl.ac.uk
>>>> bmbf-ipcc-ar5.dkrz.de
>>>> 
>>> I notice that those nodes seem to be far away from your home location.
>>> May I suggest that you pass these options to wget; in our experience
>>> with synchro-data they substantially improve the success rate:
>>> wget --timeout=20 --tries=10
>>> 
>>> A lot has been said about this issue. It is difficult to attribute
>>> responsibility for those failures to any particular component or party.
>>> Give the parameters above a try.
>>> 
>>> If you send me the IP address you used we could investigate the
>>> vesg.ipsl.fr case.
>>> 
>>> 
>>>> I suspect there are some throttling settings in place to limit the number of wgets that a data node will allow at any particular time. I think my use of the data nodes is reasonable, and these throttles are set too high.
>>>> 
>>> It's quite hard to set this up properly using the free version of Tomcat.
>>> Some commercial Tomcat versions offer easier ways to achieve that. Up to
>>> now (on our node: vesg.ipsl.fr) we haven't done anything specific
>>> regarding that.
>>> 
>>> regards.
>>> Sébastien
>>> 
>>>> The dreaded "Forbidden" errors may be related to another problem, but are still a nuisance no matter what the reason. YAO (Yet Another Obstacle). Please put this on the high priority list of things to fix, right behind the certificate authentication issue.
>>>> 
>>>> Respectfully submitted,
>>>> Jennifer
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>> 
>>> 
>>> -- 
>>> Sébastien Denvil
>>> IPSL, Pôle de modélisation du climat
>>> UPMC, Case 101, 4 place Jussieu,
>>> 75252 Paris Cedex 5
>>> 
>>> Tour 45-55 2ème étage Bureau 209
>>> Tel: 33 1 44 27 21 10
>>> Fax: 33 1 44 27 39 02
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> Stéphane Sénési
> Ingénieur - équipe Assemblage du Système Terre
> Centre National de Recherches Météorologiques
> Groupe de Météorologie à Grande Echelle et Climat
> 
> CNRM/GMGEC/ASTER
> 42 Av Coriolis
> F-31057 Toulouse Cedex 1
> 
> +33.5.61.07.99.31 (Fax :....9610)
> 
