[Go-essp-tech] Data node authorization

Estanislao Gonzalez gonzalez at dkrz.de
Fri Jul 1 04:56:12 MDT 2011


Hi Martin,

I'm not sure how it is in your case, but I think 60 threads are too many. 
Are they truly bringing any performance increase at all? If the gain is 
minimal, you might be spending CPU that could otherwise go to the 
md5sums, which are both IO and CPU intensive.

What I do is separate the download from the checksums so they run 
independently. I haven't tried CCCMA, and for IPSL I'm limited to a couple 
of sockets: when I tried to open more than 3 or 4, they were closed 
immediately. So far I've had good performance with ~9 download threads 
and 1 or 1.5 threads per CPU core for checksumming.
But I guess that's very architecture dependent.
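
A minimal sketch of that split, assuming Python rather than the wget scripts actually in use. The names download_then_checksum and md5_of_file, and the caller-supplied fetch function, are hypothetical; the pool sizes just mirror the numbers above. The point is that the number of simultaneous md5 jobs is capped by its own pool, however many downloads finish at the same moment.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_then_checksum(urls, fetch, download_workers=9, checksum_workers=None):
    """Run downloads and checksums in two independently sized pools.

    fetch(url) is a caller-supplied (hypothetical) function that downloads
    one URL and returns the path of the local file.
    """
    if checksum_workers is None:
        checksum_workers = os.cpu_count() or 1  # roughly 1 thread per core
    results = {}
    with ThreadPoolExecutor(download_workers) as downloads, \
         ThreadPoolExecutor(checksum_workers) as checksums:
        pending = {downloads.submit(fetch, url): url for url in urls}
        hashing = {}
        for done in as_completed(pending):
            # Each finished download is queued for hashing; at most
            # checksum_workers md5 jobs run at any one time.
            future = checksums.submit(md5_of_file, done.result())
            hashing[future] = pending[done]
        for done in as_completed(hashing):
            results[hashing[done]] = done.result()
    return results
```

The returned dictionary maps each URL to its MD5, which can then be compared with the checksum published by the data node where one is available.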

I'd kindly ask everyone to store at least some benchmarks about the 
connections, so that we can later draw a map of the measured speeds 
between data nodes and gateways and hopefully improve the total bandwidth.
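
As an illustration only, something like the following would be enough to keep comparable numbers. The CSV layout, the field order and the names rate_mb_per_s and record_transfer are my assumptions, not an agreed format.

```python
import csv
import time


def rate_mb_per_s(n_bytes, seconds):
    """Average transfer rate in MB/s (1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_bytes / 1e6 / seconds


def record_transfer(log_path, source, destination, n_bytes, seconds, threads):
    """Append one measured transfer to a CSV log and return the row written."""
    row = [
        time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
        source,        # data node the files came from
        destination,   # site that downloaded them
        n_bytes,
        round(seconds, 3),
        threads,       # number of parallel download threads used
        round(rate_mb_per_s(n_bytes, seconds), 3),
    ]
    with open(log_path, "a", newline="") as handle:
        csv.writer(handle).writerow(row)
    return row
```

One row per transfer batch, with the source node, the destination and the thread count, should be enough to compare routes later.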

Thanks,
Estani

On 01.07.2011 11:13, martin.juckes at stfc.ac.uk wrote:
> Hi Sebastien,
>
> How fast are your fast rates from CCCMA? I'm getting a few tens of GB/hour using up to 60 wget threads -- perhaps I should be using more (I got cautious about the numbers because at one point I had 40 threads finish in a short time and the machine was then frozen by 40 md5sum threads trying to run in parallel).
>
> Are you planning to do transfers from BCC using the tokenised scripts? I would prefer to use certificates, but they appear to support this at present,
>
> Cheers,
> Martin
>
>>> -----Original Message-----
>>> From: Sébastien Denvil [mailto:sebastien.denvil at ipsl.jussieu.fr]
>>> Sent: 01 July 2011 09:33
>>> To: Juckes, Martin (STFC,RAL,RALSP)
>>> Cc: Luca.Cinquini at jpl.nasa.gov; jamie.kettleborough at metoffice.gov.uk;
>>> go-essp-tech at ucar.edu
>>> Subject: Re: [Go-essp-tech] Data node authorization
>>>
>>>   Hi all,
>>>
>>> we are in the process of downloading a subset of the already published
>>> data to sustain analysis activity (using multiple wget threads).
>>>
>>> We did that already for BADC, CNRM and CCCMA. Unlike Martin, we had
>>> fast transfer rates from CCCMA (network mysteries).
>>>
>>> I will let you know any interesting findings.
>>>
>>> The list we plan to download a subset from:
>>>
>>> bcc-csm1-1
>>> CanCM4
>>> CanESM2
>>> CNRM-CM5
>>> GISS-E2-H
>>> GISS-E2-R
>>> HadGEM2-A
>>> HadGEM2-ES
>>> inmcm4
>>> NorESM1-M
>>>
>>> Regards.
>>> Sébastien
>>>
>>> On 01/07/2011 09:55, martin.juckes at stfc.ac.uk wrote:
>>>> Hi Jamie,
>>>>
>>>> As Luca says, the plan is to move to the myproxy system. Like the
>>> wheels of justice, the wheels of ESGF grind exceedingly slow, but you
>>> can't take the analogy much further. Like you, I've started to look at
>>> data from other nodes. I've found that the myproxy system works for
>>> BADC, IPSL, CNRM and CCCMA nodes. For the last two, you will get a
>>> tokenised wget script if you go through the gateway (because they are
>>> published through the PCMDI gateway) but if you build your own wget
>>> scripts, you can use myproxy certificates. If you want data from
>>> CCCMA, I find that the transatlantic transfer is very slow and you may
>>> want to copy what I already have at BADC -- let me know if you do as
>>> this copy won't be available through the BADC gateway until QC L2 has
>>> been completed, which may be some way off.
>>>> Another issue is the publication of checksums, because there is a
>>> significant risk of data corruption when moving large volumes. BADC
>>> and IPSL have the checksums in the THREDDS catalogues, CNRM is in the
>>> process of re-publishing to achieve this. I'm going to get in touch
>>> with CCCMA today to encourage them to do the same,
>>>> Regards,
>>>> Martin
>>>>
>>>>>> -----Original Message-----
>>>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-
>>>>>> bounces at ucar.edu] On Behalf Of Cinquini, Luca (3880)
>>>>>> Sent: 30 June 2011 20:10
>>>>>> To: Kettleborough, Jamie
>>>>>> Cc: go-essp-tech at ucar.edu
>>>>>> Subject: Re: [Go-essp-tech] Data node authorization
>>>>>>
>>>>>> Hi Jamie,
>>>>>> 	the established plan is to move all sites to the MyProxy (SAML-
>>>>>> based) authentication and authorization system, and to gradually
>>>>>> phase out the token
>>>>>> based system. I know JPL and BADC have already moved, and that
>>> PCMDI
>>>>>> is still on the token based system. As for the timeline at each
>>> site,
>>>>>> the corresponding
>>>>>> administrators will have to chime in.
>>>>>> thanks, Luca
>>>>>>
>>>>>> On Jun 30, 2011, at 7:22 AM, Kettleborough, Jamie wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Earlier this week I was trying to get data from different data
>>>>>> nodes.
>>>>>>> There seemed to be two authorization methods in place - one based
>>> on
>>>>>>> MyProxy, the other based on a token in the HTTP query string.
>>>>>>>
>>>>>>> Is this the long term plan?
>>>>>>> If not then how soon will just one method be supported across all
>>>>>> nodes?
>>>>>>> If it is then I guess there will be follow up questions about how
>>> to
>>>>>>> handle both...
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jamie
>>>>>>> _______________________________________________
>>>>>>> GO-ESSP-TECH mailing list
>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>
>>> --
>>> Sébastien Denvil
>>> IPSL, Pôle de modélisation du climat
>>> UPMC, Case 101, 4 place Jussieu,
>>> 75252 Paris Cedex 5
>>>
>>> Tour 45-55 2ème étage Bureau 209
>>> Tel: 33 1 44 27 21 10
>>> Fax: 33 1 44 27 39 02
>>>


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de


