[Go-essp-tech] Wget data transfer rates

Estanislao Gonzalez gonzalez at dkrz.de
Tue Jul 5 02:53:05 MDT 2011


Hi,

If needed, you could also try our replica of the IPSL aqua4k data, 
published at the albedo2.dkrz.de/esgcet gateway (search for experiment 
= aqua4k). The files are served via HTTP and GridFTP, under a local 
group that you will have to apply for.

I haven't checked the md5 checksums, as the current gateways aren't 
serving them via the API. Version 1.3 will, so I'm waiting for that 
(i.e. use this data as a test only).
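
Once checksums are available, the check itself could look roughly like 
the sketch below. It assumes you already hold a mapping from file path 
to expected md5 (how to harvest that mapping is the open question); the 
function names and the worker limit are made up for illustration and are 
not part of any ESG tool. The pool is capped so that a burst of finished 
downloads cannot start dozens of checksum jobs at once.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, max_workers=4):
    """Return the sorted paths whose md5 differs from the expected value.

    `expected` maps file path -> expected hex md5 (hypothetical input;
    where these values come from is left open here).
    """
    paths = list(expected)
    # At most `max_workers` checksums run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        actual = list(pool.map(md5_of_file, paths))
    return sorted(p for p, a in zip(paths, actual) if a != expected[p].lower())
```

An empty list from find_mismatches() means every file matched; anything 
else is a candidate for re-download.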

I got them at ~300 Mbps from IPSL via 2 HTTP connections (during the 
day the bandwidth was much lower, though).
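
To compare this with the MB/s and GB/hour figures quoted further down, 
the conversion is plain arithmetic. A small sketch, assuming decimal 
units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and reading my figure as 
megabits per second; the helper names are illustrative only:

```python
def mbit_per_s_to_mbyte_per_s(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


def mbyte_per_s_to_gbyte_per_hour(mbyte_per_s):
    """Convert megabytes per second to gigabytes per hour (decimal units)."""
    return mbyte_per_s * 3600.0 / 1000.0


rate = mbit_per_s_to_mbyte_per_s(300)
print(rate, mbyte_per_s_to_gbyte_per_hour(rate))  # prints "37.5 135.0"
```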

I assume you are all moving data for replication, right? And just 
waiting for the QC L2 assignment before publishing?
How do you harvest the checksums? And how will you proceed if some files 
(or the whole version) are removed before the QC L2 assignment takes place?

Thanks,
Estani


On 05.07.2011 07:32, martin.juckes at stfc.ac.uk wrote:
> Hi Sébastien,
>
> thanks for those numbers, it looks as though we do have a network issue at BADC.
>
> In terms of per-thread transfer rates, we get around 1 MB/s from IPSL and CNRM -- which is close to what you get from us. For transfers from CCCMA, however, we only get around 0.1 MB/s, whereas you actually get faster rates from Canada than from the UK.
>
> I also have no clue how to deal with data nodes that don't support PKI -- I'll contact Bob about that and see if we can get any progress,
>
> cheers,
> Martin
> ________________________________________
> From: Sébastien Denvil [sebastien.denvil at ipsl.jussieu.fr]
> Sent: 04 July 2011 14:56
> To: Juckes, Martin (STFC,RAL,RALSP)
> Cc: Luca.Cinquini at jpl.nasa.gov; go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Data node authorization
>
>    Hi Martin
>
>
> On 01/07/2011 11:13, martin.juckes at stfc.ac.uk wrote:
>> Hi Sebastien,
>>
>> How fast are your fast rates from CCCMA? I'm getting a few tens of GB/hour using up to 60 wget threads -- perhaps I should be using more (I got cautious about the numbers because at one point I had 40 threads finish in a short time and the machine was then frozen by 40 md5sum threads trying to run in parallel).
> Indeed it seems that we don't have the same standards to characterize a
> fast rate :-)
> We used 8 threads over a 1 Gbps network. CCCMA: 32 MB/s; HADGEM: 7
> MB/s; CNRM: 19 MB/s. The subsets were small at this stage (200 GB per
> model), so we need a larger case to derive robust numbers.
>
>
>> Are you planning to do transfers from BCC using the tokenised scripts? I would prefer to use certificates, but they don't appear to support this at present,
> We will use the PKI approach for every download, because I have no clue
> how I could generate a token programmatically.
>
> I will post more numbers when the larger case is done.
>
> Cheers.
> Sébastien
>
>> Cheers,
>> Martin
>>
>>>> -----Original Message-----
>>>> From: Sébastien Denvil [mailto:sebastien.denvil at ipsl.jussieu.fr]
>>>> Sent: 01 July 2011 09:33
>>>> To: Juckes, Martin (STFC,RAL,RALSP)
>>>> Cc: Luca.Cinquini at jpl.nasa.gov; jamie.kettleborough at metoffice.gov.uk;
>>>> go-essp-tech at ucar.edu
>>>> Subject: Re: [Go-essp-tech] Data node authorization
>>>>
>>>>    Hi all,
>>>>
>>>> we are in the process of downloading a subset of the already published
>>>> data to sustain analysis activity (using multiple wget threads).
>>>>
>>>> We did that already for BADC, CNRM and CCCMA. Unlike Martin, we had
>>>> fast transfer rates from CCCMA (network mysteries).
>>>>
>>>> I will let you know any interesting findings.
>>>>
>>>> The list we plan to download a subset from:
>>>>
>>>> bcc-csm1-1
>>>> CanCM4
>>>> CanESM2
>>>> CNRM-CM5
>>>> GISS-E2-H
>>>> GISS-E2-R
>>>> HadGEM2-A
>>>> HadGEM2-ES
>>>> inmcm4
>>>> bcc-csm1-1
>>>> NorESM1-M
>>>>
>>>> Regards.
>>>> Sébastien
>>>>
>>>> On 01/07/2011 09:55, martin.juckes at stfc.ac.uk wrote:
>>>>> Hi Jamie,
>>>>>
>>>>> As Luca says, the plan is to move to the myproxy system. Like the
>>>> wheels of justice, the wheels of ESGF grind exceedingly slow, but you
>>>> can't take the analogy much further. Like you, I've started to look at
>>>> data from other nodes. I've found that the myproxy system works for
>>>> BADC, IPSL, CNRM and CCCMA nodes. For the last two, you will get a
>>>> tokenised wget script if you go through the gateway (because they are
>>>> published through the PCMDI gateway) but if you build your own wget
>>>> scripts, you can use myproxy certificates. If you want data from
>>>> CCCMA, I find that the transatlantic transfer is very slow and you may
>>>> want to copy what I already have at BADC -- let me know if you do as
>>>> this copy won't be available through the BADC gateway until QC L2 has
>>>> been completed, which may be some way off.
>>>>> Another issue is the publication of checksums, because there is a
>>>> significant risk of data corruption when moving large volumes. BADC
>>>> and IPSL have the checksums in the THREDDS catalogues, CNRM is in the
>>>> process of re-publishing to achieve this. I'm going to get in touch
>>>> with CCCMA today to encourage them to do the same,
>>>>> Regards,
>>>>> Martin
>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-
>>>>>>> bounces at ucar.edu] On Behalf Of Cinquini, Luca (3880)
>>>>>>> Sent: 30 June 2011 20:10
>>>>>>> To: Kettleborough, Jamie
>>>>>>> Cc: go-essp-tech at ucar.edu
>>>>>>> Subject: Re: [Go-essp-tech] Data node authorization
>>>>>>>
>>>>>>> Hi Jamie,
>>>>>>>   the established plan is to move all sites to the MyProxy (SAML-based)
>>>>>>> authentication and authorization system, and to gradually phase out
>>>>>>> the token-based system. I know JPL and BADC have already moved, and
>>>>>>> that PCMDI is still on the token-based system. As for the timeline at
>>>>>>> each site, the corresponding administrators will have to chime in.
>>>>>>> thanks, Luca
>>>>>>>
>>>>>>> On Jun 30, 2011, at 7:22 AM, Kettleborough, Jamie wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Earlier this week I was trying to get data from different data nodes.
>>>>>>>> There seemed to be two authorization methods in place - one based on
>>>>>>>> MyProxy, the other based on a token in the HTTP query string.
>>>>>>>>
>>>>>>>> Is this the long term plan?
>>>>>>>> If not then how soon will just one method be supported across all nodes?
>>>>>>>> If it is then I guess there will be follow up questions about how to
>>>>>>>> handle both...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jamie
>>>>>>>> _______________________________________________
>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>> --
>>>> Sébastien Denvil
>>>> IPSL, Pôle de modélisation du climat
>>>> UPMC, Case 101, 4 place Jussieu,
>>>> 75252 Paris Cedex 5
>>>>
>>>> Tour 45-55 2ème étage Bureau 209
>>>> Tel: 33 1 44 27 21 10
>>>> Fax: 33 1 44 27 39 02
>>>>
>
>


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de


