[Go-essp-tech] download rates

stephen.pascoe at stfc.ac.uk stephen.pascoe at stfc.ac.uk
Fri Mar 4 02:40:44 MST 2011


Hi Mehmet,

This data is not fully published, i.e. it isn't published to a gateway. Therefore, when an authorisation request is made to download the file, the gateway denies access.

Cheers,
Stephen.

---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK

From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Mehmet Balman
Sent: 03 March 2011 22:25
To: Estanislao Gonzalez
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] download rates

Hi Estani,

I also see some problems while trying to replicate CMIP5 data from BADC. Some files gave authorization errors (403), and some are listed in THREDDS but do not exist.

For example, can you access this file?
http://cmip-dn.badc.rl.ac.uk/thredds/fileServer/esg_dataroot/cmip5/output1/MOHC/HadGEM2-ES/rcp85/day/atmos/day/r1i1p1/v20110127/prc/prc_day_HadGEM2-ES_rcp85_r1i1p1_20301201-20351130.nc

I just parse everything from the THREDDS catalog. Here is how I am getting files (see the script below, which uses 12 streams). How do you use wget?

If anybody wants to try GridFTP downloads, NERSC has CMIP3 data published with GridFTP and HTTP URLs.


wget -nv http://cmip-dn.badc.rl.ac.uk/thredds/esgcet/catalog.html -O- |
  grep output1 |
  awk 'BEGIN{FS="="}{print $4}' |
  sed "s/'/ /g" |
  awk '{print "http://cmip-dn.badc.rl.ac.uk/thredds/esgcet/" $1}' |
  xargs -P 12 -r -n 1 wget -nv -r --level 2 --certificate=/tmp/x509up_u48731 --private-key=/tmp/x509up_u48731 --ca-directory=~/.globus/certificates --save-cookies=cookie --load-cookies=cookie
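
To separate the two failure modes mentioned above (403 versus files that are listed but not there), something like the following could be run over the URL list before starting a long transfer. This is only a rough sketch: the names fetch_status, label and check_all are made up for illustration, no client certificate is sent (so protected files would be expected to show up as denied), and treating 404 as "listed but missing" is an assumption, since the thread does not say which status those files return.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def fetch_status(url, timeout=60):
    """Return the HTTP status for url (hypothetical helper; sends no certificate)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we want to tally.
        return err.code


def label(status):
    """Map a status code to the failure classes mentioned in this thread."""
    if status == 200:
        return "ok"
    if status == 403:
        return "denied"   # authorization refused
    if status == 404:
        return "missing"  # assumption: listed in the catalog but not on disk
    return "other"


def check_all(urls, fetch=fetch_status, streams=12):
    """Probe urls with `streams` parallel workers and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return Counter(label(status) for status in pool.map(fetch, urls))
```

Calling check_all(urls) on the list produced by the first five stages of the pipeline would give counts per class; the fetch argument exists only so the tally can be tried out without touching the network.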






On 03/03/2011 09:38 AM, Estanislao Gonzalez wrote:
Indeed it will, Rachana.

I'm just performing a test (or was, as it's not working anymore after 3TB... because of some technical difficulties :-) to see how good/bad this is. And it was not bad at all (while it worked).

I had quite some problems with guc too, so I'm just trying different possibilities out.

And I see the SSL double redirection now, nice :-)

I think Stephen is already at that anyway, there were some minimal problems with the publication.

And a single wget connection is now running at 12MB/s.

Thanks,
Estani

On 03.03.2011 18:26, Rachana Ananthakrishnan wrote:

On Mar 3, 2011, at 11:23 AM, Estanislao Gonzalez wrote:


Hi Rachana,

yes, the only data that can be downloaded at the moment is the data published at BADC, and it is HTTP only...
well, actually, now that I think about it again: if the user sends a certificate, it means they got redirected to an SSL endpoint, right? Are they then redirected back to the original URL, or is the download itself made via SSL? I'm not sure now...

Data is downloaded via HTTP; the redirection to SSL is only there to verify the identity of the user with certificates.

I am reaching out to the various site administrators to understand the barriers to providing GridFTP servers for this. It appears to me that your experiment of replicating TBs of data would benefit from a system that is built with such use cases in mind.

Thanks,
Rachana

Thanks,
Estani

On 03.03.2011 17:52, Rachana Ananthakrishnan wrote:
Estani,

Are all these numbers from HTTP-based downloads?

Rachana

On Mar 3, 2011, at 5:13 AM, Estanislao Gonzalez wrote:


Hi,

At Michael's suggestion I started a page in the esgf.org wiki where we could write down our measurements regarding the CMIP5 federation.
It is not intended for the technical staff, so don't complain about it being too simplified :-)
On the contrary, it is intended to be a very first, rough approximation of what's happening in the cloud, by allowing everyone to write a measurement down in 5 seconds.

Everyone has write access to this and, if we see the need, we could evolve it into something more detailed, but that would certainly leave the lay users out.

And the link:
http://esgf.org/wiki/Cmip5Status/Datarates

Does someone else find this useful? Any comments?

Thanks,
Estani

On 03.03.2011 09:04, Estanislao Gonzalez wrote:
Hi Karl,

I'm currently testing the replication via wget and got an average of 350 Mbps (around 40MB/s) with 7 parallel streams in the last 14 hours. I've already downloaded 2TB.
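
As a rough cross-check of these figures, the average rate can be recomputed from the totals. This is a minimal sketch with two assumptions: that the 2TB were transferred within those 14 hours, and that decimal units apply (1 TB = 10**12 bytes). The function name is illustrative only.

```python
def average_rate(total_bytes, seconds):
    """Return (MB/s, Mbit/s) using decimal units (1 MB = 10**6 bytes)."""
    mb_per_s = total_bytes / seconds / 1e6
    return mb_per_s, mb_per_s * 8


# Assumed inputs: 2 TB moved in 14 hours over 7 parallel streams.
mb_s, mbit_s = average_rate(2e12, 14 * 3600)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s, {mb_s / 7:.1f} MB/s per stream")
# prints "39.7 MB/s, 317 Mbit/s, 5.7 MB/s per stream"
```

Under those assumptions the totals agree with the "around 40MB/s" figure and come out somewhat below 350 Mbps, which is plausible if the 2TB is a rounded number.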

So I guess the problem is across the Atlantic. I'd suggest starting many more parallel streams.

In any case I think this might be caused by some problems/congestion in intermediate networks. The network throughput to ECMWF *might* not be comparable because of the network architecture, e.g. intermediate proxies, firewalls, etc. But a 10x speed-up might be of no use anyway, as downloading 1TB in 2 months hardly seems to be an option...

When we all have the data replicated worldwide though, it will be possible to download from all 3 servers at the same time, which will more than double the speed from this side of the ocean, and will be a considerable speed up from yours, although probably not as much (don't nail me on this, I'll deny it vehemently <- this is not my writing :-)

Anyway, this being said, the rest should be answered by BADC directly.

Thanks,
Estani

On 03.03.2011 02:26, Karl Taylor wrote:
> this should have been sent to go-essp-tech at ucar.edu
>
> sorry about that.
>
> Karl
>
> -------- Original Message --------
> Subject: download rates
> Date: Wed, 2 Mar 2011 16:43:42 -0800
> From: Karl Taylor <taylor13 at llnl.gov>
> To: esg-support at earthsystemgrid.org
>
> Hi all,
>
> When I download a 25 MB CMIP5 file from BADC using the wget method:
> * it took 35 sec to get started
> * then it downloaded at an average rate of 1.1 MB/min
>
> When I download the same file using the point and click method:
> * it took 20 sec to get started
> * then it downloaded at an average rate of 1.4 MB/min
>
> When I download 4 files simultaneously (in parallel) using the click method, I get somewhat slower download rate per file, so somewhat less than 4 times the data transfer rate I would have gotten downloading them in series.
>
> Note that at 1 MB/min, I could download 1 GB in about 17 hours or 1 TB (i.e., about 1/1000 of the entire archive) in about 2 years.
>
> Any way to speed this up some? (A colleague here downloads files at about 10 times this rate from ECMWF.)
>
> Note that I haven't checked how fast download rates are from nodes published to our (PCMDI's) gateway (since they are not yet publicly available).
>
> Best regards, Karl
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
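
For reference, the time estimates in Karl's message can be reproduced with a few lines of arithmetic. This is a sketch assuming decimal units (1 GB = 1000 MB, 1 TB = 10**6 MB) and a 365-day year; transfer_days is an illustrative name, not something from the thread.

```python
def transfer_days(size_mb, rate_mb_per_min):
    """Days needed to move size_mb megabytes at a constant rate in MB/min."""
    return size_mb / rate_mb_per_min / 60 / 24


GB = 1_000        # MB, decimal units (assumption)
TB = 1_000_000    # MB, decimal units (assumption)

hours_per_gb = transfer_days(GB, 1.0) * 24     # about 16.7 hours
years_per_tb = transfer_days(TB, 1.0) / 365    # about 1.9 years
days_per_tb_10x = transfer_days(TB, 10.0)      # about 69 days at ten times the rate

print(f"{hours_per_gb:.1f} h per GB, {years_per_tb:.1f} years per TB, "
      f"{days_per_tb_10x:.0f} days per TB at 10x")
```

The first two values match the "about 17 hours" and "about 2 years" in the message, and the last one is roughly the 2 months per TB mentioned in the reply for a tenfold speed-up.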


--
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  estanislao.gonzalez at zmaw.de





Rachana Ananthakrishnan
Argonne National Lab | University of Chicago






