[Go-essp-tech] download rates
stephen.pascoe at stfc.ac.uk
Fri Mar 4 02:40:44 MST 2011
Hi Mehmet,
This data is not fully published -- i.e. it isn't published to a gateway. Therefore, when an authorisation request is made to download the file, the gateway denies access.
Cheers,
Stephen.
---
Stephen Pascoe +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Mehmet Balman
Sent: 03 March 2011 22:25
To: Estanislao Gonzalez
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] download rates
Hi Estani,
I also see some problems while trying to replicate CMIP5 data from BADC. Some files return authorization errors (403); others are listed in THREDDS but do not exist.
For example, can you access this file?
http://cmip-dn.badc.rl.ac.uk/thredds/fileServer/esg_dataroot/cmip5/output1/MOHC/HadGEM2-ES/rcp85/day/atmos/day/r1i1p1/v20110127/prc/prc_day_HadGEM2-ES_rcp85_r1i1p1_20301201-20351130.nc
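One way to tell the two failure modes apart is to look at the HTTP status of each URL before downloading it. The following is only a sketch: `explain_status` and `probe_url` are hypothetical helper names, the mapping of a missing file to 404 is an assumption (only 403 is mentioned in this thread), and the probe uses curl rather than the wget setup described here. It does not send the X.509 certificate, so it would only show what an anonymous client sees.

```shell
#!/bin/sh
# Hypothetical helper: turn an HTTP status code into a short diagnosis.
explain_status() {
    case "$1" in
        200) echo "ok" ;;
        # 403 is the code reported in this thread for authorization problems.
        401|403) echo "access denied" ;;
        # Assumption: a file listed in the catalog but absent on disk gives 404.
        404) echo "missing" ;;
        *) echo "other" ;;
    esac
}

# Hypothetical helper: request headers only (-I), follow redirects (-L),
# discard the headers, and print the status code of the final response.
# Needs network access; no client certificate is sent.
probe_url() {
    curl -s -o /dev/null -I -L -w '%{http_code}' "$1"
}
```

For example, `explain_status "$(probe_url "$url")"` would print one diagnosis per URL. How the BADC server actually answers an anonymous HEAD request was not checked here.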
I just parse everything from the thredds catalog. Here is how I am getting files (see the script below for 12 streams). How do you use wget?
If anybody wants to try GridFTP downloads, NERSC has CMIP3 data published with GridFTP and HTTP URLs.
wget -nv http://cmip-dn.badc.rl.ac.uk/thredds/esgcet/catalog.html -O- |
  grep output1 |
  awk 'BEGIN{FS="="}{print $4}' |
  sed "s/'/ /g" |
  awk '{print "http://cmip-dn.badc.rl.ac.uk/thredds/esgcet/" $1}' |
  xargs -P 12 -r -n 1 wget -nv -r --level 2 \
    --certificate=/tmp/x509up_u48731 --private-key=/tmp/x509up_u48731 \
    --ca-directory=~/.globus/certificates \
    --save-cookies=cookie --load-cookies=cookie
On 03/03/2011 09:38 AM, Estanislao Gonzalez wrote:
Indeed it will Rachana.
I'm just performing a test (or was, as it's not working anymore after 3TB... because of some technical difficulties :-) to see how good/bad this is. And it was not bad at all (while it worked).
I had quite some problems with guc too, so I'm just trying different possibilities out.
And I see the SSL double redirection now, nice :-)
I think Stephen is already at that anyway, there were some minimal problems with the publication.
And a single wget connection is now running at 12MB/s.
Thanks,
Estani
On 03.03.2011 18:26, Rachana Ananthakrishnan wrote:
On Mar 3, 2011, at 11:23 AM, Estanislao Gonzalez wrote:
Hi Rachana,
yes, the only data that can be downloaded at the moment is the data published at BADC, and it is HTTP only...
well, actually, now that I think about it again: if the user sends a certificate, it means they got redirected to an SSL endpoint, right? Are they then redirected back to the original URL, or is the connection made via SSL? I'm not sure now...
Data is downloaded via HTTP; the redirection to SSL is to verify the identity of the user with certificates.
I am reaching out to the various site administrators to understand the barriers to providing GridFTP servers for this. It appears to me that your experiment of replicating TBs of data would benefit from a system that is built with such use cases as its focus.
Thanks,
Rachana
Thanks,
Estani
On 03.03.2011 17:52, Rachana Ananthakrishnan wrote:
Estani,
Are all these numbers using an HTTP-based download?
Rachana
On Mar 3, 2011, at 5:13 AM, Estanislao Gonzalez wrote:
Hi,
at Michael's suggestion I started a page in the esgf.org wiki where we can write down our measurements regarding the CMIP5 federation.
It is not intended for the technical staff, so don't complain about it being too simplified :-)
On the contrary, it is intended to be a very first and rough approximation to what's happening in the cloud by allowing everyone in 5 sec to write a measurement down.
Everyone has write access to this and if we see the necessity we could evolve it into something more detailed, but which will certainly leave the lay users outside.
And the link:
http://esgf.org/wiki/Cmip5Status/Datarates
Does someone else find this useful? Any comments?
Thanks,
Estani
On 03.03.2011 09:04, Estanislao Gonzalez wrote:
Hi Karl,
I'm currently testing the replication via wget and got an average of 350 Mbps (around 40MB/s) with 7 parallel streams in the last 14 hours. I've already downloaded 2TB.
So I guess the problem is across the Atlantic. I'd suggest starting many more parallel streams.
In any case I think this might be caused by some problems/congestion in intermediate networks. The network throughput to ECMWF *might* not be comparable because of the network architecture, e.g. intermediate proxies, firewalls, etc. But a 10x speed-up might be of no use anyway, as downloading 1 TB in 2 months hardly seems to be an option...
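The rates quoted in this thread can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB, 1 TB = 1000000 MB); the function names are made up for illustration, and the inputs are the figures from the messages (1 MB/min, 350 Mbps, 2 TB in 14 hours).

```shell
#!/bin/sh
# Rough transfer-time arithmetic, assuming decimal units throughout.

# hours_to_fetch SIZE_MB RATE_MB_PER_MIN: hours needed, one decimal place
hours_to_fetch() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate / 60 }'
}

# mbps_to_mbytes_per_sec MEGABITS_PER_SEC: megabytes per second, two decimals
mbps_to_mbytes_per_sec() {
    awk -v mbps="$1" 'BEGIN { printf "%.2f\n", mbps / 8 }'
}

# average_mbytes_per_sec SIZE_MB HOURS: average rate in MB/s, one decimal
average_mbytes_per_sec() {
    awk -v size="$1" -v hours="$2" 'BEGIN { printf "%.1f\n", size / (hours * 3600) }'
}
```

With these, `hours_to_fetch 1000 1` gives 16.7 hours for 1 GB at 1 MB/min, and `hours_to_fetch 1000000 1` gives 16666.7 hours (about 694 days, or 1.9 years) for 1 TB. At ten times that rate, `hours_to_fetch 1000000 10` gives 1666.7 hours, a bit over two months. `mbps_to_mbytes_per_sec 350` gives 43.75 MB/s, and `average_mbytes_per_sec 2000000 14` gives 39.7 MB/s, consistent with the "around 40MB/s" figure.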
When we all have the data replicated worldwide though, it will be possible to download from all 3 servers at the same time, which will more than double the speed from this side of the ocean, and will be a considerable speed up from yours, although probably not as much (don't nail me on this, I'll deny it vehemently <- this is not my writing :-)
Anyway, this being said, the rest should be answered by BADC directly.
Thanks,
Estani
On 03.03.2011 02:26, Karl Taylor wrote:
> this should have been sent to go-essp-tech at ucar.edu
>
> sorry about that.
>
> Karl
>
> -------- Original Message --------
> Subject: download rates
> Date: Wed, 2 Mar 2011 16:43:42 -0800
> From: Karl Taylor <taylor13 at llnl.gov>
> To: esg-support at earthsystemgrid.org
>
> Hi all,
>
> When I download a 25 MB CMIP5 file from BADC using the wget method:
> * it took 35 sec to get started
> * then it downloaded at an average rate of 1.1 MB/min
>
> When I download the same file using the point and click method:
> * it took 20 sec to get started
> * then it downloaded at an average rate of 1.4 MB/min
>
> When I download 4 files simultaneously (in parallel) using the click
> method, I get somewhat slower download rate per file, so somewhat
> less than 4 times the data transfer rate I would have gotten
> downloading them in series.
>
> Note that at 1 MB/min, I could download 1 GB in about 17 hours or 1
> TB (i.e., about 1/1000 of the entire archive) in about 2 years.
>
> Any way to speed this up some? (A colleague here downloads files at
> about 10 times this rate from ECMWF.)
>
> Note that I haven't checked how fast download rates are from nodes
> published to our (PCMDI's) gateway (since they are not yet publicly
> available).
>
> Best regards, Karl
>
_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: estanislao.gonzalez at zmaw.de
Rachana Ananthakrishnan
Argonne National Lab | University of Chicago