[Go-essp-tech] On use of wget over http

Estanislao Gonzalez gonzalez at dkrz.de
Thu Oct 27 05:20:38 MDT 2011


Hello Martin,

I totally agree, as you may already know :-)

I'd like to add another good argument for this: bandwidth.
The archive is too large to expect people to download everything 
all over again just to find out that it hasn't changed.
Tools are built around this too. If a checksum is provided, there is no 
need to download something you "know" hasn't changed.
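
A minimal sketch of that check in Python, assuming the archive publishes 
a hex checksum for each file. The function names and the choice of 
SHA-256 are assumptions made for illustration, not a description of any 
existing download script:

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte file never sits in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, published_checksum, algorithm="sha256"):
    """Return True when the local copy is missing or differs from the archive."""
    if not os.path.exists(path):
        return True
    # Hex digests are compared case-insensitively.
    return file_checksum(path, algorithm) != published_checksum.lower()
```

The same comparison can be run right after a transfer finishes, which 
would catch a file that has the correct size but corrupt content.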

For us, as archive sites, I don't think we will really archive anything 
without a checksum. I know I won't.

Furthermore, checksums allow us to perform a version diff even 
without any other information, e.g. check the history tab at WDCC:
Version with added and renamed files: 
http://ipcc-ar5.dkrz.de/dataset/cmip5.output1.NCC.NorESM1-M.sstClim.mon.land.Lmon.r1i1p1.html
and some deleted files: 
http://ipcc-ar5.dkrz.de/dataset/cmip5.output2.MPI-M.MPI-ESM-LR.rcp26.mon.ocean.Omon.r1i1p1.html

This is only possible with checksums, and I think this information would 
be quite helpful for users.
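
To illustrate why checksums are enough for such a diff, here is a rough 
sketch in Python. It assumes each dataset version is available as a 
mapping from file name to checksum and that checksums are unique within 
a version. The function name is hypothetical, and this is not how the 
WDCC history tab is actually implemented:

```python
def diff_versions(old, new):
    """Compare two {filename: checksum} mappings of the same dataset."""
    # Reverse lookup: which old file carried a given checksum?
    old_by_sum = {checksum: name for name, checksum in old.items()}
    added, renamed, changed = [], [], []
    for name, checksum in new.items():
        if name in old:
            if old[name] != checksum:
                changed.append(name)  # same name, different content
        elif checksum in old_by_sum and old_by_sum[checksum] not in new:
            renamed.append((old_by_sum[checksum], name))  # same content, new name
        else:
            added.append(name)
    renamed_from = {source for source, _ in renamed}
    deleted = [n for n in old if n not in new and n not in renamed_from]
    return {
        "added": sorted(added),
        "renamed": sorted(renamed),
        "changed": sorted(changed),
        "deleted": sorted(deleted),
    }
```

A file counts as renamed when its content (checksum) survives under a 
new name and the old name is gone. Without checksums, a rename would be 
indistinguishable from one deletion plus one addition.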

Thanks,
Estani
On 27.10.2011 12:13, martin.juckes at stfc.ac.uk wrote:
>
> Hello,
>
> Yesterday I ran a few tests transferring a 2Gb file from CSIRO to a 
> server at Reading in the UK using wget over http. I ran the wget 
> command 4 times, and each time got a file of the correct size and 
> incorrect checksum. Wget was using multiple automatic retries. I then 
> throttled back the transfer rate to 400Kbytes/s and got the file 
> transferred in one go, and with the correct checksum. It just took a 
> little longer.
>
> My tentative conclusions are that users cannot access the data 
> reliably if we do not provide checksums, and that download scripts 
> which do not verify checksums are not good enough for an archive of 
> this size.
>
> Cheers,
>
> Martin
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
