[Go-essp-tech] repeated md5 chksum failures

Estanislao Gonzalez gonzalez at dkrz.de
Thu Apr 12 02:17:14 MDT 2012


Hi Jennifer,

please be aware that this is taking up not only your bandwidth but 
everyone else's as well. That's why I'm not entirely comfortable with 
automated systems that can't detect a problem (in this case, downloading 
a 1 GB file 21 times, which might have run over the weekend or even 
longer). But that's just to give some context on the implications of 
such procedures.

Now regarding the file corruption... I can't tell you for sure what's 
happening at your end. The wget script recognizes that the file is 
corrupt and downloads it again, which is the expected (and desired) 
behavior.
Why you are getting corrupt files at such an extremely high rate is 
beyond me; I've never seen a transfer fail this often.

The md5 hash you've shown is the correct one:
$ md5sum 
/gpfs_750/projects/CMIP5/data/cmip5/output2/MPI-M/MPI-ESM-P/piControl/mon/ocean/Omon/r1i1p1/v20111028/rhopoto/rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc  
036aabfc10caa76a8943f967bc10ad4d  
/gpfs_750/projects/CMIP5/data/cmip5/output2/MPI-M/MPI-ESM-P/piControl/mon/ocean/Omon/r1i1p1/v20111028/rhopoto/rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc
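
Should you want to check the copy on your side against that checksum, 
here is a minimal sketch (assuming the file sits in your current 
directory; the expected hash is the one from your wget script, and 
MacOS ships "md5 -q" instead of md5sum):

     FILE=rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc
     EXPECTED=036aabfc10caa76a8943f967bc10ad4d

     # Use md5sum where available (Linux), otherwise fall back to MacOS md5.
     if command -v md5sum >/dev/null; then
         ACTUAL=$(md5sum "$FILE" | awk '{print $1}')
     else
         ACTUAL=$(md5 -q "$FILE")
     fi

     if [ "$ACTUAL" = "$EXPECTED" ]; then echo "md5 ok"; else echo "md5 failed: $ACTUAL"; fi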

So there are at least a few possible causes:
1) There's a network issue (plausible, since you are seeing corruption 
across multiple files and datasets and the behavior is erratic)
2) wget is not working as expected (I think you are on MacOS and that 
system has not been properly tested, at least not by me). Perhaps it's 
behaving like curl and writing any errors it encounters into the file; 
this is not the case with the wget version I've been testing with (and 
it makes no sense for an uninterrupted download)
3) You have a disk failure (data gets corrupted when written to disk)
4) You have other memory/network buffer errors (unlikely, as you should 
have seen this happening with other Internet connections). A quick 
check for 3) and 4) is sketched below.
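
One rough way to probe for local disk or memory problems is to hash the 
same failed download several times in a row. This is only a sketch, 
assuming a Linux md5sum (on MacOS use "md5 -q"): if the hash changes 
between runs, the corruption is happening on your side; if it is stable 
but still wrong, the bad data most likely arrived that way over the 
network.

     # Re-hash the same (failed) file a few times; an unstable hash
     # points at the local disk or memory rather than the network.
     FILE=rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc
     for i in 1 2 3 4 5; do
         md5sum "$FILE"        # on MacOS: md5 -q "$FILE"
     done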

My advice:
- Alter the wget script so that only one of the problematic files is 
left in it.
- Use "-f" to keep the file even if its md5 doesn't match, and make 
sure that this is indeed what happened (there will be output on the 
console).
- Rename the corrupt file and download it again without "-f" until a 
good copy finally arrives.
- Compare the two files to see how they differ (you'll see whether 
there's text in there, like an HTML page, or just bytes or a block of 
bytes). In bash:
     diff <(hexdump -C file1) <(hexdump -C file2) | less
- Try getting the same file (3 times perhaps?) from a different machine 
(even better if the OS is different). If it always succeeds, the problem 
is definitely related to your machine. If the behavior is the same, then 
the issue is definitely in the network. A sketch of this test follows 
below.
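
For that last test, a rough loop like the following can fetch the file 
a few times on another machine and record whether each copy matches. 
This is only a sketch: the URL and expected hash are copied from your 
wget script, and a bare wget may still need the authentication handling 
that the ESGF wget script normally takes care of.

     URL=http://bmbf-ipcc-ar5.dkrz.de/thredds/fileServer/cmip5/output2/MPI-M/MPI-ESM-P/piControl/mon/ocean/Omon/r1i1p1/v20111028/rhopoto/rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc
     EXPECTED=036aabfc10caa76a8943f967bc10ad4d

     # Download three separate copies and compare each against the
     # published checksum.
     for i in 1 2 3; do
         wget -q -O try$i.nc "$URL"
         ACTUAL=$(md5sum try$i.nc | awk '{print $1}')   # MacOS: md5 -q try$i.nc
         echo "try $i: $ACTUAL (expected $EXPECTED)"
     done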

Hope this helps,
Estani

On 11.04.2012 20:51, Jennifer Adams wrote:
> Hi, Everyone --
> I'm trying to download some fairly large files (~1 GB) from the 
> piControl run (monthly ocean variables) and find that the checksum 
> fails to match several times before it is finally OK. In some cases, 
> it can take 10 or more re-tries before the checksum succeeds.
>
> The problem is not with a specific data node. Here are some of the 
> dataset IDs for the troublesome downloads:
> cmip5.output1.CCCma.CanESM2.piControl.mon.ocean.Omon.r1i1p1.v20111028
> cmip5.output1.INM.inmcm4.piControl.mon.ocean.Omon.r1i1p1.v20110323
> cmip5.output1.MIROC.MIROC-ESM.piControl.mon.ocean.Omon.r1i1p1.v20110929
> cmip5.output1.MRI.MRI-CGCM3.piControl.mon.ocean.Omon.r1i1p1.v20110831
> cmip5.output1.NCAR.CCSM4.piControl.mon.ocean.Omon.r1i1p1.v20120220
> cmip5.output1.NCC.NorESM1-M.piControl.mon.ocean.Omon.r1i1p1.v20110901
> cmip5.output2.MRI.MRI-CGCM3.piControl.mon.ocean.Omon.r1i1p1.v20110831
> cmip5.output2.NCC.NorESM1-M.piControl.mon.ocean.Omon.r1i1p1.v20110901
> cmip5.output1.MPI-M.MPI-ESM-LR.piControl.mon.ocean.Omon.r1i1p1.v20120315
> cmip5.output1.MPI-M.MPI-ESM-P.piControl.mon.ocean.Omon.r1i1p1.v20120315
> cmip5.output2.MPI-M.MPI-ESM-P.piControl.mon.ocean.Omon.r1i1p1.v20111028
>
> For example, from the final two datasets in the list, here is an entry 
> from the wget script:
> 'rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' 
> 'http://bmbf-ipcc-ar5.dkrz.de/thredds/fileServer/cmip5/output2/MPI-M/MPI-ESM-P/piControl/mon/ocean/Omon/r1i1p1/v20111028/rhopoto/rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' 
> <http://bmbf-ipcc-ar5.dkrz.de/thredds/fileServer/cmip5/output2/MPI-M/MPI-ESM-P/piControl/mon/ocean/Omon/r1i1p1/v20111028/rhopoto/rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc%27> 
> 'MD5' '036aabfc10caa76a8943f967bc10ad4d'
>
> Here are the 21 download attempts so far today, taking 5 hours; the 
> "md5 failed!" message appears in the log file after each one:
> 2012-04-11 09:19:18 (2.19 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 09:35:05 (1.13 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 09:53:26 (1009 KB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 10:05:52 (1.49 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 10:17:03 (1.61 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 10:31:14 (1.30 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 10:48:50 (1.04 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 11:01:09 (1.46 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 11:14:01 (1.40 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 11:29:46 (1.15 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 11:42:39 (1.40 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 12:01:05 (1011 KB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 12:18:25 (1.03 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 12:35:30 (1.04 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 12:49:44 (1.35 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 13:08:38 ( 960 KB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 13:26:11 (1.01 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 13:36:21 (1.78 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 13:50:53 (1.25 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 14:06:26 (1.15 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
> 2012-04-11 14:19:43 (1.39 MB/s) - 
> `rhopoto_Omon_MPI-ESM-P_piControl_r1i1p1_185001-185912.nc' saved 
> [1083611268/1083611268]
>
> This one failed 14 times before finally getting the "md5 ok" message 
> -- it took 3 hrs 45 minutes to get this file:
> 'so_Omon_MPI-ESM-P_piControl_r1i1p1_189001-189912.nc' 
> 'http://bmbf-ipcc-ar5.dkrz.de/thredds/fileServer/cmip5/output1/MPI-M/MPI-ESM-P/piControl/mon/ocean/Omon/r1i1p1/v20120315/so/so_Omon_MPI-ESM-P_piControl_r1i1p1_189001-189912.nc' 
> <http://bmbf-ipcc-ar5.dkrz.de/thredds/fileServer/cmip5/output1/MPI-M/MPI-ESM-P/piControl/mon/ocean/Omon/r1i1p1/v20120315/so/so_Omon_MPI-ESM-P_piControl_r1i1p1_189001-189912.nc%27> 
> 'MD5' '175d6c9dd3ffea30186e6bc9c7e3dee1'
>
> This problem is sucking up my bandwidth and my time, which are not 
> unlimited. Is there any remedy?
> --Jennifer
>
>
>
>
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
