[Go-essp-tech] zombie processes

Jennifer Adams jma at cola.iges.org
Mon Dec 19 18:07:41 MST 2011


Dear Colleagues, 
I am doing all my wget downloads on a suite of 64-bit CentOS boxes that share a common gluster filesystem. My downloads are leaving a trail of hung processes on the servers that can't be killed (we refer to them as zombies). Once too many zombies accumulate, the only remedy is a reboot, which is inconvenient for all the other users working on the servers. Some file-locking errors have shown up in the logs, which makes sense because the zombies are usually 'mv' or 'ls' commands, the wget command itself, or something similar. One pattern I've seen: a file gets stuck as a partial download, the checksums don't match, so the download starts again, but the file can't be overwritten, and wget gets caught in an infinite loop. You can't kill the process or delete the file in question without spawning more zombies. It also seems to happen when I poke at the 0-byte files that are left behind when a download fails.
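
One way to narrow this down is to check what state the hung processes are actually in. On Linux, a true zombie shows state `Z` in `ps` output, while a process that ignores `kill -9` because it is blocked on filesystem I/O usually shows state `D` (uninterruptible sleep). The sketch below is only an illustration: it assumes the procps `ps -eo pid,stat,comm` output format, and the function names are hypothetical, not part of any existing tool.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,stat,comm` output into (pid, stat, command) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), parts[1], parts[2].strip()))
    return rows


def stuck_processes(rows):
    """Return (blocked, zombies): state D processes and state Z processes."""
    blocked = [r for r in rows if r[1].startswith("D")]
    zombies = [r for r in rows if r[1].startswith("Z")]
    return blocked, zombies


def snapshot():
    """Run ps on the local machine and classify the result (Linux procps assumed)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return stuck_processes(parse_ps(out))
```

Calling `snapshot()` on an affected server returns the two lists. If the unkillable 'mv', 'ls' and wget processes turn up in the first list rather than the second, that would point toward the filesystem layer rather than wget itself.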

My sysadmin is considering a gluster upgrade, but that would be a big job and highly disruptive, and before I send him down that path I'd like to be sure that we've ruled out other possible causes. Nobody at COLA other than us CMIP5 downloaders is reporting any problems. So, has anyone else experienced these kinds of issues? Is it possible that the interaction of wget with a busy data node is the cause?
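
To take the retry loop out of the picture while other causes are being ruled out, the download step can be bounded. The following is a minimal sketch, not the logic of the actual CMIP5 download scripts. It assumes a caller-supplied `fetch` function that writes to a given path, writes to a hypothetical `.part` name instead of overwriting the target, and gives up after a fixed number of attempts. The `sha256` default is also an assumption, and should be changed to whatever checksum type the data node publishes.

```python
import hashlib
import os


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_verified(fetch, dest, expected_hex, max_attempts=3, algo="sha256"):
    """Download to dest + '.part', verify the checksum, then rename into place.

    `fetch` is any callable taking the output path (an assumption of this
    sketch). Returns the attempt number that succeeded, or raises
    RuntimeError once max_attempts is exhausted, so it cannot loop forever.
    """
    part = dest + ".part"
    for attempt in range(1, max_attempts + 1):
        try:
            fetch(part)
            if file_digest(part, algo) == expected_hex.lower():
                os.replace(part, dest)  # rename only after verification
                return attempt
        except OSError:
            pass  # treat I/O errors as a failed attempt
        if os.path.exists(part):
            os.remove(part)  # discard the partial or 0-byte file
    raise RuntimeError("gave up on %s after %d attempts" % (dest, max_attempts))
```

Here `fetch` could be a thin wrapper that runs `wget -O` on the path it is given. Note that this only bounds the loop; if the filesystem itself hangs on the rename or the delete, the process can still get stuck there.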

> wget --version
GNU Wget 1.11.4 Red Hat modified
> uname -a
Linux cola2.gmu.edu 2.6.18-238.12.1.el5 #1 SMP Tue May 31 13:22:04 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux


--Jennifer


--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma at cola.iges.org




