[Wrf-users] run time error - collective abort of all ranks
Gerardo De Jesus Montoya Gaviria
gdmontoyag at unal.edu.co
Thu Mar 12 12:18:38 MDT 2009
I'm running WRF 3.0.1.1 on a Dell Precision 690 machine with two 64-bit dual-core Xeon processors, using ifort 10.1 and MPICH2, for a domain of 118x118 points with dx = dy = 18.9 km. The compilation takes a long time (about 1 hour) but finishes successfully. The run, however, fails with an error message almost immediately after the command: mpiexec -n 4 ./wrf.exe.
We also ran the command: ulimit -s unlimited. The error message is:
************************************************************
mpiexec -n 4 ./wrf.exe
starting wrf task 0 of 4
starting wrf task 1 of 4
starting wrf task 3 of 4
starting wrf task 2 of 4
rank 2 in job 2 localhost.localdomain_53980 caused collective abort of all ranks
exit status of rank 2: killed by signal 11
rank 1 in job 2 localhost.localdomain_53980 caused collective abort of all ranks
exit status of rank 1: killed by signal 11
*************************************************************
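For reference, "killed by signal 11" refers to SIGSEGV, i.e. a segmentation fault (an invalid memory access) inside wrf.exe on those ranks. Here is a minimal sketch for decoding a signal number with the shell's `kill -l` builtin. It assumes a Linux shell such as bash, where signal 11 maps to SEGV.

```shell
#!/bin/sh
# Translate the signal number reported by mpiexec ("killed by signal 11")
# into its symbolic name using the shell's kill -l builtin.
# Assumption: Linux, where signal 11 is SIGSEGV (segmentation fault).
sig=11
name=$(kill -l "$sig")
echo "signal $sig = SIG$name"
```

On a typical Linux system this prints `signal 11 = SIGSEGV`.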
The configuration we are using is:
7. Linux x86_64 i486 i586 i686, ifort compiler with icc (dmpar)
Compile for nesting: 1=basic
OS: Scientific Linux 5.1 64-bit
We found similar problems posted in the user forum, but without a clear answer. I would greatly appreciate any help.
Gerardo Montoya. Full Professor (Profesor titular), Universidad Nacional de Colombia.