[Wrf-users] run time error - collective abort of all ranks

Gerardo De Jesus Montoya Gaviria gdmontoyag at unal.edu.co
Thu Mar 12 12:18:38 MDT 2009


I'm running WRF 3.0.1.1 on a Dell Precision 690 workstation with two 64-bit dual-core Xeon processors, using ifort 10.1 and MPICH2, for a 118x118-point domain with dx = dy = 18.9 km. The compilation takes a long time (about 1 hour) but finishes successfully. However, the run stops with an error almost immediately after the command: mpiexec -n 4 ./wrf.exe.
We also ran the command ulimit -s unlimited beforehand. The error message is:

************************************************************
mpiexec -n 4 ./wrf.exe 
 starting wrf task            0  of            4
 starting wrf task            1  of            4
 starting wrf task            3  of            4
 starting wrf task            2  of            4
rank 2 in job 2  localhost.localdomain_53980   caused collective abort of all ranks
  exit status of rank 2: killed by signal 11 
rank 1 in job 2  localhost.localdomain_53980   caused collective abort of all ranks
  exit status of rank 1: killed by signal 11
*************************************************************
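The exit status "killed by signal 11" means that ranks 1 and 2 received SIGSEGV, i.e. a segmentation fault. The following is a minimal Python sketch, assuming a Linux system with Python 3. It names the signal and reports the stack limit that a process actually inherits. Launching it the same way as WRF, e.g. mpiexec -n 4 python3 check_limits.py (check_limits.py is a hypothetical file name), would show whether each rank really sees the ulimit -s unlimited setting.

```python
import resource
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a POSIX signal number, e.g. 11 -> 'SIGSEGV'."""
    return signal.Signals(signum).name


def stack_limit() -> str:
    """Report the soft stack-size limit of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    # bash's `ulimit -s` also reports this value in 1024-byte units
    return f"{soft // 1024} KiB"


if __name__ == "__main__":
    # mpiexec reported "killed by signal 11" for ranks 1 and 2
    print("signal 11 =", describe_signal(11))
    # the limit this process inherited; compare with `ulimit -s` in the shell
    print("stack limit:", stack_limit())
```

If a rank prints a finite value here even though ulimit -s unlimited was set in the interactive shell, the setting is not reaching the processes started by mpiexec.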
The configuration we are using is:

 7.  Linux x86_64 i486 i586 i686, ifort compiler with icc  (dmpar)
 Compile for nesting: 1=basic

OS: Scientific Linux 5.1, 64-bit

We found similar problems posted in the user forum, but without a clear answer. I would greatly appreciate any help.
Gerardo Montoya. Full Professor (Profesor Titular), Universidad Nacional de Colombia.


