[Wrf-users] Runtime problem with OpenMPI + RSL_LITE + Multiple Domains.

Mark Dobossy mdobossy at Princeton.EDU
Tue Oct 2 14:59:27 MDT 2007


I am currently attempting to compile and run WRF on a large (256 nodes
x 2 processors per node) Linux cluster.  The MPI implementation I am using
is OpenMPI 1.2.3, and the compilers are the Intel 9.1 compilers (icc,
ifort, etc.).  For the purposes of our modeling, we need to use
RSL_LITE, nesting, and multiple domains.

I have been able to get WRF to compile, and everything works great
for one domain.  However, when I attempt a multiple-domain run, I
get a segmentation fault.  An example rsl.error file is below (the
host name has been x'd out for security purposes).  Has anyone seen
anything like this before?  As I mentioned, a single-domain run goes
through without problem; it is only when I increase the number of
domains to 2 or 3 that I run into trouble.  Any tips or feedback
would be greatly appreciated.
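
For reference, the only thing that differs between the working and the
failing runs is the domain setup in namelist.input.  A rough sketch of
the &domains section is below; the grid dimensions match the log output
further down, but the time step, vertical levels, and nest starting
indices here are placeholders, so treat them as illustrative only:

   &domains
    time_step               = 60,
    max_dom                 = 2,
    e_we                    = 110, 148,
    e_sn                    = 110, 127,
    e_vert                  = 28,  28,
    parent_id               = 1,   1,
    parent_grid_ratio       = 1,   3,
    parent_time_step_ratio  = 1,   3,
    i_parent_start          = 1,   40,
    j_parent_start          = 1,   33,
    feedback                = 1,
   /

With max_dom = 1 the run completes fine; setting max_dom to 2 (or 3) is
what triggers the crash.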

-Mark

taskid: 25 hostname: xxxxx-186
Quilting with   1 groups of   0 I/O tasks.
   Ntasks in X            4, ntasks in Y            8
periodic coords           25           6           1
non periodic coords           25           6           1
WRF V2.2 MODEL
   *************************************
   Parent domain
   ids,ide,jds,jde            1         110           1         110
   ims,ime,jms,jme           23          61          77         102
   ips,ipe,jps,jpe           29          55          83          96
   *************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
    med_initialdata_input: calling input_model_input
INITIALIZE THREE Noah LSM RELATED TABLES
   STEPRA,STEPCU,STEPBL          14           1           1
   *************************************
   Nesting domain
   ids,ide,jds,jde            1         148           1         127
   ims,ime,jms,jme           32          81          90         117
   ips,ipe,jps,jpe           38          74          96         111
   INTERMEDIATE domain
   ids,ide,jds,jde           39          93          32          79
   ims,ime,jms,jme           47          71          60          76
   ips,ipe,jps,jpe           53          65          66          70
   *************************************
[xxxxx-186:17766] *** Process received signal ***
[xxxxx-186:17766] Signal: Segmentation fault (11)
[xxxxx-186:17766] Signal code: Address not mapped (1)
[xxxxx-186:17766] Failing at address: 0x9b
[xxxxx-186:17766] [ 0] /lib64/tls/libpthread.so.0 [0x32d2e0c4f0]
[xxxxx-186:17766] [ 1] /usr/local/openmpi/1.2.3/intel-ib/x86_64/lib64/libmpi.so.0(MPI_Allgather+0x165) [0x2a9580e935]
[xxxxx-186:17766] [ 2] ./wrf.exe [0x10b2142]
[xxxxx-186:17766] [ 3] ./wrf.exe [0x10b20c8]
[xxxxx-186:17766] [ 4] ./wrf.exe [0x72936a]
[xxxxx-186:17766] [ 5] ./wrf.exe [0x41b7ac]
[xxxxx-186:17766] [ 6] ./wrf.exe [0x40fe51]
[xxxxx-186:17766] [ 7] ./wrf.exe [0x6523be]
[xxxxx-186:17766] [ 8] ./wrf.exe [0x40aa99]
[xxxxx-186:17766] [ 9] ./wrf.exe [0x40aa64]
[xxxxx-186:17766] [10] ./wrf.exe [0x40aa2a]
[xxxxx-186:17766] [11] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x32d211c3fb]
[xxxxx-186:17766] [12] ./wrf.exe [0x40a96a]
[xxxxx-186:17766] *** End of error message ***
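
For what it's worth, the frame that fails is MPI_Allgather inside
libmpi.so.  If anyone wants to check whether the OpenMPI 1.2.3 install
handles that call outside of WRF, a minimal standalone test along these
lines would exercise it (just a sketch, not part of WRF):

   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>

   /* Each rank contributes one integer and every rank gathers all of
      them, exercising the same MPI_Allgather call seen in the failing
      frame of the backtrace above. */
   int main(int argc, char **argv)
   {
       int rank, size, i, sendval, *recvbuf;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);

       sendval = rank;
       recvbuf = (int *) malloc(size * sizeof(int));

       MPI_Allgather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT,
                     MPI_COMM_WORLD);

       if (rank == 0)
           for (i = 0; i < size; i++)
               printf("rank %d contributed %d\n", i, recvbuf[i]);

       free(recvbuf);
       MPI_Finalize();
       return 0;
   }

Compiled with mpicc and launched with mpirun the same way as wrf.exe,
this should show quickly whether a plain Allgather works across the
same set of nodes.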



