[Wrf-users] Runtime problem with OpenMPI + RSL_LITE + Multiple Domains.
Mark Dobossy
mdobossy at Princeton.EDU
Tue Oct 2 14:59:27 MDT 2007
I am currently attempting to compile and run WRF on a large Linux
cluster (256 nodes x 2 processors per node). The MPI implementation I
am using is OpenMPI 1.2.3, and the compilers are the Intel 9.1
compilers (icc, ifort, etc.). For our modeling, we need to use
RSL_LITE with nesting and multiple domains.
I have been able to get WRF to compile, and everything works fine
with 1 domain. However, when I attempt a multiple-domain run, I get a
segmentation fault. An example rsl.error file is below (the host name
has been x'd out for security purposes). Has anyone seen anything
like this before? As I mentioned, a single-domain run goes through
without a problem. It is only when I increase the number of domains
to 2 or 3 that I run into trouble. Any tips or feedback would be
greatly appreciated.
-Mark
taskid: 25 hostname: xxxxx-186
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 4, ntasks in Y 8
periodic coords 25 6 1
non periodic coords 25 6 1
WRF V2.2 MODEL
*************************************
Parent domain
ids,ide,jds,jde 1 110 1 110
ims,ime,jms,jme 23 61 77 102
ips,ipe,jps,jpe 29 55 83 96
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
med_initialdata_input: calling input_model_input
INITIALIZE THREE Noah LSM RELATED TABLES
STEPRA,STEPCU,STEPBL 14 1 1
*************************************
Nesting domain
ids,ide,jds,jde 1 148 1 127
ims,ime,jms,jme 32 81 90 117
ips,ipe,jps,jpe 38 74 96 111
INTERMEDIATE domain
ids,ide,jds,jde 39 93 32 79
ims,ime,jms,jme 47 71 60 76
ips,ipe,jps,jpe 53 65 66 70
*************************************
[xxxxx-186:17766] *** Process received signal ***
[xxxxx-186:17766] Signal: Segmentation fault (11)
[xxxxx-186:17766] Signal code: Address not mapped (1)
[xxxxx-186:17766] Failing at address: 0x9b
[xxxxx-186:17766] [ 0] /lib64/tls/libpthread.so.0 [0x32d2e0c4f0]
[xxxxx-186:17766] [ 1] /usr/local/openmpi/1.2.3/intel-ib/x86_64/lib64/libmpi.so.0(MPI_Allgather+0x165) [0x2a9580e935]
[xxxxx-186:17766] [ 2] ./wrf.exe [0x10b2142]
[xxxxx-186:17766] [ 3] ./wrf.exe [0x10b20c8]
[xxxxx-186:17766] [ 4] ./wrf.exe [0x72936a]
[xxxxx-186:17766] [ 5] ./wrf.exe [0x41b7ac]
[xxxxx-186:17766] [ 6] ./wrf.exe [0x40fe51]
[xxxxx-186:17766] [ 7] ./wrf.exe [0x6523be]
[xxxxx-186:17766] [ 8] ./wrf.exe [0x40aa99]
[xxxxx-186:17766] [ 9] ./wrf.exe [0x40aa64]
[xxxxx-186:17766] [10] ./wrf.exe [0x40aa2a]
[xxxxx-186:17766] [11] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x32d211c3fb]
[xxxxx-186:17766] [12] ./wrf.exe [0x40a96a]
[xxxxx-186:17766] *** End of error message ***
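
The wrf.exe frames in the backtrace above are bare addresses, so they
do not say which WRF routine called MPI_Allgather. One possible next
step is to collect those addresses and pass them to addr2line. The
sketch below is only an illustration: the helper name extract_frames
is hypothetical, the sample lines are copied from the log above, and
addr2line will only report useful function and line information if
wrf.exe was built with debugging symbols.

```python
import re

# Matches frames such as "[xxxxx-186:17766] [ 2] ./wrf.exe [0x10b2142]".
# Group 1 is the frame number, group 2 the module, group 3 the address.
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+)\s+\[(0x[0-9a-fA-F]+)\]")


def extract_frames(log_text, binary="./wrf.exe"):
    """Return (frame_number, address) pairs for frames inside `binary`.

    Hypothetical helper: frames from shared libraries (libmpi, libc,
    libpthread) are skipped because their module name differs.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(2) == binary:
            frames.append((int(match.group(1)), match.group(3)))
    return frames


# A few lines taken from the rsl.error backtrace above.
sample = """\
[xxxxx-186:17766] [ 0] /lib64/tls/libpthread.so.0 [0x32d2e0c4f0]
[xxxxx-186:17766] [ 2] ./wrf.exe [0x10b2142]
[xxxxx-186:17766] [ 3] ./wrf.exe [0x10b20c8]
"""

frames = extract_frames(sample)

# Build a command line to run by hand on the cluster.
command = "addr2line -f -e ./wrf.exe " + " ".join(addr for _, addr in frames)
print(command)
```

Running the printed command on the machine that holds the same
wrf.exe binary should map frames 2 and 3, the callers closest to
MPI_Allgather, back to source routines, assuming symbols are present.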