[Wrf-users] wrf.exe on a RedHat 5 compatible cluster
Lampros Mountrakis
lmount at grid.auth.gr
Fri Apr 23 00:51:51 MDT 2010
I am trying to run wrf.exe 3.1.1 on a RedHat 5 compatible cluster, and
all I get is errors. I have tried several compilation options, such as
dmpar/dmsm and static/dynamic, and all of them fail. Common to every
build are the em_real case, MPICH1 and the Intel compiler.
The very same case produces reasonable output on a RedHat 4 based cluster.
The run script sets "ulimit -s unlimited" before the mpirun call and
also exports the following variables, which I found in several threads
about similar problems:
export MPICH_UNEX_BUFFER_SIZE=1024M
export P4_GLOBMEMSIZE=536870912
export MP_STACK_SIZE=64000000
export KMP_STACKSIZE=2048M
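For reference, the settings above could be collected in a run-script preamble along the following lines. This is only a sketch under assumptions: the launch line is a guess at a typical MPICH1 invocation and is left commented out, MACHINEFILE is a hypothetical variable, and the task count of 16 is taken from the log output further down.

```shell
#!/bin/sh
# Sketch of a run-script preamble; not the actual script used on the cluster.

# Raise the stack limit; warn instead of aborting if the hard limit forbids it.
ulimit -s unlimited 2>/dev/null || echo "warning: could not raise stack limit" >&2

# Values exactly as listed in the post.
export MPICH_UNEX_BUFFER_SIZE=1024M
export P4_GLOBMEMSIZE=536870912   # 536870912 = 512 * 1024 * 1024
export MP_STACK_SIZE=64000000
export KMP_STACKSIZE=2048M

# Assumed launch line (MACHINEFILE is a placeholder), left commented out so
# that the preamble can be sourced and inspected without MPI installed:
# mpirun -np 16 -machinefile "$MACHINEFILE" ./wrf.exe
```

Running "ulimit -s" and "env | grep -E 'P4_|MPICH_|STACK'" right after this preamble shows what the launching shell actually has in effect. Whether the remote MPI tasks inherit the same limits depends on how they are started.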
The most common errors are the following:
std error
rm_l_4_18119: (1065.371648) net_send: could not write to fd=5, errno = 32
rm_l_15_12081: (1052.342272) net_send: could not write to fd=5, errno = 32
rm_l_6_27731: (1064.724480) net_send: could not write to fd=5, errno = 32
rm_l_10_12047: (1063.745536) net_send: could not write to fd=5, errno = 32
rm_l_14_12071: (1052.841984) net_send: could not write to fd=5, errno = 32
rm_l_13_12065: (1053.071360) net_send: could not write to fd=5, errno = 32
rm_l_7_12029: (1064.433664) net_send: could not write to fd=5, errno = 32
rm_l_9_12041: (1063.974912) net_send: could not write to fd=5, errno = 32
rm_l_11_12053: (1058.514944) net_send: could not write to fd=5, errno = 32
rm_l_12_12059: (1053.300736) net_send: could not write to fd=5, errno = 32
rm_l_2_21220: (1071.020032) net_send: could not write to fd=5, errno = 32
==> rsl.error.0000 <==
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 71
program wrf: error opening wrfinput_d01 for reading ierr= -1021
-------------------------------------------
[0] MPI Abort by user Aborting program !
[0] Aborting program!
==> rsl.out.0000 <==
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 71
program wrf: error opening wrfinput_d01 for reading ierr= -1021
-------------------------------------------
taskid: 0 hostname: wn024.grid.auth.gr
p0_30948: p4_error: : 1
p0_30948: (33.830912) net_send: could not write to fd=5, errno = 32
==> rsl.out.0001 <==
alloc_space_field: domain 1, 58257184 bytes allocated
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 71
program wrf: error opening wrfinput_d01 for reading ierr= -1021
-------------------------------------------
taskid: 1 hostname: wn024.grid.auth.gr
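Since every task reports a failure to open wrfinput_d01, one basic thing to rule out is that the file is missing, unreadable or empty in the run directory. A minimal sketch of such a pre-flight check follows; check_input is a hypothetical helper, not part of WRF, and it only inspects the node on which the script itself runs.

```shell
#!/bin/sh
# check_input FILE: succeed only if FILE exists, is readable and is non-empty.
# Hypothetical helper for illustration; returns a distinct code per failure.
check_input() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing: $f" >&2
        return 1
    elif [ ! -r "$f" ]; then
        echo "not readable: $f" >&2
        return 2
    elif [ ! -s "$f" ]; then
        echo "empty: $f" >&2
        return 3
    fi
    echo "ok: $f"
}
```

Placed before the mpirun line, it could be used as "check_input wrfinput_d01 || exit 1". Passing this check does not show that the file is valid for wrf.exe, only that it can be opened by the shell on that node.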
From time to time I get
starting wrf task 7 of 16
starting wrf task 9 of 16
starting wrf task 13 of 16
starting wrf task 0 of 16
starting wrf task 1 of 16
starting wrf task 2 of 16
starting wrf task 3 of 16
starting wrf task 4 of 16
starting wrf task 5 of 16
starting wrf task 6 of 16
starting wrf task 8 of 16
starting wrf task 15 of 16
starting wrf task 10 of 16
starting wrf task 11 of 16
starting wrf task 12 of 16
starting wrf task 14 of 16
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
If you have something to suggest, or some kind of solution, I would be grateful.
Thank you for your time.
__
Lampros