[Wrf-users] error in configuring cluster; mpiexec: failed to obtain sock from manager

Tue Sep 25 10:12:44 MDT 2007

We are trying to run the WRF model on a cluster of two Intel machines (each with two dual-core x86_64 processors) running Linux Fedora 6.
We can successfully run the model on one machine with 4 nodes (two dual-core processors), using MPICH2 1.0.5p4 with PGI 7.0.7.

However, when we run the same model on the two-machine cluster, MPICH2 recognizes only two processors (one per machine), as shown by the commands mpdtrace, mpdringtest and mpiexec:

[administrador@cs-room443-d01 Paquetes]$ mpdtrace
cs-room443-d01
cs-room443-d02

[administrador@cs-room443-d01 Paquetes]$ mpdringtest
time for 1 loops = 0.000850915908813 seconds

[administrador@cs-room443-d01 Paquetes]$ mpiexec -l -n 1 /bin/hostname
0: cs-room443-d01

[administrador@cs-room443-d01 Paquetes]$ mpiexec -l -n 2 /bin/hostname
0: cs-room443-d01
1: cs-room443-d02

[administrador@cs-room443-d01 Paquetes]$ mpiexec -l -n 3 /bin/hostname
mpiexec_cs-room443-d01 (mpiexec 425): mpiexec: failed to obtain sock from manager

Is there any way to configure MPICH2 so that our cluster uses all 8 nodes? Any suggestions would be helpful.
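One configuration step that may be relevant, sketched here under assumptions rather than as a confirmed fix: MPICH2's mpd ring can be told how many CPUs each machine has, assuming this release accepts the optional `hostname:ncpus` syntax in its hosts file and the `--ncpus` option of mpdboot for the local host. The file name mpd.hosts and the count of 4 cores per machine are assumptions based on the hardware described above. This sketch only writes the hosts file; it does not address the "failed to obtain sock from manager" error itself.

```shell
#!/bin/sh
# Sketch (assumption): describe each machine's CPU count to the mpd ring
# using the optional "hostname:ncpus" syntax of MPICH2's mpd hosts file.
# Assumption: both machines have 4 cores (two dual-core processors each).

HOSTFILE=mpd.hosts
CPUS_PER_HOST=4

# Start from an empty file, then write one "host:ncpus" line per machine.
: > "$HOSTFILE"
for host in cs-room443-d01 cs-room443-d02; do
    echo "${host}:${CPUS_PER_HOST}" >> "$HOSTFILE"
done

# Show what was written.
cat "$HOSTFILE"

# Afterwards (not run by this script), the ring could be restarted and
# tested with 8 processes, e.g.:
#   mpdallexit
#   mpdboot -n 2 -f mpd.hosts --ncpus=4
#   mpiexec -l -n 8 /bin/hostname
```

If the ring picks up the CPU counts, the final mpiexec test would be expected to list ranks on both machines rather than stopping at two processes.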

Thanks.

Gerardo Montoya
Full Professor
National University of Colombia