[cam-users] Re: mpirun on beowulf cluster
Jim Rosinski
rosinski@cgd.ucar.edu
Tue, 3 Sep 2002 18:10:53 -0600 (MDT)
On Fri, 30 Aug 2002, Ghan, Steven J wrote:
> I've compiled cam2 to run spmd on a beowulf cluster (Red Hat 7.1, Portland
> Group compiler). But when I try to run
> mpirun -np 2 -machinefile machines cam < namelist > camout
> I get the dreaded broken pipe message to the terminal and the following
> message in camout:
>
> t_setoption: option disabled: Usr Sys
> t_setoption: option disabled: Usr Sys
>
> which is coming from cam2/models/utils/timing/t_setoption.c. It seems that I
> can turn this problem off by defining DISABLE_TIMERS, but why should I have
> to? The code runs fine without spmd. Any ideas on other solutions?
Broken pipes often happen when one process dies unexpectedly and another is
still trying to send data to or receive data from it. If you're using MPICH,
this can happen when MPI tries to route stdout to the master process, but one
of the slaves has died. Though stranger things have happened, I doubt that
the problem you are encountering is actually occurring in any of the
utils/timing code. To check, I'd suggest running mpirun with -p4pg
hostfile -p4norem, then firing up master and slaves by hand. That should at
least eliminate the "broken pipe" nonsense.
Jim Rosinski
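
The broken-pipe mechanism described in the reply can be reproduced outside MPI. The following is only a minimal sketch, assuming a POSIX system and using a plain OS pipe as a stand-in for whatever channel MPICH uses to forward stdout; the function name write_to_dead_reader is hypothetical and nothing here comes from CAM or MPICH. It closes the reading end of a pipe to imitate a peer process that has died, then shows that the next write fails with EPIPE.

```python
import errno
import os


def write_to_dead_reader() -> int:
    """Write to a pipe whose reader is gone; return the errno raised (0 if none).

    Hypothetical helper for illustration only. The closed read end stands in
    for a peer process that died unexpectedly.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the receiving process dying
    try:
        # Python ignores SIGPIPE by default, so the failed write surfaces
        # as BrokenPipeError instead of killing the process.
        os.write(write_fd, b"output destined for the master process\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    code = write_to_dead_reader()
    print("write failed with:", errno.errorcode.get(code, code))
```

The point of the sketch is that the writer is the one that reports the error, even though the reader is the one that failed. That is why a "broken pipe" message by itself says little about where the real problem is.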