[Wrf-users] WRFV3 crash... MPI problem?

Pirovano Guido (CESIRICERCA) Guido.Pirovano at cesiricerca.it
Fri Jul 25 06:32:57 MDT 2008


Dear all,
we are trying to run WRFV3 (WRF-ARW) on a set of three nested domains
using MPI, but the model crashes, apparently in a rather random way.


We run WRF on a Linux cluster (x86_64, Intel Xeon) with SUSE 9.1.

We compiled WRF with the PGI compiler, using the following settings:

    # Settings for Linux x86_64, PGI compiler with gcc  (dm+sm)
    #
    DMPARALLEL      =        1
    OMPCPP          =        -D_OPENMP
    OMP             =        -mp -Minfo=mp
    SFC             =       /opt/cluster/pgi/linux86-64/7.1-5/bin/pgf90
    SCC             =       gcc
    DM_FC           =       /opt/cluster/pgi/linux86-64/7.1/mpi/mpich/bin/mpif90 -f90=$(SFC)
    DM_CC           =       /opt/cluster/pgi/linux86-64/7.1/mpi/mpich/bin/mpicc -cc=$(SCC) -DMPI2_SUPPORT -DMPI2_THREAD_SUPPORT
    FC              =       $(DM_FC)
    CC              =       $(DM_CC) -DFSEEKO64_OK
    LD              =       $(FC)
    RWORDSIZE       =       $(NATIVE_RWORDSIZE)

We run the model with the following command:

    /opt/cluster/pgi/linux86-64/7.1/mpi/mpich/bin/mpirun -p4pg machs_mpirun_wrf ./wrf.exe


with the following machs file (machs_mpirun_wrf):

   regulus 0 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
   n7 1 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
   n3 1 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
   n1 1 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
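
As a sanity check on the machs file above, here is a minimal Python sketch that counts the MPI tasks it requests. It assumes the usual MPICH ch_p4 procgroup layout (host, number of processes, path to the executable), where the count on the first line is the number of processes on the local host in addition to the master started by mpirun. The function name count_p4_tasks is hypothetical and only for illustration.

```python
def count_p4_tasks(procgroup_text):
    """Count MPI tasks requested by a ch_p4 procgroup file (assumed layout)."""
    # Assumption: mpirun starts one master process on the first host, and the
    # number on each line counts the processes added on that host.
    total = 1
    for line in procgroup_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        total += int(fields[1])
    return total


MACHS = """\
regulus 0 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
n7 1 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
n3 1 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
n1 1 /data/2/STUDIES/GOV_SIS/WRF/WRFV3/wrf.exe
"""

print(count_p4_tasks(MACHS))
```

Under that assumption the file requests four tasks, one per host, which agrees with the "of 4" reported in the log.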


When the model crashes, we get the following message in the standard log:

 starting wrf task             0  of             4
 starting wrf task  starting wrf task             2  of             4
 starting wrf task             1  of             4
            3  of             4
  rm_l_3_29214: (18665.636719) net_send: could not write to fd=5, errno = 32
  P4 procgroup file is machs_mpirun_wrf.


and this one in rsl.out.0003:

.....
......
 WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   1
 WRF NUMBER OF TILES =   1
 WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   1
 WRF NUMBER OF TILES =   1
 d01 2004-07-15_06:00:00 Input data processed for wrflowinp_d<domain> for domain   1
 d01 2004-07-15_12:00:00 Input data processed for wrflowinp_d<domain> for domain   1
  p3_29206:  p4_error: interrupt SIGSEGV: 11
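
To see which task fails first, the rsl.* logs can be searched for the error markers shown above. The following is a minimal Python sketch, not part of WRF; the function name find_error_lines and the choice of patterns (p4_error, SIGSEGV, net_send) are assumptions taken only from the messages quoted in this post.

```python
import re

# Assumption: these markers are the ones seen in the logs quoted above.
ERROR_PATTERN = re.compile(r"p4_error|SIGSEGV|net_send")


def find_error_lines(log_text):
    """Return (line_number, stripped_line) for each line with an error marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_PATTERN.search(line)
    ]


SAMPLE = """\
 WRF NUMBER OF TILES =   1
 d01 2004-07-15_12:00:00 Input data processed for wrflowinp_d<domain> for domain   1
  p3_29206:  p4_error: interrupt SIGSEGV: 11
"""

for number, line in find_error_lines(SAMPLE):
    print(number, line)
```

In practice the same function could be applied to the contents of each rsl.out.* and rsl.error.* file in the run directory, to compare the last timestep reached by every task.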


We have also attached our namelist.input.

Do you think the problem might be related to MPI? If so, how?



Any help will be greatly appreciated!

Kind regards,

Guido


____________________________
Guido Pirovano
CESI RICERCA SpA
Dept - Environment & Sustainable Development

Via Rubattino, 54
20134 Milano - Italy
tel. +39 02 3992 4625
fax +39 02 3992 4608
guido.pirovano at cesiricerca.it
www.cesiricerca.it




-------------- next part --------------
A non-text attachment was scrubbed...
Name: namelist.input
Type: application/octet-stream
Size: 9777 bytes
Desc: namelist.input
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20080725/6428a3b3/namelist.obj

