[Wrf-users] WRF ... mpi Buffering message error

Alan Gadian alan at env.leeds.ac.uk
Fri Nov 9 09:16:08 MST 2007


Hi,

We are running WRF on 1024 dual-core processors (i.e. np=2048)
on an XT4.

We had the following error message:-

>  internal ABORT - process 0: Other MPI error, error stack:
>  MPIDI_PortalsU_Request_PUPE(317): exhausted unexpected receive queue
>  buffering increase via env. var. MPICH_UNEX_BUFFER_SIZE

which, we are told, means

"The application is sending too many short, unexpected messages to
a particular receiver."

We have been advised that, to work around the problem, we should

"Increase the amount of memory for MPI buffering using the
MPICH_UNEX_BUFFER_SIZE variable(default is 60 MB) and/or decrease
the short message threshold using the MPICH_MAX_SHORT_MSG_SIZE
(default is 128000 bytes) variable. May want to set MPICH_DBMASK
to 0x200 to get a traceback/coredump to learn where in
application this problem is occurring."
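
For concreteness, here is a minimal sketch of how the settings named in
that advice might be assembled for a job script. The helper name
mpich_buffer_env and the example values (120 MB, 64000 bytes) are
illustrative assumptions, not recommended settings, and the sketch assumes
MPICH_UNEX_BUFFER_SIZE is given in bytes with 1 MB = 1024*1024 bytes.

```python
MB = 1024 * 1024  # assumption: the buffer size is specified in bytes


def mpich_buffer_env(unex_buffer_mb, max_short_msg_bytes, traceback=False):
    """Build the environment settings named in the advice, as strings.

    Hypothetical helper: only the variable names come from the advice;
    the values passed in are the caller's own choice.
    """
    env = {
        "MPICH_UNEX_BUFFER_SIZE": str(unex_buffer_mb * MB),
        "MPICH_MAX_SHORT_MSG_SIZE": str(max_short_msg_bytes),
    }
    if traceback:
        # 0x200 is the mask value quoted in the advice for a traceback/coredump
        env["MPICH_DBMASK"] = "0x200"
    return env


if __name__ == "__main__":
    # Illustrative values only: double the 60 MB default buffer and
    # halve the 128000-byte short-message threshold.
    for name, value in mpich_buffer_env(120, 64000, traceback=True).items():
        print(f"export {name}={value}")
```

The printed export lines would go into the batch script before the job
launch command; the right values for a given run are exactly what is
being asked about here.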

The question is: has anyone else had this problem?  The code
worked without any problem on 500 cores and, given the size
of the problem, we think we can get good scalability up to
3000 cores.  Does anyone have any advice on what is
happening, what values we should be setting for these variables,
and how strongly they depend on the number of processors?
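
One rough way to think about the dependence on processor count is an
upper bound in which every other process has some number of short,
unexpected messages pending at a single receiver (the abort above was on
process 0). The sketch below is only that bound, not a diagnosis of WRF's
communication pattern. The function name, the choice of one pending
message per sender, and the assumption that each message is buffered at
the full 128000-byte threshold are all hypothetical.

```python
DEFAULT_BUFFER_BYTES = 60 * 1024 * 1024   # 60 MB default, assuming 1 MB = 1024*1024 bytes
SHORT_MSG_THRESHOLD = 128000              # default short-message threshold in bytes


def worst_case_unexpected_bytes(nprocs, msgs_per_sender, msg_bytes):
    """Upper bound on bytes buffered at one receiver.

    Assumes every other process has msgs_per_sender unexpected
    messages of msg_bytes each pending at the same receiver.
    """
    return (nprocs - 1) * msgs_per_sender * msg_bytes


if __name__ == "__main__":
    for nprocs in (500, 2048, 3000):
        need = worst_case_unexpected_bytes(nprocs, 1, SHORT_MSG_THRESHOLD)
        print(nprocs, need, need > DEFAULT_BUFFER_BYTES)
```

Under these assumptions the bound grows linearly with the number of
processes and with the short-message threshold, which is consistent with
the advice to raise the buffer size and/or lower the threshold. Real
usage depends on how many messages actually arrive before the matching
receives are posted.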

Cheers
Alan

-----------------------------------
Address: Alan Gadian, Environment, SEE, 
Leeds University, Leeds LS2 9JT.  U.K.
Email: alan at env.leeds.ac.uk. 
http://www.env.leeds.ac.uk/~alan

Atmospheric Science Letters; the New Journal of R. Met. Soc. 
Free Sample:-  http://www.interscience.wiley.com/asl-sample2007
-----------------------------------



