[Wrf-users] RE: Memory & MPI Usage

Brent L Shaw bshaw at wdtinc.com
Tue Jan 29 07:43:54 MST 2008


Dan,

Regarding the use of an IO node, I have used that capability in the
past.  If you are using one or more IO nodes, be sure when you launch
the mpirun command that you request n+i slots with the -np argument, where
n is the number of processes over which you wish to have your domain
decomposed and i is the number of IO processes to use.  Then, in your
machine file, the first n slots will be used for decomposition, and
after those, the next i slots will be used as the IO node(s).  So, for
example, if you want to decompose the WRF domain over 16 processors and
use 1 IO node, you need to have 17 processors available.  In this case,
your mpirun command would look something like "mpirun -np 17
-machinefile my.machines wrf.exe".  In the "my.machines" file, the first
16 slots would be used only for computations, and the 17th slot would be
the IO slot.  If you want the IO to run on the headnode, make sure the
headnode occupies that 17th slot.
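
To make that concrete, a machines file for this 16-plus-1 case might look
something like the sketch below (the hostnames are placeholders for your
own cluster, node03 through node15 are omitted for brevity, and the exact
file syntax depends on your MPI implementation; the point is the
ordering, with the compute slots first and the IO slot last):

    node01
    node02
    ...
    node16
    headnode

    mpirun -np 17 -machinefile my.machines wrf.exe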

You won't always get better performance with an IO node.  It depends a
lot on whether or not the node is writing to a local disk or to a
network shared disk, and in the latter case, whether or not the network
interface used is the same network interface being used for the MPI
traffic.  If you don't have a separate network between your compute
nodes dedicated only to MPI traffic, you might end up with conflicts
that negate the benefit of using an IO node.  Also, using more than one
IO node will not necessarily improve things over a single IO node either.
I would only recommend multiple IO nodes if you hit a memory issue when
trying to use just one IO node (which you should be able to overcome if
you ensure your "big memory IO node" is in the n+i slot of your machines
file).
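
For reference, the number of IO tasks is requested in the &namelist_quilt
section of namelist.input (this is the nio_tasks_per_group parameter Dan
mentions).  A minimal sketch, assuming a single IO group so that the i
above is simply nio_tasks_per_group:

    &namelist_quilt
     nio_tasks_per_group = 1,
     nio_groups          = 1,
     /

The number you pass to -np then needs to be your compute tasks plus
nio_groups * nio_tasks_per_group.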

As far as nesting goes, the only advantage of the nested solution is
speed: because the 12-km domain (with its shorter timestep) is smaller,
the simulation runs faster.  If you can run the
larger 12-km domain within your required timeline, then that's what I
would do.  Anytime you can afford to run what you need without nesting,
I think it is better, because you avoid introducing additional boundary
errors, etc.  The only other reason you might want to run an outer nest
would be if the external model that is providing lateral boundary
conditions is more than 5-6 times coarser (in terms of grid spacing)
than your desired grid.  If this is the case, though, I would use an
outer nest of 36 or 48 km if my inner nest is 12 km.  Some may disagree,
but I think running a 12-km grid even within 1x1 degree GFS for boundary
conditions is an acceptable solution, even though the ratio between the
12-km grid and the GFS is quite large.
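
If you do go with a 36-km outer nest around the 12-km grid, the relevant
&domains entries in namelist.input would look roughly like the sketch
below.  The domain dimensions and nest start positions are placeholders
for your own setup; the point is the 3:1 grid and timestep ratio between
parent and nest:

    &domains
     max_dom                = 2,
     dx                     = 36000, 12000,
     dy                     = 36000, 12000,
     parent_id              = 1, 1,
     parent_grid_ratio      = 1, 3,
     parent_time_step_ratio = 1, 3,
     i_parent_start         = 1, 31,
     j_parent_start         = 1, 31,
     e_we                   = 100, 151,
     e_sn                   = 100, 151,
     /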

Hope this helps.

Brent

-----Original Message-----
From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On
Behalf Of wrf-users-request at ucar.edu
Sent: Monday, January 28, 2008 1:00 PM
To: wrf-users at ucar.edu
Subject: Wrf-users Digest, Vol 41, Issue 10

----------------------------------------------------------------------

Message: 1
Date: Sun, 27 Jan 2008 15:53:42 -0800
From: "Dan Dansereau" <ddansereau at hydropoint.com>
Subject: [Wrf-users] Memory & MPI usage
To: <wrf-users at ucar.edu>

To All
Two questions - if I may
Question 1
Would anyone out there have any experience with the following:
1) Forcing the WRF IO onto a particular processing node?
2) Forcing the IO onto the head node while using MPI?

I am using the PGI/MPI version and changing the nio_tasks_per_group
parameter to 2 for a very large domain run.

The problem that I am having is that if the IO is done on a processing
node, the total run takes longer, or it runs out of memory on the node
that is doing the IO.
By changing the machines file I can sometimes get the right sequence to
latch it to a particular node, and even sometimes to the head node, but
most of the time MPI seems to grab the node in a random fashion.

Question 2
Are there any advantages or disadvantages to running a 2-level nested
domain versus 1 larger domain at a finer resolution for a 24-hour
simulation?  In this case, a 24-km domain with a large 12-km nest, versus
a single 12-km domain over the same geographic area as the 24-km nested
run.

Thanks in advance
Dan A. Dansereau




