[ASP-GAU-Users] IBM LoadLeveler commands
Christiane Jablonowski
cjablono at ucar.edu
Wed Mar 16 17:20:01 MST 2005
Hi everybody,
as a follow-up to today's meeting I would like to send you some
LoadLeveler options (for IBM machines only). Here is a copy of a job
script we looked at:
#@ class = com_rg32
#@ node = 1
#@ tasks_per_node = 8
#@ wall_clock_limit = 00:59:00
#@ output = out.$(jobid)
#@ error = out.$(jobid)
#@ job_type = parallel
#@ network.MPI = csss,not_shared,us
#@ node_usage = not_shared
#@ account_no = 54042108
#@ ja_report = yes
#@ environment =
AIXTHREAD_SCOPE=S;MALLOCMULTIHEAP=TRUE;MP_SHARED_MEMORY=yes;
MEMORY_AFFINITY=MCM
#@ queue
...
# must be set equal to (CPUs-per-node / tasks_per_node)
setenv OMP_NUM_THREADS 4
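For completeness, a script like the one above is handled with the
standard LoadLeveler commands (the script filename here is just
illustrative):

```shell
llsubmit run_model.csh   # submit the job script to LoadLeveler
llq -u $USER             # list your queued and running jobs
llcancel <jobid>         # remove a job from the queue if needed
```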
In this particular example, we request 1 node with 32 processors (class
com_rg32). 8 of the 32 processors get assigned to 8 (tasks_per_node)
MPI processes. In the models, this split is most commonly implemented
through a so-called domain decomposition approach.
The OpenMP parallelization sits on top of the MPI parallelization and
works within the (here) 8 domains. In the example above, each MPI
process spawns 4 OpenMP parallel threads that e.g. parallelize a loop
in the vertical direction. Please note that an MPI or OpenMP
parallelization is not automatic and needs to be specified explicitly
in the code. This is already done in NCAR's standard codes. In
general, it's best if
OMP_NUM_THREADS * tasks_per_node = number of processors in a node
(here 32)
The corresponding configuration for the com_rg8 nodes is
#@ class = com_rg8
#@ node = 4
#@ tasks_per_node = 2
...
setenv OMP_NUM_THREADS 4
The
#@ environment ...
option (specified as one long string) might speed up the calculation.
MP_SHARED_MEMORY=yes avoids unnecessary MPI messages across the
communication network if the communication takes place within 1 node
(which corresponds to memory copies).
If you know the approximate run time of your job, setting
#@ wall_clock_limit
accordingly can reduce the waiting time in the queue: short jobs can
then be squeezed in (backfilled) on nodes that are reserved, and would
otherwise sit idle, for bigger jobs. If you don't specify the
wall-clock time, the maximum time for the queue is assumed (e.g. 6h
for the regular queues).
Best,
Christiane