[Wrf-users] Optimizing OMP_NUM_THREADS

Surya Ramaswamy Surya.Ramaswamy at erm.com
Fri Jul 24 14:02:29 MDT 2015


Hi Brian - The optimum number of threads varies with the processor, the application, and the size of the task being parallelized.  In your case the optimum appears to be 4: running with more than 4 threads adds more parallelization overhead (thread management, synchronization, and memory contention) than it saves in compute time, so performance drops off.
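
A minimal sketch of how to find that sweet spot empirically (assuming a bash shell and the same em_real working directory as in your transcript; the timing.log file name is just an example, not anything WRF produces):

  # Sweep thread counts and log the wall-clock times.
  cd em_real
  for n in 1 2 4 8 16 32 64; do
      export OMP_NUM_THREADS=$n
      echo "=== OMP_NUM_THREADS=$n ===" >> timing.log
      { time ./wrf.exe ; } 2>> timing.log
  done

Whichever thread count gives the lowest "real" time for your domain is the one to keep.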

Regards,

Surya Ramaswamy
ERM
75 Valley Stream Parkway Suite 200
Malvern, PA 19355
484-913-0300 (main)
484-913-0301 (fax)

From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On Behalf Of Andrus, Brian Contractor
Sent: Friday, July 24, 2015 2:52 PM
To: wrf-users at ucar.edu
Subject: [Wrf-users] Optimizing OMP_NUM_THREADS

Hello,

I am a little confused about running WRF. I have built it and am following the Jan 2000 test case per: http://www2.mmm.ucar.edu/wrf/OnLineTutorial/CASES/JAN00/wrf.htm

Everything works and I get proper output; however, I have noticed some unusual timing behavior that I cannot explain.
I am running on a system with 64 cores and 256GB RAM.
I compiled with
  1) compile/pgi/15.7     2) mpi/openmpi/1.8.5    3) app/netcdf/4.3.3.1

Options were for dm+sm (option 55 pgf90/pgcc) and basic nesting (option 1)
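
For reference, the build itself was just the standard sequence, roughly (the compile.log name is only what I redirected output to):

  ./configure            # chose option 55 (dm+sm, PGI pgf90/pgcc) and basic nesting (option 1)
  ./compile em_real >& compile.log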

Now I am running wrf.exe and producing wrfout_d01_2000-01-24_12:00:00.
What is odd is the extreme variation in run time with different OMP_NUM_THREADS settings.
It seems best at 4; any more or fewer and the run time increases.
Setting it to 8 takes about the same time as setting it to 2,
and setting it to 64 takes almost 4 times as long as setting it to 4.
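
In case the node's memory layout is relevant, something like the following would show how the 64 cores are laid out (assuming numactl is installed on the node; the pinned run is only an experiment I have in mind, not one of the timings below):

  lscpu | grep -i numa                              # number of NUMA nodes and cores per node
  numactl --hardware                                # per-node CPU list and memory
  numactl --cpunodebind=0 --membind=0 ./wrf.exe     # test run confined to one NUMA node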

Here are some timings:

[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=1
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    2m51.743s
user    2m39.087s
sys     0m12.277s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=2
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m49.172s
user    3m15.582s
sys     0m19.015s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=4
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m27.357s
user    4m42.111s
sys     0m35.187s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=8
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m35.480s
user    8m20.966s
sys     1m13.376s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=16
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m52.862s
user    15m43.787s
sys     2m4.978s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=64
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    5m54.857s
user    197m37.807s
sys     7m57.993s


Any ideas as to what would cause that?
I have all but given up on using mpirun, as that seems to make the run take HOURS no matter how many processes/threads I set. I do see it running at 100% CPU on each core it is assigned when I do that, but it rarely writes anything.
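
For reference, this is the kind of hybrid launch line I have been trying (the process and thread counts are just examples; -x is the OpenMPI option for exporting an environment variable to the ranks):

  # Example dm+sm launch: 16 MPI ranks x 4 OpenMP threads = 64 cores
  export OMP_NUM_THREADS=4
  mpirun -np 16 -x OMP_NUM_THREADS ./wrf.exe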

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

