[Wrf-users] Optimizing OMP_NUM_THREADS
Surya Ramaswamy
Surya.Ramaswamy at erm.com
Fri Jul 24 14:02:29 MDT 2015
Hi Brian - The optimal number of threads depends on the processor, the application, and the size of the task being parallelized. In your case the optimum appears to be 4: going beyond 4 threads adds enough parallelization overhead that you lose performance rather than gain it.
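Since your build is dm+sm, one way to use more of the 64 cores without going past that 4-thread sweet spot is to keep OMP_NUM_THREADS at 4 and scale out with MPI ranks instead. A rough sketch (the rank count of 16 is only an example; the best split depends on your domain decomposition):

export OMP_NUM_THREADS=4
mpirun -np 16 -x OMP_NUM_THREADS ./wrf.exe    # 16 ranks x 4 threads = 64 cores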
Regards,
Surya Ramaswamy
ERM
75 Valley Stream Parkway Suite 200
Malvern, PA 19355
484-913-0300 (main)
484-913-0301 (fax)
From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On Behalf Of Andrus, Brian Contractor
Sent: Friday, July 24, 2015 2:52 PM
To: wrf-users at ucar.edu
Subject: [Wrf-users] Optimizing OMP_NUM_THREADS
Hello,
I am a little confused about running wrf. I have built it and am following the testing for the Jan 2000 data set per: http://www2.mmm.ucar.edu/wrf/OnLineTutorial/CASES/JAN00/wrf.htm
Everything works and I get proper output; however, I have noticed unusual timing behavior that I cannot explain.
I am running on a system with 64 cores and 256GB RAM.
I compiled with
1) compile/pgi/15.7 2) mpi/openmpi/1.8.5 3) app/netcdf/4.3.3.1
Options were for dm+sm (option 55 pgf90/pgcc) and basic nesting (option 1)
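For completeness, the build followed the standard tutorial sequence, roughly:

./configure                 # chose option 55 (dm+sm, pgf90/pgcc) and basic nesting (option 1)
./compile em_real >& compile.log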
Now I am running wrf and producing wrfout_d01_2000-01-24_12:00:00
What is odd is the extreme variation with different OMP_NUM_THREADS set.
It seems it is best at 4. Any more or less and the time it takes increases.
Setting it to 8 takes about the same time as setting it to 2.
Setting it to 64 takes almost 4 times as long as setting it to 4. Why?
Here are some timings:
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=1
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 2m51.743s
user 2m39.087s
sys 0m12.277s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=2
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m49.172s
user 3m15.582s
sys 0m19.015s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=4
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m27.357s
user 4m42.111s
sys 0m35.187s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=8
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m35.480s
user 8m20.966s
sys 1m13.376s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=16
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m52.862s
user 15m43.787s
sys 2m4.978s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=64
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 5m54.857s
user 197m37.807s
sys 7m57.993s
Any ideas as to what would cause that?
I have all but given up on using mpirun, as that seems to make it take HOURS no matter how many procs/threads I set. When I do that I see it running at 100% CPU on each core it is assigned, but it rarely writes anything.
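In case it helps to see the kind of invocation I mean, it is along these lines (the rank count here is arbitrary, and --report-bindings is only there to show where OpenMPI places the processes):

export OMP_NUM_THREADS=4
mpirun -np 8 -x OMP_NUM_THREADS --report-bindings ./wrf.exe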
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
________________________________