[Wrf-users] Optimizing OMP_NUM_THREADS
Andrus, Brian Contractor
bdandrus at nps.edu
Fri Jul 24 12:51:37 MDT 2015
Hello,
I am a little confused about running WRF. I have built it and am following the test case for the Jan 2000 data set per: http://www2.mmm.ucar.edu/wrf/OnLineTutorial/CASES/JAN00/wrf.htm
Everything works and I get proper output, but I have noticed unusual timings that I cannot explain.
I am running on a system with 64 cores and 256GB RAM.
I compiled with these modules loaded:
1) compile/pgi/15.7 2) mpi/openmpi/1.8.5 3) app/netcdf/4.3.3.1
Options were for dm+sm (option 55 pgf90/pgcc) and basic nesting (option 1)
Now I am running wrf.exe and producing wrfout_d01_2000-01-24_12:00:00.
What is odd is the extreme variation in run time as I change OMP_NUM_THREADS.
It seems to be best at 4. With more or fewer threads, the run takes longer.
Setting it to 16 takes about the same time as setting it to 2 (8 falls between 2 and 4).
Setting it to 64 takes just over 4 times as long as setting it to 4.
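The sweep described above can be scripted rather than typed by hand. This is only a minimal Python sketch: the function names `time_run` and `sweep` and the script name `omp_sweep.py` are made up for illustration, and it assumes the executable is passed on the command line and run from the em_real directory. It records wall-clock time only, not the user/sys split that the shell's `time` reports.

```python
import os
import subprocess
import sys
import time


def time_run(command, threads, cwd=None):
    """Run command with OMP_NUM_THREADS=threads; return wall-clock seconds."""
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(command, cwd=cwd, env=env, check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(command, thread_counts, cwd=None):
    """Time one run per thread count and return {threads: seconds}."""
    return {n: time_run(command, n, cwd) for n in thread_counts}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python omp_sweep.py ./wrf.exe
    for n, seconds in sweep(sys.argv[1:], [1, 2, 4, 8, 16, 64]).items():
        print(f"OMP_NUM_THREADS={n:<3d} real {seconds:8.1f} s")
```

Each run gets its own copy of the environment, so nothing has to be exported in the calling shell between runs.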
Here are some timings:
[bdandrus@compute-7-3 em_real]$ export OMP_NUM_THREADS=1
[bdandrus@compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 2m51.743s
user 2m39.087s
sys 0m12.277s
[bdandrus@compute-7-3 em_real]$ export OMP_NUM_THREADS=2
[bdandrus@compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m49.172s
user 3m15.582s
sys 0m19.015s
[bdandrus@compute-7-3 em_real]$ export OMP_NUM_THREADS=4
[bdandrus@compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m27.357s
user 4m42.111s
sys 0m35.187s
[bdandrus@compute-7-3 em_real]$ export OMP_NUM_THREADS=8
[bdandrus@compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m35.480s
user 8m20.966s
sys 1m13.376s
[bdandrus@compute-7-3 em_real]$ export OMP_NUM_THREADS=16
[bdandrus@compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m52.862s
user 15m43.787s
sys 2m4.978s
[bdandrus@compute-7-3 em_real]$ export OMP_NUM_THREADS=64
[bdandrus@compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 5m54.857s
user 197m37.807s
sys 7m57.993s
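To put numbers on the variation, the `time` figures above can be converted to seconds and compared against the single-thread run. The following is a rough Python sketch: the helper names `to_seconds` and `summarize` are invented for illustration, and the values in `TIMINGS` are copied from the runs above. Speedup is the 1-thread wall time divided by the N-thread wall time, efficiency is speedup divided by N, and CPU time is user plus sys.

```python
import re

# (real, user, sys) as printed by `time` for each OMP_NUM_THREADS value above.
TIMINGS = {
    1: ("2m51.743s", "2m39.087s", "0m12.277s"),
    2: ("1m49.172s", "3m15.582s", "0m19.015s"),
    4: ("1m27.357s", "4m42.111s", "0m35.187s"),
    8: ("1m35.480s", "8m20.966s", "1m13.376s"),
    16: ("1m52.862s", "15m43.787s", "2m4.978s"),
    64: ("5m54.857s", "197m37.807s", "7m57.993s"),
}


def to_seconds(stamp):
    """Convert a bash `time` value such as '2m51.743s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarize(timings):
    """Return {threads: (wall_s, speedup, efficiency, cpu_s)} vs. the 1-thread run."""
    base = to_seconds(timings[1][0])
    rows = {}
    for threads, (real, user, sys_) in sorted(timings.items()):
        wall = to_seconds(real)
        cpu = to_seconds(user) + to_seconds(sys_)
        speedup = base / wall
        rows[threads] = (wall, speedup, speedup / threads, cpu)
    return rows


if __name__ == "__main__":
    print(f"{'threads':>7} {'wall(s)':>9} {'speedup':>8} {'effic.':>7} {'cpu(s)':>10}")
    for threads, (wall, speedup, eff, cpu) in summarize(TIMINGS).items():
        print(f"{threads:>7} {wall:>9.1f} {speedup:>8.2f} {eff:>7.2f} {cpu:>10.1f}")
```

On these numbers the best case (4 threads) is just under a 2x speedup, and the 64-thread run burns roughly 72 times the CPU time of the single-thread run while finishing in about twice the wall time.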
Any ideas as to what would cause that?
I have all but given up on using mpirun, as that seems to make the run take HOURS no matter how many procs/threads I set. When I do that, I see each assigned core running at 100% CPU, but it rarely writes anything.
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238