[Wrf-users] Optimizing OMP_NUM_THREADS
Wei Huang
whuang at univ-wea.com
Fri Jul 24 14:00:20 MDT 2015
Brian,
The WRF domain in this case is too small for such a performance test.
Use a larger, much larger, domain and you will see better performance numbers.
Wei Huang
From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On Behalf Of Andrus, Brian Contractor
Sent: Friday, July 24, 2015 1:52 PM
To: wrf-users at ucar.edu
Subject: [Wrf-users] Optimizing OMP_NUM_THREADS
Hello,
I am a little confused about running WRF. I have built it and am following the testing for the Jan 2000 data set per: http://www2.mmm.ucar.edu/wrf/OnLineTutorial/CASES/JAN00/wrf.htm
Everything does work and I can get proper output; however, I have noticed unusual timing behavior that I cannot explain.
I am running on a system with 64 cores and 256GB RAM.
I compiled with
1) compile/pgi/15.7 2) mpi/openmpi/1.8.5 3) app/netcdf/4.3.3.1
Options were for dm+sm (option 55 pgf90/pgcc) and basic nesting (option 1)
Now I am running wrf and producing wrfout_d01_2000-01-24_12:00:00
What is odd is the extreme variation with different OMP_NUM_THREADS set.
It seems to be best at 4. Any more or fewer threads and the run time increases.
Setting it to 8 takes close to the same time as setting it to 2.
Setting it to 64 takes almost 4 times as long as 4..??
Here are some timings:
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=1
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 2m51.743s
user 2m39.087s
sys 0m12.277s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=2
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m49.172s
user 3m15.582s
sys 0m19.015s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=4
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m27.357s
user 4m42.111s
sys 0m35.187s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=8
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m35.480s
user 8m20.966s
sys 1m13.376s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=16
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 1m52.862s
user 15m43.787s
sys 2m4.978s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=64
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task 0 of 1
real 5m54.857s
user 197m37.807s
sys 7m57.993s
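The sweep above was run by hand; a small script (my own sketch, not part of WRF) can repeat the same experiment in one pass. The `sweep` function name and the thread-count list are mine; substitute `./wrf.exe` for the placeholder command when running on the actual case.

```shell
#!/bin/sh
# Hypothetical helper: time one run of a command for each OMP_NUM_THREADS
# value, mirroring the manual "export ...; time ./wrf.exe" transcript above.
sweep() {
    cmd="$1"
    for n in 1 2 4 8 16 64; do
        export OMP_NUM_THREADS="$n"
        start=$(date +%s)
        "$cmd" >/dev/null 2>&1     # in the original post: ./wrf.exe
        end=$(date +%s)
        echo "OMP_NUM_THREADS=$n real=$((end - start))s"
    done
}

# Demo with a no-op command; replace "true" with ./wrf.exe for real timings.
sweep true
```

This only measures whole-second wall-clock time with `date +%s`; for finer resolution, wrapping the command in `time` as in the transcript works just as well.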
Any ideas as to what would cause that?
I have all but given up on using mpirun, as that seems to make it take HOURS no matter how many procs/threads I set. I do see it running at 100% CPU on each core it is assigned, but it rarely writes anything.
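One common cause of an mpirun slowdown like this with a dm+sm build is launching as many MPI ranks as cores while OMP_NUM_THREADS is still large, so ranks and threads oversubscribe the node. A sketch of a hybrid launch on one 64-core node, assuming OpenMPI 1.8-era mapping flags (the 16x4 split and the flag choices are my assumptions, not from the post):

```shell
#!/bin/sh
# Hypothetical hybrid launch: 16 MPI ranks x 4 OpenMP threads = 64 cores.
# --map-by socket:PE=4 gives each rank 4 cores so its OpenMP threads do
# not land on cores already owned by another rank.
export OMP_NUM_THREADS=4
LAUNCH="mpirun -np 16 --map-by socket:PE=4 --bind-to core ./wrf.exe"
if command -v mpirun >/dev/null 2>&1 && [ -x ./wrf.exe ]; then
    $LAUNCH
else
    # Outside the cluster, just show the command instead of running it.
    echo "would run: $LAUNCH"
fi
```

Keeping ranks-times-threads at or below the core count, and binding ranks so threads are not migrated across sockets, is usually the first thing to check before blaming mpirun itself.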
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238