[Wrf-users] Optimizing OMP_NUM_THREADS

Wei Huang whuang at univ-wea.com
Fri Jul 24 14:00:20 MDT 2015


Brian,

The WRF domain is too small for such a performance test.

Use a larger, much larger domain and you'll see better performance numbers.

Wei Huang

From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On Behalf Of Andrus, Brian Contractor
Sent: Friday, July 24, 2015 1:52 PM
To: wrf-users at ucar.edu
Subject: [Wrf-users] Optimizing OMP_NUM_THREADS

Hello,

I am a little confused about running wrf. I have built it and am following the testing for the Jan 2000 data set per: http://www2.mmm.ucar.edu/wrf/OnLineTutorial/CASES/JAN00/wrf.htm<https://urldefense.proofpoint.com/v2/url?u=http-3A__www2.mmm.ucar.edu_wrf_OnLineTutorial_CASES_JAN00_wrf.htm&d=BQMFAg&c=qHNyRuJKHYeI-vwTnTfWXq4fkZpyjWUA1LcPL7eQSSQ&r=6RmpIe4P-G8omsHqYS85uBS-3JaNVk3lvOG-hmdOrr8&m=beCRANFC6K_f6XDzLxHv3fxVEyzYKzTanLPeE9nlkY8&s=Nr6hVplJ77yATKxwTJ4fkRR1GIMQ9BvlgSOeUL1AEFY&e=>

I am a little confused about running wrf: I see timing behavior that I cannot explain.
I am running on a system with 64 cores and 256GB RAM.
I compiled with
  1) compile/pgi/15.7     2) mpi/openmpi/1.8.5    3) app/netcdf/4.3.3.1

Options were for dm+sm (option 55 pgf90/pgcc) and basic nesting (option 1)

Now I am running wrf and producing wrfout_d01_2000-01-24_12:00:00
What is odd is the extreme variation with different OMP_NUM_THREADS set.
It seems best at 4; any more or fewer threads and the run time increases.
Setting it to 8 takes about the same time as setting it to 2.
Setting it to 64 takes almost 4 times as long as setting it to 4..??

Here are some timings:

[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=1
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    2m51.743s
user    2m39.087s
sys     0m12.277s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=2
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m49.172s
user    3m15.582s
sys     0m19.015s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=4
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m27.357s
user    4m42.111s
sys     0m35.187s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=8
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m35.480s
user    8m20.966s
sys     1m13.376s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=16
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    1m52.862s
user    15m43.787s
sys     2m4.978s
[bdandrus at compute-7-3 em_real]$ export OMP_NUM_THREADS=64
[bdandrus at compute-7-3 em_real]$ time ./wrf.exe
starting wrf task             0  of             1

real    5m54.857s
user    197m37.807s
sys     7m57.993s
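The manual runs above can be scripted as a sweep. A minimal sketch follows; the `:` no-op stands in for the actual `./wrf.exe` run so the snippet is self-contained, and the thread counts are just the ones tried above:

```shell
# Sweep OMP_NUM_THREADS and report wall time per run.
# The ":" no-op below is a placeholder for the real "./wrf.exe" invocation.
sweep() {
  for t in 1 2 4 8 16 64; do
    OMP_NUM_THREADS=$t
    export OMP_NUM_THREADS
    start=$(date +%s)
    :  # placeholder for: ./wrf.exe
    end=$(date +%s)
    echo "threads=$t wall=$((end - start))s"
  done
}
sweep
```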


Any ideas as to what would cause that?
I have all but given up on using mpirun, as that seems to make the run take HOURS no matter how many procs/threads I set. When I do that, I see 100% CPU on each core it is assigned, but it rarely writes anything.
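For reference, with a dm+sm build one would normally launch a hybrid run something like the sketch below. This is only an illustration: the rank/thread counts are assumptions, not tuned values, and the `-x`, `--map-by`, and `--bind-to` options are Open MPI syntax (check your exact 1.8.5 man page).

```shell
# Hypothetical hybrid launch: 16 MPI ranks x 4 OpenMP threads = 64 cores.
# -x exports the variable to every rank; binding keeps threads on their cores.
mpirun -np 16 -x OMP_NUM_THREADS=4 --map-by socket:PE=4 --bind-to core ./wrf.exe
```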

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
