[Wrf-users] WRF performance on quad-core Intel Xeon
Kenneth Waight
ken at meso.com
Fri Dec 14 19:40:15 MST 2007
I've been doing timing tests with WRF 2.2 on a dual-CPU, quad-core
Intel Xeon 5355 2.66 GHz system running Linux, and getting some
puzzling results. I'm running from 1 to 8 individual simulations
simultaneously, hoping to be able to make 8 individual runs on 8
cores efficiently. A short test run scales up fairly well going from
1 to 4 cores, but the runs slow down dramatically going from 4 to 8
cores, so that the throughput increases hardly at all:

Number of      Avg time for    Acceleration
simulations    each run (s)    (tot sim time / cpu time)
-----------    ------------    -------------------------
     1             1109              13.0
     2             1244              23.1
     4             1496              38.5
     8             2864              40.0

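As a rough check on the table, the throughput relative to a single run can be worked out from the average run times alone. The following is only a minimal sketch in Python; the names avg_time, relative_throughput and efficiency are illustrative and were not part of the timing tests. The ratios it gives should agree, to within rounding, with the ratios of the Acceleration column.

```python
# Average wall-clock time per run (s), keyed by the number of
# simultaneous simulations. Values copied from the table above.
avg_time = {1: 1109, 2: 1244, 4: 1496, 8: 2864}


def relative_throughput(n, times=avg_time):
    """Aggregate throughput of n simultaneous runs, relative to one run.

    n runs finish in times[n] seconds, while one run alone takes
    times[1] seconds, so the ratio is n * times[1] / times[n].
    """
    return n * times[1] / times[n]


def efficiency(n, times=avg_time):
    """Fraction of ideal (linear) scaling achieved with n runs."""
    return relative_throughput(n, times) / n


if __name__ == "__main__":
    for n in sorted(avg_time):
        print(f"{n} runs: throughput x{relative_throughput(n):.2f}, "
              f"efficiency {efficiency(n):.0%}")
```

With these numbers, going from 4 to 8 simultaneous runs raises the relative throughput only from about 2.97 to about 3.10, which is the flat region described above.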
Interestingly, the results are almost identical when running one
parallel simulation on 1 to 8 cores -- I can't get any advantage
from using more than 4 cores at a time.
The numbers are not sensitive to the Fortran compiler (PGI 6, PGI 7,
Intel 10), or to the compiler options (-fast, -O3, -xT, etc.) or to
the configuration (single threaded vs. MPICH, etc.).
This machine has 4 MB of cache per processor (I think), so 8 MB total
for the 8 cores. The numbers are also not sensitive to the domain
size -- the performance is flat between 4 and 8 cores even for very
small grids, so it's not a case of using enough memory to cause
paging. The modest domain (100x100x28) which produced the numbers
above uses about 4% of the total system memory for each run
(according to the "top" command).
Does anyone have an idea why WRF is not scaling up well beyond 2
cores per processor on this kind of quad-core platform? Could it be
specific to the Intel quad-core chip? I believe there is an AMD
quad-core chip, but I haven't tested on one. Thanks in advance for any
thoughts.
Ken