[Wrf-users] WRF performance on quad-core Intel Xeon
Gerardo Cisneros
gerardo at sgi.com
Fri Dec 14 22:03:16 MST 2007
Kenneth,
You wrote:
> I've been doing timing tests with WRF 2.2 on a dual-cpu, quad-core
> Intel Xeon 5355 2.66 GHz system running Linux, and getting some
> puzzling results. [...]
>
> Interestingly, the results are almost identical in making one
> parallel simulation over 1 to 8 cores -- I can't get any advantage
> when using more than 4 cores at a time.
> [...]
> This machine has 4 MB of cache per processor (I think), so 8 MB total
> for the 8 cores. [...]
No, there is a 4 MB L2 cache shared by each *pair* of cores
(so 8 MB per processor, 16 MB for the 8 cores), and therein
lies part of your puzzling results. As long as you're using
at most one of each pair of cache-sharing cores, that core
enjoys the full 4 MB of L2 cache. As soon as you place
processes on both cores of a pair, the cache available to
each process is effectively halved.
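To see which cores share an L2 cache on a Linux box, one option
is to read the cache topology the kernel exposes under sysfs.
The following is only a minimal sketch, assuming the usual
/sys/devices/system/cpu/cpuN/cache/indexM layout with "level"
and "shared_cpu_list" files; the function names are made up for
illustration and the layout may differ on older kernels.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list string such as "0-1" or "0,4" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l2_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of CPUs that share an L2 cache.

    Assumes the sysfs cache layout described above; returns an
    empty list if that layout is not present.
    """
    groups = set()
    for cache_dir in Path(sysfs_root).glob("cpu[0-9]*/cache/index*"):
        level_file = cache_dir / "level"
        shared_file = cache_dir / "shared_cpu_list"
        if not (level_file.exists() and shared_file.exists()):
            continue
        if level_file.read_text().strip() != "2":
            continue  # only interested in the L2 level here
        cpus = parse_cpu_list(shared_file.read_text())
        if cpus:
            groups.add(tuple(cpus))
    return sorted(groups)


def one_core_per_group(groups):
    """Pick the first CPU of each group, so no two picks share an L2."""
    return [group[0] for group in groups]
```

Calling one_core_per_group(l2_groups()) gives a list of CPU
numbers that could then be passed to a pinning tool such as
taskset, so that a run on up to 4 processes keeps each process
on its own L2 cache.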
> Does anyone have an idea why WRF is not scaling up well beyond 2
> cores per processor on this kind of quad-core platform? Could it be
> specific to the Intel quad-core chip? I believe there is an AMD quad-
> core chip, but haven't tested on that. Thanks in advance for any
> thoughts.
Part of it is the cache sharing. The other part of the
failure to scale is FSB bandwidth sharing: memory traffic
is going to suffer, and not only on Intel quad cores, but
on any quad core. The bandwidth from a socket to memory is
fixed, so each process you add to the cores on that socket
leaves less memory bandwidth per process, hence performance
doesn't scale.
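The bandwidth argument can be put into a toy model. This is
only an illustrative sketch, not a measurement of WRF: the
names memory_fraction (the share of single-process run time
limited by memory traffic) and saturation_count (the number of
processes at which the socket's memory path is full) are
assumptions introduced here, and any values plugged in are
hypothetical.

```python
def per_process_bandwidth(socket_bandwidth, n_processes):
    """Even share of a fixed socket-to-memory bandwidth."""
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    return socket_bandwidth / n_processes


def toy_speedup(n_processes, memory_fraction, saturation_count):
    """Speedup over one process in a simple two-part model.

    The compute part of the run time scales as 1/n. The
    memory-bound part stops improving once n reaches
    saturation_count, because the socket's bandwidth is fixed.
    """
    if n_processes < 1 or saturation_count < 1:
        raise ValueError("counts must be at least 1")
    if not 0.0 <= memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in [0, 1]")
    compute_time = (1.0 - memory_fraction) / n_processes
    memory_time = memory_fraction / min(n_processes, saturation_count)
    return 1.0 / (compute_time + memory_time)
```

In this model a fully memory-bound code (memory_fraction = 1.0)
with saturation_count = 2 gains nothing from going from 2 to 4
processes on a socket, which is the flavor of the behavior
described above.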
Saludos,
Gerardo
--
Dr. Gerardo Cisneros |SGI (Silicon Graphics, S.A. de C.V.)
Scientist |Av. Vasco de Quiroga 3000, Col. Santa Fe
gerardo at sgi.com |01210 Mexico, D.F., MEXICO
(+52-55)5563-7958 |http://www.sgi.com/