[Wrf-users] WRF performance on quad-core Intel Xeon

Gerardo Cisneros gerardo at sgi.com
Fri Dec 14 22:03:16 MST 2007


Kenneth,

You wrote:
> I've been doing timing tests with WRF 2.2 on a dual-cpu, quad-core  
> Intel Xeon 5355 2.66 GHz system running Linux, and getting some  
> puzzling results.  [...]
> 
> Interestingly, the results are almost identical in making one  
> parallel simulation over 1 to 8 cores -- I can't get any advantage  
> when using more than 4 cores at a time.
> [...]
> This machine has 4 MB of cache per processor (I think), so 8 MB total  
> for the 8 cores.  [...]

No, there is a shared 4MB L2 cache per *pair* of cores
(two such pairs, hence 8MB, per processor), and therein
lies part of your puzzling results.  As long as you use
at most one core of each cache-sharing pair, that core
enjoys the full 4MB of L2 cache.  As soon as you place
processes on both cores of a pair, the cache available
to each process is effectively halved.
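
To make the placement argument concrete, here is a minimal
sketch in Python.  The core numbering and the pairing in
PAIRS are assumptions for illustration only (the real
numbering on a given machine may differ), and the function
names are hypothetical.  It computes how much L2 each busy
core effectively gets, and a placement order that fills one
core of every pair before doubling up.

```python
# Assumed layout, for illustration only: 8 cores, and each
# consecutive pair of core numbers shares one 4 MB L2 cache.
L2_MB = 4.0
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def cache_per_process(busy_cores):
    """Return {core: MB of L2 effectively available to it}."""
    busy = set(busy_cores)
    share = {}
    for pair in PAIRS:
        # Cores of this pair that actually run a process.
        active = [core for core in pair if core in busy]
        for core in active:
            # One active core keeps the whole L2; two split it.
            share[core] = L2_MB / len(active)
    return share


def spread_placement(nprocs):
    """Choose cores so each pair gets one process before any gets two."""
    order = [pair[0] for pair in PAIRS] + [pair[1] for pair in PAIRS]
    if nprocs > len(order):
        raise ValueError("more processes than cores")
    return order[:nprocs]


if __name__ == "__main__":
    for n in (4, 5, 8):
        cores = spread_placement(n)
        print(n, cores, cache_per_process(cores))
```

Under these assumptions, up to 4 processes each keep a full
4MB; from the fifth process on, pairs start to be shared and
the processes on a shared pair drop to 2MB each.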

> Does anyone have an idea why WRF is not scaling up well beyond 2  
> cores per processor on this kind of quad-core platform?  Could it be  
> specific to the Intel quad-core chip?  I believe there is an AMD quad- 
> core chip, but haven't tested on that.  Thanks in advance for any  
> thoughts.

Part of it is the cache sharing.  The other part of the
failure to scale is FSB bandwidth sharing: memory traffic
is going to suffer, and not only on Intel quad cores but
on any quad core.  The bandwidth from a socket to memory
is fixed, so every process you add to the cores of that
socket leaves less memory bandwidth per process, and
hence performance doesn't scale.
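
The bandwidth argument can be sketched as a toy model in
Python.  Everything numeric here is made up for illustration
(the 8 GB/s socket bandwidth, the 100 s of compute and the
400 GB of traffic are not measurements of WRF or of this
machine), and the function names are hypothetical.  The
model assumes the runtime on one socket is the larger of the
compute time, which divides across processes, and the time
to move the data over the shared link, which does not.

```python
def per_process_bandwidth(socket_bw_gbs, procs_on_socket):
    """Fixed socket bandwidth split evenly among the processes."""
    return socket_bw_gbs / procs_on_socket


def modeled_speedup(nprocs, compute_s, traffic_gb, socket_bw_gbs):
    """Speedup over one process on a single socket in the toy model."""

    def runtime(n):
        compute_time = compute_s / n                # scales with processes
        memory_time = traffic_gb / socket_bw_gbs    # fixed by the shared link
        return max(compute_time, memory_time)

    return runtime(1) / runtime(nprocs)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    for n in (1, 2, 4):
        print(n,
              per_process_bandwidth(8.0, n),
              modeled_speedup(n, compute_s=100.0, traffic_gb=400.0,
                              socket_bw_gbs=8.0))
```

With these made-up inputs the modeled speedup reaches 2 at
two processes per socket and stays at 2 with four, because
the shared link, not the cores, sets the runtime.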

Saludos,

Gerardo
-- 
Dr. Gerardo Cisneros	|SGI (Silicon Graphics, S.A. de C.V.)
Scientist             	|Av. Vasco de Quiroga 3000, Col. Santa Fe
gerardo at sgi.com		|01210 Mexico, D.F., MEXICO
(+52-55)5563-7958 	|http://www.sgi.com/