[Wrf-users] Poor scalability of WRFV221 on 8-core machine

Daniel van Dijke D.vanDijke at weer.nl
Tue Oct 7 03:31:20 MDT 2008


Hi,

It's an Intel quad-core issue: the front-side bus (FSB) has too little bandwidth for all the memory traffic going to the cores (with both OpenMP and MPI). Everyone I have talked to has problems with these CPUs, not only with WRF but with all kinds of models. Since these quad-cores are cheaper than dual-cores it is still better to use them, although they add hardly any performance over the dual-cores.
An option would be AMD's Barcelona; the manufacturer says it should scale much better because the RAM is connected directly to the CPU (through an on-chip memory controller) rather than through a shared bus. I haven't tested it myself, and some people complain about those CPUs too.
So unfortunately I don't think this problem can be solved (for now).
Cheers,

Daniël

-----Original Message-----
From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On Behalf Of Erick van Rijk
Sent: Friday, October 3, 2008 19:20
To: Dubtsov, Roman S
Cc: wrf-users at ucar.edu
Subject: Re: [Wrf-users] Poor scalability of WRFV221 on 8-core machine

Roman,
Yes, the chipset defines the memory bandwidth (hence I wrote 5400).
Are you saying that MPI performs better than OpenMP on a single
machine? Could you explain further why the memory traffic would be
lower than with OpenMP? The same amount of communication needs to be
done to share the data.
Nothing I have tested points to that, and neither do Mr. Fovell's tests: http://macwrf.blogspot.com/2008/03/wrfv221-on-dual-quad-mac-pro.html
I use a machine similar to the one he used for his tests.

Erick

On Oct 2, 2008, at 10:38 PM, Dubtsov, Roman S wrote:

> Erick,
>
> (a) Gerardo was right; you should use MPI so that more of the
> computation is done in parallel. Memory bandwidth is defined by the
> chipset, not the CPUs. The OpenMP version is likely to cause a large
> amount of cache coherency traffic, lowering the effective data
> bandwidth for WRF. With MPI, cache coherency traffic is much lower.
> The MPI library needs to support 1) communication via shared memory
> and 2) "pinning" MPI processes to CPU cores so that they do not
> migrate between cores/sockets and do not thrash the cache.
>
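> [For illustration only: with Intel MPI, for example, this could look
> as follows. This is a sketch; other MPI libraries expose equivalent
> controls (e.g. taskset(1) on Linux for pinning), and the exact
> variable names vary by library and version.]
>
>   # sketch assuming Intel MPI; check your library's docs for the exact switches
>   export I_MPI_DEVICE=shm   # intra-node communication via shared memory
>   export I_MPI_PIN=1        # pin each rank to a core so it cannot migrate
>   mpirun -np 8 ./wrf.exe
>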
> (b) In your case it may also make sense to experiment with the
> numtiles namelist option. Setting it to a higher value may improve
> cache utilization and lower memory pressure. For CONUS12km-sized
> domains and 8 MPI processes I suggest trying numtiles=64 first.
> However, results with different numtiles settings are not
> bit-for-bit identical. Also, you can experiment with numtiles even
> if you use only OpenMP.
>
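> [For reference, a minimal sketch of the corresponding namelist.input
> fragment; numtiles lives in the &domains block, whose other entries
> are elided here:]
>
>   &domains
>    numtiles = 64,
>   /
>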
> Regards,
> Roman
> :wbr
>
>> -----Original Message-----
>> From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On Behalf Of Erick van Rijk
>> Sent: Friday, October 03, 2008 08:20
>> To: Gerardo Cisneros
>> Cc: wrf-users at ucar.edu
>> Subject: Re: [Wrf-users] Poor scalability of WRFV221 on 8-core machine
>>
>> My reasoning for using OMP is that my test machine is a single unit;
>> MPI will only hurt in that scenario. The overhead of launching
>> separate processes and duplicating the dataset for every instance is
>> considerable, and with OpenMP the communication latency is lower than
>> with MPI.
>>
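>> [For reference, a minimal sketch of how such an OpenMP-only (smpar)
>> build of WRF is typically launched, assuming a bash shell:]
>>
>>   export OMP_NUM_THREADS=8   # one thread per core on the 8-core machine
>>   ./wrf.exe
>>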
>> I agree that the available bandwidth per core declines as you add
>> more cores sharing the same bus, but I expected that two Intel Xeon
>> 5400 processors could handle that.
>>
>> Erick
>> On Oct 2, 2008, at 5:59 PM, Gerardo Cisneros wrote:
>>
>>> On Thu, 2 Oct 2008, Erick van Rijk wrote:
>>>
>>>> Hello everybody,
>>>> I have been looking into the scalability of WRFV221 on my 8-core
>>>> machine and I have noticed that the scalability is very poor [...]
>>>>
>>>> Do any of the users/developers want to comment on this? Is there a
>>>> reason why this is happening, or can you point me to something that
>>>> could cause this behaviour?
>>>> I have built WRFV221 with ifort and OpenMP enabled (not using MPI).
>>>
>>> (a)  WRF scaling with OpenMP only isn't anywhere
>>> near what can be obtained by using MPI.
>>>
>>> (b)  Memory bandwidth per core dwindles as you
>>> use more cores in your shared-memory machine.
>>>
>>> Saludos,
>>>
>>> Gerardo
>>> --
>>> Dr. Gerardo Cisneros	|SGI (Silicon Graphics, S.A. de C.V.)
>>> Scientist		|Av. Vasco de Quiroga 3000, Col. Santa Fe
>>> gerardo at sgi.com	|01210 Mexico, D.F., MEXICO
>>> (+52-55)5563-7958	|http://www.sgi.com/
>>>

_______________________________________________
Wrf-users mailing list
Wrf-users at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/wrf-users

