[Wrf-users] Poor scalability of WRFV221 on 8-core machine
Dubtsov, Roman S
roman.s.dubtsov at intel.com
Thu Oct 2 23:38:20 MDT 2008
Erick,
(a) Gerardo was right; you should use MPI so that more of the computation
is done in parallel. Memory bandwidth is determined by the chipset, not
the CPUs. The OpenMP version is likely to generate a large amount of
cache-coherency traffic, lowering the effective memory bandwidth
available to WRF. With MPI, cache-coherency traffic is much lower. The
MPI library needs to support 1) communication via shared memory and 2)
"pinning" MPI processes to CPU cores so that they do not migrate between
cores/sockets and do not thrash the cache.
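As an illustration, with Intel MPI a launch along these lines enables both behaviors; the environment variable names are Intel MPI-specific (an assumption about your setup), and other MPI libraries use different flags:

```
# Sketch of an 8-rank WRF launch with Intel MPI (variable names are
# Intel MPI-specific; adapt for your MPI library and version).
export I_MPI_PIN=1          # keep each MPI rank on a fixed set of cores
export I_MPI_FABRICS=shm    # intra-node messages go through shared memory
mpirun -np 8 ./wrf.exe
```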
(b) In your case it may also make sense to experiment with the numtiles
namelist option. Setting it to a higher value may improve cache
utilization and reduce memory pressure. For CONUS12km-sized domains and 8
MPI processes I suggest trying numtiles=64 first. Note, however, that
results with different numtiles settings are not bit-for-bit identical.
You can also experiment with numtiles even if you use only OpenMP.
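A minimal sketch of the corresponding namelist.input fragment (numtiles sits in the &domains group; 64 is only the suggested starting point above, not a tuned value):

```
&domains
 numtiles = 64,   ! tiles per patch; try e.g. 32/64/128 and compare timings
/
```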
Regards,
Roman
:wbr
>-----Original Message-----
>From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On
>Behalf Of Erick van Rijk
>Sent: Friday, October 03, 2008 08:20
>To: Gerardo Cisneros
>Cc: wrf-users at ucar.edu
>Subject: Re: [Wrf-users] Poor scalability of WRFV221 on 8-core machine
>
>My reasoning for using OMP is that my test machine is a single unit;
>MPI will only hurt in that scenario.
>The overhead of launching separate processes and duplicating the dataset
>for every instance is considerable, and with OMP the communication
>latency is lower than with MPI.
>
>I agree that the available bandwidth per core declines if you add more
>cores to share the same bus, but I expected that 2 Intel Xeon 5400
>processors could handle that.
>
>Erick
>On Oct 2, 2008, at 5:59 PM, Gerardo Cisneros wrote:
>
>> On Thu, 2 Oct 2008, Erick van Rijk wrote:
>>
>>> Hello everybody,
>>> I have been looking into the scalability of WRFV221 on my 8-core
>>> machine and I have noticed that the scalability is very poor [...]
>>>
>>> Do any of the users/developers want to comment on this? Any reason why
>>> this is happening, or can you point me to something that could cause
>>> this behaviour?
>>> I have built WRFV221 with ifort and OpenMP enabled (not using MPI).
>>
>> (a) WRF scaling with OpenMP only isn't anywhere
>> near what can be obtained by using MPI.
>>
>> (b) Memory bandwidth per core dwindles as you
>> use more cores in your shared-memory machine.
>>
>> Saludos,
>>
>> Gerardo
>> --
>> Dr. Gerardo Cisneros |SGI (Silicon Graphics, S.A. de C.V.)
>> Scientist            |Av. Vasco de Quiroga 3000, Col. Santa Fe
>> gerardo at sgi.com    |01210 Mexico, D.F., MEXICO
>> (+52-55)5563-7958    |http://www.sgi.com/
>>
>
>_______________________________________________
>Wrf-users mailing list
>Wrf-users at ucar.edu
>http://mailman.ucar.edu/mailman/listinfo/wrf-users