[Wrf-users] cluster interconnect - Infiniband vs 10 gigabit ethernet

Andrew Robbie (Gmail) andrew.robbie at gmail.com
Mon Apr 18 04:29:48 MDT 2011


On 13/04/2011, at 6:44 AM, Zulauf, Michael wrote:

> Hi all - quick question.  Does anyone have data or experience comparing
> the performance and scaling of WRF with cluster interconnects utilizing
> 10 gigabit ethernet (10GigE) vs Infiniband?

As people have said, the lower latency of IB (especially with the latest
generation of Mellanox and QLogic HCAs) makes it the better choice.

In our experience the added speed of QDR (quad data rate IB: 40 Gbit/s
signalling, about 32 Gbit/s of usable data after 8b/10b encoding) does not
provide much improvement over DDR IB. However, the extra bandwidth is really
useful if you plan to use the IB network for storage (NFS/Lustre/PVFS2 etc).
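
If you need numbers from your own hardware, a two-rank MPI ping-pong is the
standard microbenchmark: small messages expose latency (which is what
dominates WRF's halo exchanges), large messages expose bandwidth (which is
what storage traffic cares about). Below is a minimal sketch in C; the OSU
or Intel MPI benchmark suites do the same job more carefully. Run one rank
on each of two nodes so the messages actually cross the interconnect under
test.

/* pingpong.c - minimal two-rank MPI ping-pong microbenchmark.
 * A sketch only; the OSU or Intel MPI benchmarks are the polished
 * equivalents. Launch one rank per node so the traffic crosses the
 * interconnect rather than shared memory. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Sweep from 8 bytes (latency-bound) up to a couple of MB
     * (bandwidth-bound). */
    for (long bytes = 8; bytes <= 4L << 20; bytes *= 4) {
        char *buf = malloc(bytes);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        /* Half the round trip is the one-way time. */
        double half_rtt = (MPI_Wtime() - t0) / (2.0 * REPS);
        if (rank == 0)
            printf("%8ld bytes  %10.2f us  %10.2f MB/s\n",
                   bytes, half_rtt * 1e6, bytes / half_rtt / 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

Compile and launch with your MPI stack, e.g. (node names are placeholders):

mpicc -O2 pingpong.c -o pingpong
mpirun -np 2 --host node1,node2 ./pingpong

On DDR vs QDR you would expect the small-message latencies to be nearly
identical and only the multi-megabyte rates to diverge, which matches the
experience above.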

> I don't have any direct experience with 10GigE, but my experience with
> 1GigE shows that Infiniband scales far better.

Comparing IB with GigE is not really a fair test, but IB will scale better
than 10GigE in the data center too. That said, GigE is easier to maintain,
as the drivers are prepackaged and just work, which makes it a reasonable
choice for small clusters.

> I've seen what I'd consider marketing material on the web that suggests
> that 10GigE is comparable to Infiniband, but they don't specifically
> mention WRF.  On the other hand, I've seen other sites that suggest an
> Infiniband interconnect is far superior.  Again, these don't
> specifically mention WRF.  I know the particular application in use is
> critical when deciding these things, and that WRF is a pretty demanding
> application when it comes to the interconnect.

Some 10GigE switches have impressively low latency, though still not as low
as IB. Also, in our experience 10GigE switches cost more, perhaps because
they target market segments like 'Enterprise Switching' rather than HPC.

The main reason to have 10GigE is to connect distributed clusters, as
sending IB across a WAN is challenging. It is also good to give the storage
nodes both IB and 10GigE: IB for traffic inside the cluster, and 10GigE to
serve results to external clients (e.g. desktop workstations). That is,
10GigE is the interface to the corporate IT network, via the head node and
the file server nodes.

> My suspicion is that Infiniband is still significantly superior, but if
> I'm going to be able to make any headway with IT, then I'll probably
> need some type of numbers to back up my arguments.

Are they going to set up and manage your cluster? Tune MPI to run on
10GigE? Will they provide the money to pay for a more expensive 10GigE
solution and make the cluster 30% bigger to make up for the slower
interconnect? I doubt it. Offer them a 10GigE pipe to connect to their
systems and they will probably be happy.
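
For what it's worth, the 'tune MPI' step is concrete. With Open MPI, for
example (1.x-era option names; other MPI libraries differ, and eth2 is just
a placeholder for the 10GigE interface), selecting the transport looks
something like:

# run over TCP on a specific 10GigE interface
mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth2 -np 64 ./wrf.exe

# run over the InfiniBand verbs transport instead
mpirun --mca btl openib,sm,self -np 64 ./wrf.exe

Making the TCP path perform well also tends to involve jumbo frames
(MTU 9000) and larger kernel socket buffers, which is exactly the kind of
per-cluster tuning a corporate IT group rarely signs up for.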

Andrew


