[Wrf-users] Speed improvement through SSD hard drives?

Hein Zelle hein.zelle at bmtargoss.com
Tue Dec 7 01:40:22 MST 2010


Jonas Kaufmann wrote:

> I am thinking about getting a new server for my WRF model
> computations, and I am wondering about the hardware specs I should use
> for that. Obviously the most important thing is CPU power, but I am
> wondering what to do about hard drives in general. I know that SSD
> drives can give a significant performance boost for I/O tasks, so I am
> thinking about using those drives.
> 
> Has anyone already tried this, and if so, what were your results
> compared to normal hard drives? If you did not try this, do you think
> WRF performance would be affected?

I have not tried SSD drives, but I can tell you about our experience
with WRF bottlenecks on a 64+ core cluster.  We used to run on a
64-core cluster: 8 nodes, each with two 4-core Intel Xeon CPUs.  The
front ends each had an HP RAID array with 8 SAS drives.  Performance of
those arrays was relatively pathetic: about 80 MB/s (megabytes per
second) sustained read/write.

On this system the bottleneck was NOT I/O, strangely enough: it was
memory bandwidth.  Above 32 cores WRF scaled badly.  I/O speeds for a
single model simulation were quite acceptable.  What we did notice was
that it was easy to lock up the server attached to the disk pack: under
write loads it would quickly become unresponsive.  This was caused by a
combination of the RAID controller, the Linux kernel version (2.6.32+
is much improved), the RAID setup (RAID 5 is BAD here) and the file
system (ext3 with journalling).

We eventually switched to RAID 1 with ext2, which did not improve
throughput, but the front end no longer locked up.


Our new setup uses Nehalem CPUs in a blade configuration, 64 cores
(again, two quad-core CPUs per motherboard).  On this cluster the model
scales much better; the memory bandwidth problems have largely gone
away.  Now, however, running multiple models at once made it all too
easy to overload the disk pack server.  A single model simulation would
perform fine, but three or more would completely lock up the NFS server
with huge wait loads.

We have since moved to a new RAID server with 12 SATA drives in
hardware RAID 1/0, a beefier RAID controller card, and two network
cards in parallel (a throughput limit of about 200 MB/s), running a
Linux 2.6.32 kernel on Ubuntu 10.04.  We can still lock up the server
by running 10 models (not all WRF) in parallel, but it is much harder
to reach the limit.  This server has a sustained read/write performance
of about 400-500 MB/s.


So, summarizing, if you're going to upgrade your server and can afford it:

- use Nehalem CPUs or better (I believe that's the 5500 series or up,
  but please verify that)
- memory bandwidth is a critical factor for WRF; older Intel CPUs
  perform much worse.
- disk I/O only becomes a bottleneck for very large models, or for
  several running at once (that's our experience, at least)
- use a Linux kernel of at least 2.6.32.
- test your disk performance in several configurations (see the sketch
  after this list)!  You can get huge gains with the right
  RAID/filesystem configuration.
- ext4 seems to work well, so far.
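
On that last point about testing disk performance: below is a minimal
Python sketch that measures sustained sequential write and read
throughput.  The test path, file size and block size are placeholder
assumptions to adapt to your own array; dedicated benchmark tools will
give you far more detail, this is only meant as a quick sanity check.

#!/usr/bin/env python3
# Rough sequential-throughput check (a sketch, not a real benchmark).
# The path, file size and block size below are placeholder assumptions;
# adapt them to your own RAID array / filesystem.

import os
import time

TEST_FILE = "/data/throughput_test.bin"   # hypothetical mount point
FILE_SIZE = 2 * 1024**3                    # 2 GB in total
BLOCK_SIZE = 4 * 1024**2                   # 4 MB per block
block = os.urandom(BLOCK_SIZE)

# sequential write
start = time.time()
with open(TEST_FILE, "wb") as f:
    written = 0
    while written < FILE_SIZE:
        f.write(block)
        written += BLOCK_SIZE
    f.flush()
    os.fsync(f.fileno())      # make sure the data actually hits the disk
write_mbs = (written / 1024**2) / (time.time() - start)

# sequential read
# NOTE: unless you drop the page cache first (root only, via
# /proc/sys/vm/drop_caches) this read may be served from RAM and will
# overestimate the real disk speed.
start = time.time()
read = 0
with open(TEST_FILE, "rb") as f:
    while True:
        chunk = f.read(BLOCK_SIZE)
        if not chunk:
            break
        read += len(chunk)
read_mbs = (read / 1024**2) / (time.time() - start)

os.remove(TEST_FILE)
print("write: %.0f MB/s   read: %.0f MB/s" % (write_mbs, read_mbs))

Run it a few times, with different RAID levels and file systems if you
can, and the kind of differences I mentioned above should show up
quickly.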

For a small server, I think a RAID array (e.g. RAID 1/0) of a couple of
SATA disks is fine.  For a large cluster you might want to consider
heavier options.  Keep in mind that extra disk bandwidth will not help
once you exceed your network bandwidth (assuming you have a networked
cluster).  It may well be worth getting one or two SSDs instead of
multiple SATA drives, if you can achieve the same performance that way.
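
To put rough numbers on that: a single gigabit ethernet link gives
1000 Mbit/s / 8 = 125 MB/s at best, and usually less in practice over
NFS, so even one fast SSD or a small SATA RAID array can already
saturate it.  That matches the ~200 MB/s limit of our two parallel
network cards: any local disk bandwidth beyond that figure is invisible
to the compute nodes.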

Hope that helps,
Kind regards

     Hein Zelle

-- 

Dr. Hein Zelle
Advisor Meteorology & Oceanography

Tel:    +31 (0)527-242299
Fax:    +31 (0)527-242016
Email:  hein.zelle at bmtargoss.com
Web:    www.bmtargoss.com

BMT ARGOSS
P.O. Box 61, 8325 ZH Vollenhove
Voorsterweg 28, 8316 PT Marknesse
The Netherlands


