[Wrf-users] Re: Wrf-users Digest, Vol 20, Issue 1
Wrfhelp
wrfhelp at ucar.edu
Wed Apr 5 14:49:57 MDT 2006
Hi,
Some comments on this enquiry have been posted on the WRF Users Forum page.
For more info visit:
http://tornado.meso.com/wrf_forum/index.php?showtopic=430
Hope this helps,
--Wrfhelp
On Mon, Apr 03, 2006 at 12:00:04PM -0600, wrf-users-request at ucar.edu wrote:
> Send Wrf-users mailing list submissions to
> wrf-users at ucar.edu
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://mailman.ucar.edu/mailman/listinfo/wrf-users
> or, via email, send a message with subject or body 'help' to
> wrf-users-request at ucar.edu
>
> You can reach the person managing the list at
> wrf-users-owner at ucar.edu
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wrf-users digest..."
>
>
> Today's Topics:
>
> 1. mpirun giving unexpected results (Brian.Hoeth at noaa.gov)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 03 Apr 2006 12:01:14 -0500
> From: Brian.Hoeth at noaa.gov
> Subject: [Wrf-users] mpirun giving unexpected results
> To: wrf-users at ucar.edu
> Message-ID: <59e10599da.599da59e10 at noaa.gov>
> Content-Type: text/plain; charset=us-ascii
>
> Hello,
>
> The post below was sent to the online WRF Users Forum by one of our
> software support group members (Brice), so I will cut and paste it
> here to see if we get any replies on this list as well.
>
> Thanks,
> Brian Hoeth
> Spaceflight Meteorology Group
> Johnson Space Center
> Houston, TX
> 281-483-3246
>
>
>
> The Spaceflight Meteorology Group here at Johnson Space Center has
> recently acquired a small Linux-based cluster to run the WRF-NMM in
> support of Space Shuttle operations. I am the software support lead
> and have been running some 'bench' testing on the system. The results
> of the tests have raised some questions that I would appreciate help
> in answering.
>
> I may not have the exact details of the configuration of the model run
> here, but the SMG folks will probably supply that if more information
> is needed. The testing involved running the WRF-NMM at 4 km
> resolution over an area around New Mexico, using the real-data test
> case downloaded from the WRF-NMM users' site.
>
> The cluster is composed of a head node with dual hyper-threading Intel
> Xeons at 3.2 GHz and 16 subnodes with dual Intel Xeons at 3.2 GHz. All
> of the subnodes NFS-mount the head node's home directory. Communication
> between the nodes is via Gigabit Ethernet.
>
> The WRF-NMM package was installed using PGI CDK 6.0, as were MPICH and
> netCDF. One thing I ran into during the installation was a mismatch: I
> had started out building the supporting packages with the 32-bit PGI
> compilers, while the WRF build defaulted to 64-bit. That was corrected,
> and all of the software associated with the model (MPICH, netCDF,
> real_nmm.exe and wrf.exe) is now compiled with 64-bit support. The head
> node is running RHEL AS 3.4 and the compute nodes are running RHEL WS
> 3.4.
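>
> (A quick sanity check for the 32-bit/64-bit mix-up, with purely
> illustrative paths, is to inspect the executables and their linked
> libraries:
>
>   file wrf.exe real_nmm.exe   # both should report "ELF 64-bit"
>   ldd wrf.exe                 # every listed library should resolve to a 64-bit build
>
> Any entry still pointing at a 32-bit library would be a leftover from
> the earlier 32-bit install.)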
>
> Ok, that's the basic background to jump past all of those questions.
> Additional information: I have not tried any of the debugging tools
> yet; I am using /usr/bin/time -v to gather timing data; and I am not
> using any scheduling applications such as OpenPBS, just mpirun and
> various combinations of machine and process files. I have the time
> results and the actual command lines captured and can supply them if
> someone needs them. One last bit of background: I am not a long-term
> cluster programmer (20+ years programming in Fortran and other
> languages, but not on clusters), nor a heavy-duty Linux administrator
> (though that is changing rapidly, and I have several years of HP-UX
> administration experience). So now you know some measure of how many
> questions I will ask before I understand the answers I get ;-) The SMG
> has had a Beowulf cluster for a couple of years, but my group was
> giving it minimal admin support. So I, like any good programmer, am
> looking for 'prior art' and experience.
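>
> For what it is worth, the timings below were gathered roughly along
> these lines (the log file name is just an example):
>
>   /usr/bin/time -v -o time-np32.log mpirun -np 32 ./wrf.exe
>
> so the "wall time" figures are the elapsed times reported in the
> /usr/bin/time resource summary.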
>
> Here are some of the summarized results, and then I will get to the
> questions (a note on the machinefile format follows the results):
>
> WRF-NMM run with 1 process on head node and 31 processes on subnodes
> 'mpirun -np 32 ./wrf.exe'
> 13:21.32 wall time (all times from the head node's perspective)
>
> WRF-NMM run with 3 processes on head node and 32 processes on subnodes
> 'mpirun -p4pg PI-35proc ./wrf.exe'
> 13:53.70 wall time
>
> WRF-NMM run with 1 process on head node and 15 processes on subnodes
> 'mpirun -np 16 ./wrf.exe'
> 14:09.29 wall time
>
> WRF-NMM run with 1 process on head node and 7 processes on subnodes
> 'mpirun -np 8 ./wrf.exe'
> 20:08.88 wall time
>
> WRF-NMM run with NO processes on head node and 16 processes on subnodes
> 'mpirun -np 16 -nolocal -machinefile wrf-16p.machines ./wrf.exe'
> 1:36:56 - an hour and a half of wall time
>
> and finally, two simultaneous runs of the model, each with 1 process on
> the head node and 15 processes pushed out to separate banks of the
> compute nodes:
>
> 'mpirun -np 16 -machinefile wrf-16p-plushead.machines ./wrf.exe'
> 17:27.70 wall time
> 'mpirun -np 16 -machinefile wrf-16p-test2.machines ./wrf.exe'
> 17:08.21 wall time
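>
> For reference, the machinefiles above follow the standard MPICH
> (ch_p4) format, one "hostname" or "hostname:processes" entry per line;
> the hostnames here are made up purely for illustration:
>
>   # wrf-16p.machines -- 16 processes over 8 dual-CPU subnodes
>   node01:2
>   node02:2
>   node03:2
>   node04:2
>   node05:2
>   node06:2
>   node07:2
>   node08:2
>
> Without -nolocal, MPICH normally starts the first process on the
> machine where mpirun is invoked (the head node here), which is how the
> "1 process on head node" runs were arranged; -nolocal pushes all of the
> ranks onto the machinefile hosts.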
>
> The results that raise questions are the minimal difference between 16
> and 32 processes (and, in fact, 8 processes), and the huge difference
> when no processes are placed on the head node. Taking the last case
> first, my thought, based on some web research, is that the difference
> between NFS and local writes could be influencing the time, but could
> it instead be a shared-memory issue?
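>
> One quick way to test the NFS-versus-local-write idea (the paths are
> illustrative; adjust them to the real mount points) is to compare a
> large sequential write from a subnode into the NFS-mounted home area
> against one to a local disk:
>
>   # run on a compute node
>   time dd if=/dev/zero of=$HOME/nfs_write_test bs=1M count=1024
>   time dd if=/dev/zero of=/tmp/local_write_test bs=1M count=1024
>
> If the NFS rate is dramatically lower, having the model's output
> written across NFS instead of locally on the head node could plausibly
> account for a large part of that hour-and-a-half run.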
>
> Going back to the base issue of how the number of processes influences
> the run time: does anyone have other experience with the scaling of WRF
> on larger or smaller clusters (I did note one report in an earlier
> post, but I am unsure what to make of those results at this point)? I
> did look at the graph that was referred to, but we are a much smaller
> shop than most of the tests shown there. Can anybody suggest some
> tuning that might be useful, or a tool that would help in gaining a
> better understanding of what is going on and what to expect if (when)
> the users expand their activities?
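>
> As a rough way to frame the scaling question, the wall times above
> convert to the following speedups (seconds, computed with bc):
>
>   T8=1208.88; T16=849.29; T32=801.32   # 20:08.88, 14:09.29, 13:21.32
>   echo "scale=2; $T8/$T16" | bc        # speedup from 8 to 16 processes (about 1.4x)
>   echo "scale=2; $T16/$T32" | bc       # speedup from 16 to 32 processes (barely above 1x)
>
> Ideal scaling would be close to 2x at each doubling, so whatever is
> limiting the run beyond roughly 16 processes (communication, I/O, or
> decomposition overhead on a domain this size) is what any tuning or
> profiling effort would need to pin down.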
>
> Pardon the length of this post, but I figured it was better to get as
> many details out up front as possible.
>
> Thanks,
>
> Brice
>
>
>
>
> ------------------------------
>
> _______________________________________________
> Wrf-users mailing list
> Wrf-users at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/wrf-users
>
>
> End of Wrf-users Digest, Vol 20, Issue 1
> ****************************************