[Wrf-users] Trying to run wrfpost.exe on multiple cores

Don Morton Don.Morton at alaska.edu
Thu Jul 1 13:56:28 MDT 2010


Many thanks to Hui-Ya Chuang at EMC/NCEP for help with this.  I will post
some information here (and paste into WRF Users Forum) so that the next time
somebody googles around, they may find something helpful.

1) After inserting a few debug statements, it became apparent that
MPI_Init() simply wasn't working the way it should - each task was only
aware of itself and not any others.  It seems that the default behavior of
the WPP downloaded from DTC is to assume that users don't want to use MPI,
so an MPI stubs library is compiled and linked in.  To get around this, I
just went into WPPV3/sorc/wrfpost/makefile and removed the $(MPILIB) from
the LIBS line, so that mpif90 would link in its own libmpi.a.  This fixed
the initial problem, and Hui-Ya saved me many hours of work by pointing this
out.
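For anyone making the same change, the edit looked roughly like the
following (the exact line content is from memory and may differ between WPP
versions; the point is that $(MPILIB) resolves to the stubs library):

```makefile
# WPPV3/sorc/wrfpost/makefile -- approximate sketch, names may vary by version
#
# Before: the stubs library satisfies all the MPI symbols at link time, so
# mpif90's real libmpi is never pulled in and every task sees an
# MPI_COMM_WORLD of size 1.
#   LIBS = ... $(MPILIB) ...
#
# After: drop $(MPILIB) so mpif90 links against its own libmpi.a instead.
#   LIBS = ... ...
```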

2) The other problem I ran into, after wrfpost.exe had run a while, was an
out of bounds array issue in an argument to one of the MPI calls.  This was
in the source file WPPV3/sorc/wrfpost/EXCH.f.  It turns out that at this
location, someone had entered an IBM compiler directive "!@PROCESS NOCHECK"
to get around this problem, but since I'm using PGI on a Linux system, it
was meaningless.  So, there are two places in EXCH.f with that IBM compiler
directive, and using the PGI equivalent of "cpgi$r nobounds" in both
locations alleviated that problem.
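For reference, the pattern looks roughly like this (the surrounding EXCH.f
code is paraphrased, and the MPI call shown is just a stand-in; only the
placement of the directives is the point):

```fortran
! IBM XL form already present in EXCH.f -- a no-op under PGI:
!@PROCESS NOCHECK
! PGI fixed-form equivalent, added just above the offending MPI call to
! suppress runtime array-bounds checking for the routine:
cpgi$r nobounds
      call mpi_sendrecv( ... )
```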

wrfpost.exe is now running on multiple cores on the Linux system, and it's
running much faster!

I do need to go in and verify that the resulting GRIB file is a reasonable
approximation of the one obtained by serial wrfpost.exe.
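When I do that comparison, a field-by-field check with a small tolerance is
probably more useful than a bitwise diff, since parallel and serial runs can
legitimately differ in the low-order bits.  A minimal sketch in plain Python
(this assumes the GRIB fields have already been decoded to flat lists of
values, e.g. by dumping them with an external tool; the sample values below
are made up):

```python
import math

def fields_match(serial, parallel, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two decoded fields agree pointwise within tolerance."""
    if len(serial) != len(parallel):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(serial, parallel))

# Made-up surface-pressure values standing in for decoded GRIB fields:
a = [101325.0, 99800.5, 100012.25]
b = [101325.0, 99800.5000001, 100012.25]
print(fields_match(a, b))  # True
```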

On Wed, Jun 30, 2010 at 3:52 PM, Don Morton <Don.Morton at alaska.edu> wrote:

> The appended is a post I made to the WRF Users Forum on 08 June.  The
> absence of replies there suggests nobody loves me on that forum, so I'll try
> another :)
>
> Since the time of my post, I've also compiled this (using mpif90, etc.) on
> a Penguin Computing cluster of Opteron processors, and am running into the
> same problem.  I've also removed the "PBS Script" interface and am simply
> using PBS to grab an interactive node, then running ./run_wrfpost straight
> from the command line.  My questions are
>
> 1) Are any of you actually running wrfpost.exe in parallel?
> 2) Are there any "gotchas" I might want to be aware of before digging in
> deeper?
>
> Thanks for any help,
>
> Don Morton
> Arctic Region Supercomputing Center
>
> --
> Arctic Region Supercomputing Center
> http://www.arsc.edu/~morton/
>
> ============================================================
>
> Howdy,
>
> After a fair amount of compilation struggles, I managed to compile the
> dmpar version of wrfpost.exe, and am now trying to run wrfpost.exe on a Cray
> XT5 by inserting the following command line in run_wrfpost:
>
> aprun -n 8 ${POSTEXEC}/wrfpost.exe < itag > wrfpost_${domain}.$fhr.out 2>&1
>
> Then, I have run_wrfpost called by a PBS script which allocates 8 cores.
> Although it does execute, what I get for output looks something like:
>
>  we will try to run with 1 server groups
> we will try to run with 1 server groups
> *** you specified 0 I/O servers
> we will try to run with 1 server groups
> we will try to run with 1 server groups
> CHKOUT will write a file
> *** you specified 0 I/O servers
> *** you specified 0 I/O servers
> CHKOUT will write a file
> CHKOUT will write a file
> The Posting is using 1 MPI task
> There are 0 I/O servers
> The Posting is using 1 MPI task
> The Posting is using 1 MPI task
> There are 0 I/O servers
> There are 0 I/O servers
> *** you specified 0 I/O servers
> CHKOUT will write a file
> The Posting is using 1 MPI task
> There are 0 I/O servers
> 0
>
> So, the 8 tasks are launched but
>
> a) Task 7 does not appear to take on the role of an I/O server (the latest
> WRF-ARW user's guide seems to imply that it should?)
> b) It appears that each task is only aware of itself, and not the other
> tasks.
>
> The code actually runs, but takes 9 minutes (1049x1049x51 gridpoints)
> whether I use 4 or 8 tasks.
>
> There are plenty of things I might be doing wrong, and I'm preparing to
> jump into sorc/wrfpost/SETUP_SERVERS.f to start some tracing, but before I
> get in too deep, I'm just wondering if anyone else out there has experience
> in this area and is aware of any "gotchas" that might save me a day or two!
>
> I'm literate in MPI and such, so don't really need a lesson in that aspect.
> If I have to, I'll try to figure out why the call to mpi_comm_size() seems
> to be returning 1 for npes, rather than 8.
>
>


-- 
Arctic Region Supercomputing Center
http://www.arsc.edu/~morton/