Many thanks to Hui-Ya Chuang at EMC/NCEP for help with this. I will post some information here (and paste it into the WRF Users Forum) so that the next time somebody googles around, they may find something helpful.

1) After inserting a few debug statements, it became apparent that MPI_Init() simply wasn't working the way it should: each task was only aware of itself and not of any others. It seems that the default behavior of the WPP downloaded from DTC is to assume that users don't want to use MPI, so an MPI stubs library is compiled and linked in. To get around this, I just went into WPPV3/sorc/wrfpost/makefile and removed the $(MPILIB) from the LIBS line, so that mpif90 would link in its own libmpi.a. This fixed the initial problem, and Hui-Ya saved me many hours of work by pointing it out.
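For anyone who wants to make the same change, it amounts to one line in WPPV3/sorc/wrfpost/makefile. The sketch below is illustrative only: apart from $(MPILIB), the variable names on the LIBS line are placeholders, not the actual contents of the DTC makefile.

# WPPV3/sorc/wrfpost/makefile (sketch; names other than MPILIB are made up)
# Before: the MPI stubs library is linked explicitly, so every task
# starts up believing it is the only one:
#   LIBS = $(WRF_LIBS) $(MPILIB) $(OTHER_LIBS)
# After: drop $(MPILIB) and let mpif90 link in its own libmpi.a:
LIBS = $(WRF_LIBS) $(OTHER_LIBS)

After relinking wrfpost.exe, the "The Posting is using ... MPI task" line in the output is a quick way to confirm that the tasks can now see each other.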
2) The other problem I ran into, after wrfpost.exe had been running for a while, was an out-of-bounds array reference in an argument to one of the MPI calls, in the source file WPPV3/sorc/wrfpost/EXCH.f. It turns out that at that location someone had entered an IBM compiler directive, "!@PROCESS NOCHECK", to get around this problem, but since I'm using PGI on a Linux system, it was meaningless. There are two places in EXCH.f with that IBM compiler directive, and using the PGI equivalent, "cpgi$r nobounds", in both locations alleviated the problem.
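For reference, the relevant spots in EXCH.f end up looking roughly like this (only the directive lines are shown; the surrounding code is omitted). EXCH.f is fixed-form Fortran, so the PGI sentinel goes in column one, and the directive only matters when the build has PGI bounds checking turned on (e.g. via -Mbounds or -C):

C     IBM XL directive already in the source; PGI treats it as a comment:
!@PROCESS NOCHECK
C     PGI equivalent: turn off run-time array bounds checking for this
C     routine so the array argument to the MPI call is not flagged.
cpgi$r nobounds

The same IBM directive appears in two places in EXCH.f, and both need the PGI line.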
wrfpost.exe is now running on multiple cores on the Linux system, and it's running much faster!

I do need to go in and verify that the resulting GRIB file is a reasonable approximation of the one obtained from the serial wrfpost.exe.
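When I get to that verification, something along these lines is probably the quickest sanity check. The file names below are made up (use whatever run_wrfpost actually writes on your system), and it assumes the wgrib utility is available:

# Compare the field/level inventories of the serial and parallel output:
wgrib -s WRFPRS_serial.grb   > serial.inv
wgrib -s WRFPRS_parallel.grb > parallel.inv
diff serial.inv parallel.inv

# Byte-for-byte comparison; the files may not be bit-identical even when the
# fields effectively agree, so the inventory diff above (plus spot checks with
# wgrib -V, which prints min/max per record) is the more meaningful test:
cmp WRFPRS_serial.grb WRFPRS_parallel.grb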
On Wed, Jun 30, 2010 at 3:52 PM, Don Morton <Don.Morton@alaska.edu> wrote:
The appended is a post I made to the WRF Users Forum on 08 June. The absence of replies there suggests nobody loves me on that forum, so I'll try another :)

Since the time of my post, I've also compiled this (using mpif90, etc.) on a Penguin Computing cluster of Opteron processors, and am running into the same problem. I've also removed the "PBS script" interface and am simply using PBS to grab an interactive node, then running ./run_wrfpost straight from the command line. My questions are:
1) Are any of you actually running wrfpost.exe in parallel?
2) Are there any "gotchas" I might want to be aware of before digging in deeper?

Thanks for any help,

Don Morton
Arctic Region Supercomputing Center
--
Arctic Region Supercomputing Center
http://www.arsc.edu/~morton/

============================================================

Howdy,
After a fair amount of compilation struggles, I managed to compile the dmpar version of wrfpost.exe, and am now trying to run wrfpost.exe on a Cray XT5 by inserting the following command line in run_wrfpost:

  aprun -n 8 ${POSTEXEC}/wrfpost.exe < itag > wrfpost_${domain}.$fhr.out 2>&1

Then, I have run_wrfpost called by a PBS script which allocates 8 cores. Although it does execute, what I get for output looks something like:

  we will try to run with 1 server groups
  we will try to run with 1 server groups
  *** you specified 0 I/O servers
  we will try to run with 1 server groups
  we will try to run with 1 server groups
  CHKOUT will write a file
  *** you specified 0 I/O servers
  *** you specified 0 I/O servers
  CHKOUT will write a file
  CHKOUT will write a file
  The Posting is using 1 MPI task
  There are 0 I/O servers
  The Posting is using 1 MPI task
  The Posting is using 1 MPI task
  There are 0 I/O servers
  There are 0 I/O servers
  *** you specified 0 I/O servers
  CHKOUT will write a file
  The Posting is using 1 MPI task
  There are 0 I/O servers
  0

So, the 8 tasks are launched, but

a) Task 7 does not appear to take on the role of an I/O server (the latest WRF-ARW user's guide seems to imply that it should?)
b) It appears that each task is only aware of itself, and not of the other tasks.

The code actually runs, but takes 9 minutes (1049x1049x51 gridpoints) whether I use 4 or 8 tasks.

There are plenty of things I might be doing wrong, and I'm preparing to jump into sorc/wrfpost/SETUP_SERVERS.f to start some tracing, but before I get in too deep, I'm just wondering if anyone else out there has experience in this area and is aware of any "gotchas" that might save me a day or two!

I'm literate in MPI and such, so I don't really need a lesson in that aspect. If I have to, I'll try to figure out why the call to mpi_comm_size() seems to be returning 1 for npes, rather than 8.
--
Arctic Region Supercomputing Center
http://www.arsc.edu/~morton/