[Wrf-users] parallel real.exe processes attempting to open namelist.input

Zulauf, Michael Michael.Zulauf at iberdrolaren.com
Thu Aug 29 14:15:21 MDT 2013


Hi all...

So, I've got a bit of a problem, and I'm not sure of the best way to fix
it.  I'm just now beginning to experiment with WRF 3.5; I was previously
using WRF 3.3.1.  Because of code changes (requiring F2003
capabilities), I've had to switch to newer compiler and MPI versions.
With several things changed from my previous setup at once, I'm not
totally sure where the problem lies.

I've gotten things working when I run everything from a shared disk,
visible to all the computational nodes.  Our more usual workflow,
however, runs the model on the local disk of the I/O node - which isn't
visible to the other nodes devoted to the job.

With previous WRF and MPI versions, this worked fine.  The only oddity
was that the various rsl.error.XXXX and rsl.out.XXXX files from the
other processes got dumped in my home directory (visible to all nodes),
which wasn't a problem.  The files created by processes on the I/O node
(with the local disk) were placed in the directory I run things from.

The new WRF/MPI combo failed, initially because the non-I/O nodes
complained that they couldn't change to the work directory.  That's when
I tried running it from a shared disk - which worked fine.  Next I tried
modifying my job script so that it created work directories on the local
disks of the non-I/O nodes.  In this way there is a work directory with
the same name on every node, local to that node.

This time it progressed further, began the real.exe step, and actually
created the rsl.error.XXXX and rsl.out.XXXX files within the local work
directories on all the nodes.  Unfortunately, all processes on the
non-I/O nodes failed immediately.  Here's an example of one of the error
messages in an rsl.error file:

    taskid: 12 hostname: compute-0-5.local
    Quilting with   1 groups of   0 I/O tasks.
    PGFIO-F-209/OPEN/unit=27/'OLD' specified for file which does not exist.
    File name = namelist.input
    In source file module_wrf_error.f90, at line number 38
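
To check whether every non-I/O task failed the same way, the rsl.error
files can be scanned for this message.  What follows is a minimal Python
sketch, assuming the files follow the layout of the example above; the
function names are illustrative and not part of WRF.  Because the files
sit on each node's local disk, the scan would have to run on each node.

```python
import re
from pathlib import Path


def find_open_failure(text):
    """Return (taskid, hostname, filename) if text reports a failed OPEN.

    Looks for the PGFIO-F-209 message shown in the example rsl.error
    output; returns None when that message is absent.
    """
    missing = re.search(r"File name\s*=\s*(\S+)", text)
    if "PGFIO-F-209" not in text or missing is None:
        return None
    task = re.search(r"taskid:\s*(\d+)\s+hostname:\s*(\S+)", text)
    taskid = int(task.group(1)) if task else None
    hostname = task.group(2) if task else None
    return taskid, hostname, missing.group(1)


def scan(workdir="."):
    """Print one line per rsl.error.* file in workdir that hit the failure."""
    for path in sorted(Path(workdir).glob("rsl.error.*")):
        hit = find_open_failure(path.read_text(errors="replace"))
        if hit is not None:
            print(path.name, *hit)
```

Calling scan() from the local work directory lists the affected tasks,
which shows whether the failure is limited to the non-I/O nodes.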

 

So it appears that all the processes on the non-I/O nodes are
attempting to open namelist.input, and failing because it's not visible
on their local disk.  This never happened previously, with different
WRF, compiler, and MPI versions.  I suspect this could be fixed with
either compile-time or run-time options.  But which ones?
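
Independent of any compile or run time option, one possible stopgap is
to have the job script copy namelist.input into the matching work
directory on each node before launching real.exe.  The Python sketch
below is untested against this setup and assumes passwordless ssh/scp
between nodes; the host names and work directory path are hypothetical,
and whether other input files must also be present on every node is not
established here.

```python
import shlex
import subprocess


def staging_commands(hosts, workdir, files=("namelist.input",)):
    """Build the ssh/scp commands that mirror files to each host.

    hosts and workdir are supplied by the caller (hypothetical values in
    this sketch); the same workdir path is used on every node.
    """
    commands = []
    for host in hosts:
        # Make sure the work directory exists on the remote node.
        commands.append(["ssh", host, "mkdir -p " + shlex.quote(workdir)])
        for name in files:
            commands.append(
                ["scp", f"{workdir}/{name}", f"{host}:{workdir}/{name}"]
            )
    return commands


def stage(hosts, workdir, files=("namelist.input",)):
    """Run the staging commands, stopping at the first failure."""
    for command in staging_commands(hosts, workdir, files):
        subprocess.run(command, check=True)
```

For example, stage(["compute-0-5"], "/scratch/run1") would create
/scratch/run1 on compute-0-5 and copy namelist.input into it from the
node running the script.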

 

Any thoughts?  I suspect this might also happen in the wrf.exe stage,
but I haven't gotten that far with this configuration.

 

Thanks,
Mike

-- 
Mike Zulauf
Meteorologist, Lead Senior
Operational Meteorology
Iberdrola Renewables
1125 NW Couch, Suite 700
Portland, OR 97209
Office: 503-478-6304  Cell: 503-913-0403

 





