[mpas-developers] dmpar_abort

Michael Duda duda at ucar.edu
Mon Apr 26 16:10:35 MDT 2010


Hi.

Just to confirm: using some form of dmpar_abort() is definitely
the 'proper' way to abort the code, since a Fortran stop is only
required to stop the invoking process, leaving the others to
potentially keep running. One can imagine a bit of code like

if (dminfo % my_proc_id == 0) then
   ! compute something
   if (ierr /= 0) then        ! some error code
      stop                    ! only guaranteed to stop process 0
   end if
   call dmpar_bcast(...)
else
   call dmpar_bcast(...)      ! other processes may wait here
end if

that could leave processes other than 0 waiting on the bcast call
if the error condition on process 0 were met; using MPI_Abort
(through dmpar_abort or dmpar_global_abort) would ensure that all
MPI processes get terminated.

Interestingly, the page at
http://www.mcs.anl.gov/research/projects/mpi/www/www3/MPI_Abort.html
indicates that the MPICH implementation of MPI terminates all
processes regardless of which communicator is passed to
MPI_Abort(), further arguing for the addition and use of a
dmpar_global_abort routine in place of dmpar_abort.
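
As a rough illustration only, a dmpar_global_abort routine along the
lines proposed might look like the sketch below. The module name
dmpar_abort_sketch, the demo program, and the error code value of 1 are
assumptions made for this example and are not taken from the MPAS
source; the sketch also assumes an MPI library that provides the
standard "mpi" Fortran module.

```fortran
module dmpar_abort_sketch   ! hypothetical module name, for illustration

   use mpi
   use iso_fortran_env, only : error_unit

   implicit none

contains

   ! Print mesg, then abort every process in MPI_COMM_WORLD.
   ! No dminfo argument is needed because the global communicator is used.
   subroutine dmpar_global_abort(mesg)

      character (len=*), intent(in) :: mesg

      integer, parameter :: abort_code = 1   ! assumed, arbitrary nonzero code
      integer :: mpi_ierr

      write(error_unit,*) trim(mesg)
      call MPI_Abort(MPI_COMM_WORLD, abort_code, mpi_ierr)

   end subroutine dmpar_global_abort

end module dmpar_abort_sketch


program abort_demo   ! hypothetical driver mirroring the bcast example

   use mpi
   use dmpar_abort_sketch

   implicit none

   integer :: ierr, rank, value

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   value = 0
   if (rank == 0) then
      value = -1   ! stand-in for a computation that failed
      if (value < 0) then
         ! A plain "stop" here could leave the other ranks in MPI_Bcast.
         call dmpar_global_abort('ERROR: computation failed on process 0')
      end if
   end if

   call MPI_Bcast(value, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
   call MPI_Finalize(ierr)

end program abort_demo
```

Because the routine takes only the message, it can be called from any
subroutine that uses the module, without passing dminfo down the call
chain.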

Cheers,
Michael


On Mon, Apr 26, 2010 at 03:21:32PM -0600, Michael Duda wrote:
> Hi, Xylar.
> 
> One approach might be to return a status code from routines that
> might encounter errors, and allow a routine higher up in the call
> stack to handle the error with a dmpar_abort if it were deemed
> appropriate. Depending on the nature of the subroutine, this might
> be the preferable approach -- allow higher-level code to determine
> whether the error can be recovered from or whether it is fatal.
> However, this would entail either adding an error code argument to
> the subroutine, which is one thing we'd like to avoid, or
> converting the subroutine into a function that returns the status,
> which isn't an option if the routine is already a function.
> 
> Another approach, and one that would be very simple to implement,
> would be to add a dmpar_global_abort(mesg) routine that is
> callable from any code that uses the dmpar module, and that prints
> the message mesg before calling MPI_Abort with MPI_COMM_WORLD. The
> current dmpar_abort only needs the dminfo argument to get the
> communicator to abort on, and I'd be hard-pressed to find a case
> where it would be desirable to abort on a communicator other than
> the global one. Adding a dmpar_global_abort routine would obviate
> the need to pass dminfo into any subroutine that might need to
> abort, and adding it as a new subroutine would allow us to migrate
> from existing calls to dmpar_abort on an as-needed basis.
> 
> I'd support adding a dmpar_global_abort routine in the dmpar
> module, but I'd also suggest considering whether the error being
> checked for is one that can be recovered from, in which case a
> return error code might be the cleanest approach in that
> particular case.
> 
> Cheers,
> Michael
> 
> 
> On Mon, Apr 26, 2010 at 02:10:10PM -0600, Xylar Asay-Davis wrote:
> > I'm trying to use dmpar_abort as a way to stop the code with an error 
> > message when things go wrong with the code I'm testing.  I could just 
> > use stop, but I figured dmpar_abort was the "proper" way.  The problem 
> > is that dminfo, the argument needed by dmpar_abort, is a member of the 
> > domain, which is not available in many subroutines.  And it's 
> > inconvenient to have to pass around any extra arguments to my 
> > subroutines just in case I might want to abort.
> > 
> > Any suggestions?
> > 
> > -Xylar
> > 
> > -- 
> > 
> > ***********************
> > Xylar S. Asay-Davis
> > E-mail: xylar at lanl.gov
> > Phone: (505) 606-0025
> > Fax: (505) 665-2659
> > CNLS, MS B258
> > Los Alamos National Laboratory
> > Los Alamos, NM 87545
> > ***********************
> > 
> > 

