[mpas-developers] dmpar_abort

Michael Duda duda at ucar.edu
Tue Apr 27 18:01:16 MDT 2010


Hi, Xylar.

I suppose that there are two views a developer could take towards
errors. One approach would be to have the subroutine abort
whenever an error is detected; the other would be to have the
subroutine return an error code and expect the caller to deal with
the error. The latter approach might be more reasonable when the
error is on the part of the caller, while the former seems
reasonable when the error is purely internal to the subroutine.
I've definitely not followed these guidelines in my own code,
especially in the MPAS code, but they were just what occurred to
me after thinking a bit. Also, for short subroutines, the benefit
of returning error codes might be outweighed by the added
complexity of coming up with codes for different errors,
implementing a way for calling code to inquire about error
messages for codes, etc.
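
To make the error-code approach concrete, here is a minimal sketch of a
routine that reports a caller error through a status argument instead of
aborting. The module name, the routine name, and the error constants are
all hypothetical; none of them exist in the MPAS code.

```fortran
module status_code_sketch   ! hypothetical module, not part of MPAS

   implicit none

   integer, parameter :: ERR_NONE = 0       ! assumed code: success
   integer, parameter :: ERR_BAD_FLAG = 1   ! assumed code: invalid flag from caller

contains

   ! Scale x by a factor selected by flag; report a bad flag through ierr
   subroutine apply_flag(flag, x, ierr)

      integer, intent(in) :: flag
      real, intent(inout) :: x
      integer, intent(out) :: ierr

      ierr = ERR_NONE
      select case (flag)
         case (1)
            x = 2.0 * x
         case (2)
            x = 0.5 * x
         case default
            ierr = ERR_BAD_FLAG   ! x is left untouched; the caller decides what to do
      end select

   end subroutine apply_flag

end module status_code_sketch
```

The caller would test ierr after each call and decide whether the error
is recoverable or fatal. Even this small case needs an extra argument
and a set of named codes, which is the added complexity mentioned above.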

In any case, as you point out, there will certainly be places
where we will need a global abort routine; if there are no
objections, I'd propose to add the following subroutine to
src/framework/module_dmpar.F:

   subroutine dmpar_global_abort(mesg)

      implicit none

      character (len=*), intent(in) :: mesg

#ifdef _MPI
      integer :: mpi_ierr, mpi_errcode

      ! Print the message, then abort all tasks with a nonzero error code
      mpi_errcode = 1
      write(0,*) trim(mesg)
      call MPI_Abort(MPI_COMM_WORLD, mpi_errcode, mpi_ierr)
#endif

      ! Serial build (or fallback if MPI_Abort returns)
      write(0,*) trim(mesg)
      stop

   end subroutine dmpar_global_abort

I've given this routine a quick test in the MPAS code, and it
seems to work as expected, both in serial and in parallel. Of
course, adding this routine would have no effect on existing code
(until it is potentially changed to call the routine).
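
For illustration, a call site for the flag sanity checks discussed in
this thread might look roughly like the sketch below. The module
abort_sketch and the routine set_boundary are hypothetical, and the
dmpar_global_abort defined here is only a serial stand-in (no MPI) so
that the example compiles on its own; in MPAS the routine would come
from the dmpar module instead.

```fortran
module abort_sketch   ! hypothetical module; stands in for the dmpar module

   implicit none

contains

   ! Serial stand-in for the proposed routine (no MPI in this sketch)
   subroutine dmpar_global_abort(mesg)
      character (len=*), intent(in) :: mesg
      write(0,*) trim(mesg)
      stop 1
   end subroutine dmpar_global_abort

   ! Hypothetical routine: an invalid flag is a bug in the caller, so abort
   subroutine set_boundary(flag, bc_value)
      integer, intent(in) :: flag
      real, intent(out) :: bc_value

      bc_value = 0.0
      select case (flag)
         case (0)
            bc_value = 0.0
         case (1)
            bc_value = 1.0
         case default
            call dmpar_global_abort('set_boundary: invalid flag')
      end select
   end subroutine set_boundary

end module abort_sketch
```

Note that set_boundary needs neither a dminfo argument nor an error-code
argument in order to abort.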

Cheers,
Michael


On Mon, Apr 26, 2010 at 04:25:29PM -0600, Xylar Asay-Davis wrote:
> Hi Michael,
> 
> I can see cases in the future where a global abort would be very helpful.
> 
> Currently, I am using abort as a sanity check on certain flags that are 
> passed into my routines, so an error code would not really be an 
> appropriate way to handle these -- if the flag is invalid then there is 
> a bug in the code that needs to be fixed.  So it seems like calling a 
> global abort function would be reasonable here.
> 
> Some searching reveals the use of stop in several places in the code 
> that we will probably want to replace with
> dmpar_global_abort once it's created.
> 
> I'm happy to write the routine but you may be able to foresee problems 
> that I might miss.  Let me know if you'd like to do it.
> 
> Thanks for your thoughts on this!
> -Xylar
> 
> On 4/26/10 3:21 PM, Michael Duda wrote:
> > Hi, Xylar.
> >
> > One approach might be to return a status code from routines that
> > might encounter errors, and allow a routine higher up in the call
> > stack to handle the error with a dmpar_abort if it were deemed
> > appropriate. Depending on the nature of the subroutine, this might
> > be the preferable approach -- allow higher-level code to determine
> > whether the error can be recovered from or whether it is fatal.
> > However, this would either entail adding an error code argument to
> > the subroutine, which is one thing we'd like to avoid, or
> > converting the subroutine into a function, which wouldn't be an
> > option if the subroutine was in fact already a function.
> >
> > Another approach, and one that would be very simple to implement,
> > would be to add a dmpar_global_abort(mesg) routine that is
> > callable from any code that uses the dmpar module, and that prints
> > the message mesg before calling MPI_Abort with MPI_COMM_WORLD. The
> > current dmpar_abort only needs the dminfo argument to get the
> > communicator to abort on, and I'd be hard-pressed to find a case
> > where it would be desirable to abort on a communicator other than
> > the global one. Adding a dmpar_global_abort routine would obviate
> > the need to pass dminfo into any subroutine that might need to
> > abort, and adding it as a new subroutine would allow us to migrate
> > from existing calls to dmpar_abort on an as-needed basis.
> >
> > I'd support adding a dmpar_global_abort routine in the dmpar
> > module, but I'd also suggest considering whether the error being
> > checked for is one that can be recovered from, in which case a
> > return error code might be the cleanest approach in that
> > particular case.
> >
> > Cheers,
> > Michael
> >
> >
> > On Mon, Apr 26, 2010 at 02:10:10PM -0600, Xylar Asay-Davis wrote:
> >    
> >> I'm trying to use dmpar_abort as a way to stop the code with an error
> >> message when things go wrong with the code I'm testing.  I could just
> >> use stop, but I figured dmpar_abort was the "proper" way.  The problem
> >> is that dminfo, the argument needed by dmpar_abort, is a member of the
> >> domain, which is not available in many subroutines.  And it's
> >> inconvenient to have to pass around any extra arguments to my
> >> subroutines just in case I might want to abort.
> >>
> >> Any suggestions?
> >>
> >> -Xylar
> >>
> >> -- 
> >>
> >> ***********************
> >> Xylar S. Asay-Davis
> >> E-mail: xylar at lanl.gov
> >> Phone: (505) 606-0025
> >> Fax: (505) 665-2659
> >> CNLS, MS B258
> >> Los Alamos National Laboratory
> >> Los Alamos, NM 87545
> >> ***********************
> >>
> >>
> >> _______________________________________________
> >> mpas-developers mailing list
> >> mpas-developers at mailman.ucar.edu
> >> http://mailman.ucar.edu/mailman/listinfo/mpas-developers
> >>      
> 
> 

