[mpas-developers] dmpar_abort

Philip Jones pwjones at lanl.gov
Mon Apr 26 15:53:54 MDT 2010


Michael,

This is what I was implementing in POP before we switched
to MPAS: every routine passed an error code up to its
caller, and the program was terminated only at the
highest level.  It's nice from a component standpoint
because it lets everyone shut down cleanly in response to
an error code.  But you have to treat threaded regions a bit
carefully, and there's always a chance of hanging MPI if one
MPI task returns an error and no one else does.  In
MPI-specific routines, you might want to abort if you're
between a send/recv pair or something, and use this
error approach only elsewhere.

Anyway, if you're interested in that approach, I have an
error module that keeps what amounts to an internal stack
trace as errors are propagated upward.  Before exiting
a routine with the error code, you can call
    call POP_ErrorSet(errorCode, rtnName, errMsg)
and then return.  And in the calling routine, you can do
    if (POP_ErrorCheck(errorCode, rtnName, errMsg)) return
The error module keeps track of all the errMsg strings to form
an error trace that is output with an ErrorPrint call.
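
As a rough illustration of that pattern, here is a minimal sketch of
what such a module could look like.  It is not the actual POP error
module: the module name, the sketch_* routine names, the fixed trace
size, and the success/fail constants are all hypothetical and chosen
only for this example.  The trace lives in module-level storage, so
this version assumes it is called outside threaded regions.

```fortran
! Hypothetical sketch of an error-trace module; NOT the real POP module.
! All names (sketch_*) and sizes are assumptions made for illustration.
module sketch_error_mod
   implicit none
   private
   public :: sketch_success, sketch_fail
   public :: sketch_error_set, sketch_error_check
   public :: sketch_error_depth, sketch_error_print, sketch_error_clear

   integer, parameter :: sketch_success = 0
   integer, parameter :: sketch_fail    = 1
   integer, parameter :: max_depth = 32    ! most messages kept in the trace
   integer, parameter :: max_len   = 160   ! longer messages are truncated

   ! Module-level state: not thread-safe as written.
   character(len=max_len), save :: trace(max_depth)
   integer, save :: depth = 0

contains

   ! Append "routine: message" to the trace; silently drop it if full.
   subroutine push(rtn_name, err_msg)
      character(len=*), intent(in) :: rtn_name, err_msg
      if (depth < max_depth) then
         depth = depth + 1
         trace(depth) = trim(rtn_name) // ': ' // trim(err_msg)
      end if
   end subroutine push

   ! Called where the error is first detected, just before returning.
   subroutine sketch_error_set(error_code, rtn_name, err_msg)
      integer, intent(out) :: error_code
      character(len=*), intent(in) :: rtn_name, err_msg
      error_code = sketch_fail
      call push(rtn_name, err_msg)
   end subroutine sketch_error_set

   ! Called by each caller: returns .true. and records this level's
   ! message if the callee reported an error, .false. otherwise.
   logical function sketch_error_check(error_code, rtn_name, err_msg)
      integer, intent(in) :: error_code
      character(len=*), intent(in) :: rtn_name, err_msg
      sketch_error_check = (error_code /= sketch_success)
      if (sketch_error_check) call push(rtn_name, err_msg)
   end function sketch_error_check

   ! Number of messages currently recorded.
   integer function sketch_error_depth()
      sketch_error_depth = depth
   end function sketch_error_depth

   ! Print the trace, innermost routine first.
   subroutine sketch_error_print()
      integer :: i
      do i = 1, depth
         write(*, '(a)') trim(trace(i))
      end do
   end subroutine sketch_error_print

   ! Forget all recorded messages.
   subroutine sketch_error_clear()
      depth = 0
   end subroutine sketch_error_clear

end module sketch_error_mod
```

With something like this, a routine that detects a problem would do
    call sketch_error_set(errorCode, 'inner', 'bad input')
    return
its caller would do
    if (sketch_error_check(errorCode, 'outer', 'inner failed')) return
and the top-level driver would call sketch_error_print before
shutting down.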

Phil

On Apr 26, 2010, at 3:21 PM, Michael Duda wrote:

> Hi, Xylar.
>
> One approach might be to return a status code from routines that
> might encounter errors, and allow a routine higher up in the call
> stack to handle the error with a dmpar_abort if it were deemed
> appropriate. Depending on the nature of the subroutine, this might
> be the preferable approach -- allow higher-level code to determine
> whether the error can be recovered from or whether it is fatal.
> However, this would either entail adding an error code argument to
> the subroutine, which is one thing we'd like to avoid, or
> converting the subroutine into a function, which wouldn't be an
> option if the subroutine was in fact already a function.
>
> Another approach, and one that would be very simple to implement,
> would be to add a dmpar_global_abort(mesg) routine that is
> callable from any code that uses the dmpar module, and that prints
> the message mesg before calling MPI_Abort with MPI_COMM_WORLD. The
> current dmpar_abort only needs the dminfo argument to get the
> communicator to abort on, and I'd be hard-pressed to find a case
> where it would be desirable to abort on a communicator other than
> the global one. Adding a dmpar_global_abort routine would obviate
> the need to pass dminfo into any subroutine that might need to
> abort, and adding it as a new subroutine would allow us to migrate
> from existing calls to dmpar_abort on an as-needed basis.
>
> I'd support adding a dmpar_global_abort routine in the dmpar
> module, but I'd also suggest considering whether the error being
> checked for is one that can be recovered from, in which case a
> return error code might be the cleanest approach in that
> particular case.
>
> Cheers,
> Michael
>
>
> On Mon, Apr 26, 2010 at 02:10:10PM -0600, Xylar Asay-Davis wrote:
>> I'm trying to use dmpar_abort as a way to stop the code with an error
>> message when things go wrong with the code I'm testing.  I could just
>> use stop, but I figured dmpar_abort was the "proper" way.  The problem
>> is that dminfo, the argument needed by dmpar_abort, is a member of the
>> domain, which is not available in many subroutines.  And it's
>> inconvenient to have to pass around any extra arguments to my
>> subroutines just in case I might want to abort.
>>
>> Any suggestions?
>>
>> -Xylar
>>
>> -- 
>>
>> ***********************
>> Xylar S. Asay-Davis
>> E-mail: xylar at lanl.gov
>> Phone: (505) 606-0025
>> Fax: (505) 665-2659
>> CNLS, MS B258
>> Los Alamos National Laboratory
>> Los Alamos, NM 87545
>> ***********************
>>
>>
>> _______________________________________________
>> mpas-developers mailing list
>> mpas-developers at mailman.ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/mpas-developers

---
Correspondence/TSPA/DUSA AOE
------------------------------------------------------------
Philip Jones                                pwjones at lanl.gov
T-3 MS B216                                 Ph: 505-667-6387
Los Alamos National Lab                    Fax: 505-665-5926
Los Alamos, NM 87545-1663




