[mpas-developers] mpas_timer.F synchronization issue.
Jones, Philip W
pwjones at lanl.gov
Fri Dec 9 08:35:04 MST 2011
The max time is most relevant, and printing the min time is useful for getting an idea of load imbalance. I don't think the mean really tells you much.
And the global reductions are only in the timer print, right?
Philip Jones (pwjones at lanl.gov)
Climate, Ocean and Sea Ice Modeling
Los Alamos National Laboratory
T-3 MS B216
P.O. Box 1663
Los Alamos, NM 87545
From: mpas-developers-bounces at mailman.ucar.edu [mpas-developers-bounces at mailman.ucar.edu] on behalf of Doug Jacobsen [jacobsen.douglas at gmail.com]
Sent: Thursday, December 08, 2011 7:36 PM
To: mpas-developers at ucar.edu
Subject: Re: [mpas-developers] mpas_timer.F synchronization issue.
Something else I would like input on regarding this: I currently have two options for synchronizing the timers. The first is the current version, which just uses the max of all of the processors' timers. The other option would be to average each timer across processors. Each has its own benefits and provides slightly different information, so if anyone has a preference it would be good to discuss it.
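For illustration, here is a small sketch comparing the max, mean, and min summaries of a single timer across processors. It is written in Python rather than Fortran/MPI so it runs standalone; the input list stands in for per-task totals that the real code would combine with MPI reductions (for example MPI_Allreduce with MPI_MAX, MPI_MIN, or MPI_SUM divided by the number of tasks). The function name `summarize_timer` and the max/mean imbalance ratio are assumptions for this example, not part of mpas_timer.F.

```python
from statistics import mean


def summarize_timer(per_rank_totals):
    """Summarize one timer's total time (seconds) across MPI tasks.

    per_rank_totals is a hypothetical list with one entry per task; in the
    MPI code these values would come from reductions, not a local list.
    """
    if not per_rank_totals:
        raise ValueError("need at least one task")
    t_max = max(per_rank_totals)
    t_min = min(per_rank_totals)
    t_mean = mean(per_rank_totals)
    # A ratio above 1 means some tasks finish early and wait on the slowest one.
    imbalance = t_max / t_mean if t_mean > 0 else 1.0
    return {"max": t_max, "min": t_min, "mean": t_mean, "imbalance": imbalance}


if __name__ == "__main__":
    print(summarize_timer([2.0, 2.5, 4.0, 3.5]))
```

With four tasks reporting 2.0, 2.5, 4.0, and 3.5 s, the max (4.0 s) is what bounds the wall-clock time, the min (2.0 s) exposes the spread, and the mean (3.0 s) hides both, which is the point made in the reply above.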
On Thu, Dec 8, 2011 at 3:47 PM, Doug Jacobsen <jacobsen.douglas at gmail.com> wrote:
I recently noticed that when running an MPI job, processors report different times for sub-timers (i.e., every timer except total_time). This is mostly because some processors have to wait for MPI calls to finish while others don't. None of the previous versions of mpas_timer.F have ensured that the timers report the same time across all processors, so I have attached a new version of mpas_timer.F that supports this. It essentially sets each timer's total time to the maximum total time over all of the processors. It also computes the global max and min single-call times and prints them as well. I think this gives a better overall profile of the time spent in each routine than having to go through each processor's log.*.out file to see how it behaved.
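A minimal sketch of the synchronization described above, again in Python with the MPI tasks emulated as a list of per-task dictionaries. The `TimerStats` fields and the `sync_timers` name are hypothetical; the actual module performs these reductions in Fortran through MPI using domain % dminfo. The sketch assumes every task has timers with the same names.

```python
from dataclasses import dataclass


@dataclass
class TimerStats:
    total: float     # accumulated time over all calls (s)
    max_call: float  # longest single call (s)
    min_call: float  # shortest single call (s)


def sync_timers(per_rank):
    """Emulate element-wise MPI_MAX / MPI_MIN reductions over tasks.

    per_rank is a list of {timer_name: TimerStats}, one dict per task.
    """
    names = set(per_rank[0])
    if any(set(rank) != names for rank in per_rank):
        raise ValueError("all tasks must have the same timers")
    synced = {}
    for name in names:
        stats = [rank[name] for rank in per_rank]
        synced[name] = TimerStats(
            total=max(s.total for s in stats),        # slowest task's total
            max_call=max(s.max_call for s in stats),  # global max single call
            min_call=min(s.min_call for s in stats),  # global min single call
        )
    return synced


if __name__ == "__main__":
    r0 = {"halo_update": TimerStats(1.0, 0.2, 0.05)}
    r1 = {"halo_update": TimerStats(1.6, 0.4, 0.03)}
    print(sync_timers([r0, r1]))
```

Taking the max of the totals means every task reports the slowest task's time for a routine, including any time spent waiting at MPI calls, so the printed profile is identical across log files.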
To support this, the timer module now stores a pointer to domain % dminfo, so you don't have to pass it in to print the timers. This keeps the current timer interface the same and allows the timers to be synced by adding a single line to mpas_*_mpas_core.F within each core, which is:
I'm open to any comments or suggestions regarding this change, but I would like to propagate it to the trunk. I will also propagate the above addition to mpas_ocn_mpas_core.F but can add it to the other cores if requested.
Thanks for your input.