[mpas-developers] mpas_timer.F synchronization issue.

Doug Jacobsen jacobsen.douglas at gmail.com
Sat Dec 10 15:27:27 MST 2011


Hi All,

In order to provide even more information, in the timers I added the load
efficiency to the output. So the first time column is now a global max of
time spent in a routine. I added a new column at the end that is (global
ave / global max) and presents the efficiency of the timed code. If it's
1.0 that means the load balancing is good, and if it's 0.0 that means it's
bad. I have attached a sample output from the timers, which was run on 32
processors as a test, and the newer version of the timers that I am
proposing to have committed to the trunk.

Thanks,
Doug

On Fri, Dec 9, 2011 at 9:08 AM, Doug Jacobsen <jacobsen.douglas at gmail.com>wrote:

> Phil,
>
> Yeah, I figured that max would give you the most relevant information to
> see where large time is spent on any processor. The min wouldn't be too
> difficult to print as well. I figured average would either be fairly
> representative of the total set of processors, or it would get corrupted by
> load imbalances if they were really bad.
>
> The global reductions are done at the end of the simulation, and they are
> only called if the domain_info variable points to domain % dminfo. Which
> means that the mpas_core line I previously mentioned needs to be added to
> the initialization phase.
>
> In terms of min and max, the global min and max times for a single call
> are still printed. And the max total time spent in a routine (for a
> processor) is printed. I could determine the min total time spent in a
> routine (for a processor) as well if that's something everyone is
> interested in. As a summary of the current information that is printed.
> Below is a line from the current timer output.
>
>  1  initialize                                6.56141         1
> 6.56141        6.56141        6.56141    0.01
>
> The first column tells you how many timers were started prior to starting
> this one. The second tells you the timer's name. Third is the total time
> spent within this timer, including all starts and stops. Fourth is the
> number of times this timer was started and stopped. Fourth is the min time
> for a single call to this timer. Fifth is the max time for a single call to
> this timer. Sixth is the average time for a single call to this timer (or
> column 3 / colum 4) and finally is the percent of the total_time timer that
> was spent in this timer.
>
> The global reduction takes the global max of columns 3 and 5, and the
> global min of colum 6. And columns 7 and 8 are recomputed using the new
> global max of column 3.
>
> Let me know if there is any more information anyone wants in the timers.
>
> Doug
>
> On Fri, Dec 9, 2011 at 8:35 AM, Jones, Philip W <pwjones at lanl.gov> wrote:
>
>>
>>  Doug,
>>
>>  Max time is most relevant and printing the min time is useful to get an
>> idea of load imbalance.  Don't think the mean really tells you much.
>>
>>  And the global reductions are only in the timer print, right?
>>
>>  Thanks,
>>
>> Phil
>>
>> TSPA/Correspondence/DUSA EARTH
>> ------
>> Philip Jones (pwjones at lanl.gov)
>> Climate, Ocean and Sea Ice Modeling
>> Los Alamos National Laboratory
>> T-3 MS B216
>> P.O. Box 1663
>> Los Alamos, NM 87545
>>   ------------------------------
>> *From:* mpas-developers-bounces at mailman.ucar.edu [
>> mpas-developers-bounces at mailman.ucar.edu] on behalf of Doug Jacobsen [
>> jacobsen.douglas at gmail.com]
>> *Sent:* Thursday, December 08, 2011 7:36 PM
>> *To:* mpas-developers at ucar.edu
>> *Subject:* Re: [mpas-developers] mpas_timer.F synchronization issue.
>>
>>  Hi Everyone,
>>
>> Something else that I would like input on regarding this. I currently
>> have two options for synchronizing the timers. First is the current
>> version, which just uses the max of all of the processors timers. The other
>> option would be to average all of the timers across processors. Each have
>> their own benefits and provide slightly different information. So if anyone
>> has any preferences it would be good to have a discussion about them.
>>
>> Thanks!
>> Doug
>>
>> On Thu, Dec 8, 2011 at 3:47 PM, Doug Jacobsen <jacobsen.douglas at gmail.com
>> > wrote:
>>
>>> Hello Everyone,
>>>
>>> I recently noticed that when running an MPI job processors would report
>>> different times for sub-timers, ie. not including the total_time timer.
>>> This is mostly due to some processors having to wait for mpi calls to
>>> finish while other ones don't. None of the previous versions of
>>> mpas_timer.F have supported making sure the timers report the same time
>>> over all of the processors. So I have attached a new version of
>>> mpas_timer.F that supports this. It essentially makes each timer's total
>>> time the maximum total time over all of the processors. It also gets the
>>> global max and min single call time to print as well. I think this gives a
>>> better over all profile for the time spent in routines rather than having
>>> to go through each processors log.*.out file to see how it behaved.
>>>
>>> To support this, the timer module now stores a pointer to domain %
>>> dminfo so you don't have to pass it in to print out the timers. Doing this
>>> allows the current timer implementation to stay the same, and allows the
>>> syncing of timers by adding a single line to mpas_*_mpas_core.F within each
>>> core, which is:
>>>
>>> call mpas_timer_init(domain)
>>>
>>> within mpas_core_init.
>>>
>>> I'm open to any comments or suggestions regarding this change, but I
>>> would like to propagate it to the trunk. I will also propagate the above
>>> addition to mpas_ocn_mpas_core.F but can add it to the other cores if
>>> requested.
>>>
>>> Thanks for your input.
>>> Doug
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ucar.edu/pipermail/mpas-developers/attachments/20111210/b0e2721c/attachment-0001.html 
-------------- next part --------------
  Using z-level coordinates
   timer_name                                  total     calls          min            max            avg     percent  efficiency
 0 total time                               454.93673         1      454.79560      454.93673      454.93673
 1  initialize                                5.82919         1        5.71109        5.82919        5.82919    0.01    0.99
 2   compute_solve_diagnostics               90.65039      2077        0.01990        0.09795        0.00002    0.20    0.88
 3    compute_solve_diagnostics-hEdge         2.55096      2077        0.00043        0.00486        0.00000    0.01    0.79
 4     compute_solve_diagnostics-hEdge        2.54122      2077        0.00042        0.00486        0.00000    0.01    0.79
 3    equation_of_state                      13.09548      5192        0.00081        0.00656        0.00000    0.03    0.84
 4     equation_of_state_jm                  13.06267      5192        0.00080        0.00655        0.00000    0.03    0.84
 3    wTop                                    5.86591      2077        0.00152        0.00577        0.00000    0.01    0.86
 1  time integration                        448.21937       519        0.79084        1.07320        0.00166    0.99    1.00
 2   RK4-main loop                          338.77847       519        0.56020        0.83179        0.00126    0.74    0.91
 3    RK4-diagnostic halo update             96.94234      2076        0.00207        0.15908        0.00002    0.21    0.36
 3    RK4-tendency computations             173.53435      2076        0.03780        0.11710        0.00004    0.38    0.86
 4     compute_tend_h                         3.43351      2076        0.00060        0.00428        0.00000    0.01    0.61
 5      compute_tend_h-horiz adv              2.80887      2076        0.00043        0.00401        0.00000    0.01    0.55
 5      compute_tend_h-vert adv               0.16503      2076        0.00003        0.00042        0.00000    0.00    0.63
 4     compute_tend_u                        80.70319      2076        0.01769        0.06730        0.00002    0.18    0.86
 5      compute_tend_u-coriolis              32.85087      2076        0.00616        0.02676        0.00001    0.07    0.79
 5      compute_tend_u-vert adv               4.82752      2076        0.00123        0.00521        0.00000    0.01    0.91
 5      compute_tend_u-pressure grad          2.85730      2076        0.00066        0.00438        0.00000    0.01    0.88
 5      compute_tend_u-horiz mix             38.31071      2076        0.00808        0.03199        0.00001    0.08    0.87
 6       compute_tend_u-horiz mix-del2        5.46079      2076        0.00114        0.00552        0.00000    0.01    0.85
 6       compute_tend_u-horiz mix-del4       33.12631      2076        0.00685        0.02934        0.00001    0.07    0.86
 5      compute_tend_u-forcings               2.33122      2076        0.00040        0.00377        0.00000    0.01    0.61
 4     compute_scalar_tend                   89.02905      2076        0.01773        0.05285        0.00002    0.20    0.85
 5      compute_scalar_tend-horiz adv        24.02053      2076        0.00434        0.02840        0.00001    0.05    0.84
 6       compute_scalar_tend-horiz adv       23.99440      2076        0.00432        0.02839        0.00001    0.05    0.84
 5      compute_scalar_tend-vert adv         12.47948      2076        0.00229        0.00788        0.00000    0.03    0.84
 6       compute_scalar_tend-vert adv         4.06053      2076        0.00057        0.00456        0.00000    0.01    0.81
 5      compute_scalar_tend-horiz diff       51.16401      2076        0.01013        0.03280        0.00001    0.11    0.83
 6       compute_scalar_tend-horiz dif       51.13484      2076        0.01011        0.03279        0.00001    0.11    0.83
 5      compute_scalar_tend-restoring         0.32410      2076        0.00008        0.00034        0.00000    0.00    0.71
 4     enforce_boundaryEdge                   2.95668      2076        0.00085        0.00425        0.00000    0.01    0.74
 3    RK4-pronostic halo update              74.92456      2076        0.00322        0.15148        0.00002    0.16    0.40
 3    RK4-update diagnostic variables        86.60230      2076        0.00001        0.08941        0.00002    0.19    0.87
 3    RK4-RK4 accumulate update              23.28396      2076        0.00556        0.01527        0.00001    0.05    0.78
 2   RK4-cleaup phase                       103.28245       519        0.11111        0.27786        0.00038    0.23    0.89
 3    RK4-implicit vert mix                  76.76441       519        0.08610        0.16828        0.00028    0.17    0.90
 4     compute_vertical_mix_coefficien       12.02765       519        0.01038        0.02630        0.00004    0.03    0.87
 4     tridiagonal_solve                     25.51430   1711143        0.00001        0.02489        0.00000    0.06    0.88
 4     tridiagonal_solve_mult                 8.16510    593736        0.00001        0.00225        0.00000    0.02    0.87
 2   global diagnostics                       5.29737        51        0.02467        0.18143        0.00204    0.01    0.51
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpas_timer.F
Type: text/x-fortran
Size: 11898 bytes
Desc: not available
Url : http://mailman.ucar.edu/pipermail/mpas-developers/attachments/20111210/b0e2721c/attachment-0001.bin 


More information about the mpas-developers mailing list