[mpas-developers] mpas_timer.F synchronization issue.
Doug Jacobsen
jacobsen.douglas at gmail.com
Sat Dec 10 15:27:27 MST 2011
Hi All,
To provide even more information, I have added the load efficiency to the
timer output. The first time column is now the global max of the time
spent in a routine, and I added a new column at the end that is (global
average / global max) and reports the efficiency of the timed code. If it's
1.0 the load balancing is perfect, and if it's near 0.0 the load balancing
is poor. I have attached sample output from the timers, run on 32
processors as a test, along with the newer version of the timers that I am
proposing to commit to the trunk.
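The new efficiency column is just the ratio of the average to the maximum per-processor time. A minimal sketch of the arithmetic (plain Python standing in for the MPI reductions; the function name is hypothetical):

```python
def load_efficiency(times_per_rank):
    """Efficiency of a timed region: global average / global max.

    1.0 means every rank spent the same time in the region (perfect
    balance); values near 0.0 mean one rank dominates (poor balance).
    """
    global_max = max(times_per_rank)
    global_avg = sum(times_per_rank) / len(times_per_rank)
    return global_avg / global_max

# Perfectly balanced: all four ranks take 2.0 s -> efficiency 1.0
print(load_efficiency([2.0, 2.0, 2.0, 2.0]))  # 1.0
# One slow rank dominates -> low efficiency
print(load_efficiency([2.0, 2.0, 2.0, 8.0]))  # 0.4375
```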
Thanks,
Doug
On Fri, Dec 9, 2011 at 9:08 AM, Doug Jacobsen <jacobsen.douglas at gmail.com> wrote:
> Phil,
>
> Yeah, I figured that max would give you the most relevant information for
> seeing where a large amount of time is spent on any processor. The min
> wouldn't be too difficult to print as well. I figured the average would
> either be fairly representative of the full set of processors, or it would
> be corrupted by load imbalances if they were really bad.
>
> The global reductions are done at the end of the simulation, and they are
> only called if the domain_info variable points to domain % dminfo, which
> means that the mpas_core line I previously mentioned needs to be added to
> the initialization phase.
>
> In terms of min and max, the global min and max times for a single call
> are still printed, as is the max total time spent in a routine (on a
> processor). I could determine the min total time spent in a routine (on a
> processor) as well if that's something everyone is interested in. As a
> summary of the information currently printed, below is a line from the
> current timer output.
>
>   1 initialize    6.56141    1    6.56141    6.56141    6.56141    0.01
>
> The first column tells you how many timers were started prior to starting
> this one. The second is the timer's name. The third is the total time
> spent within this timer, including all starts and stops. The fourth is the
> number of times this timer was started and stopped. The fifth is the min
> time for a single call to this timer, and the sixth is the max time for a
> single call. The seventh is the average time for a single call (column 3 /
> column 4), and the last is the percent of the total_time timer that was
> spent in this timer.
>
> The global reduction takes the global max of columns 3 and 6 (the total
> and max times) and the global min of column 5 (the min time), and columns
> 7 and 8 are recomputed using the new global max of column 3.
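A sketch of that per-timer reduction (plain Python in place of the MPI calls; the function and field names are hypothetical, and which columns take the max versus the min follows my reading of the description above):

```python
def reduce_timer(per_rank_stats, wall_time):
    """Combine one timer's per-rank stats into the printed global line.

    per_rank_stats: one dict per MPI rank with 'total', 'calls',
    'min', and 'max' times; 'calls' is the same on every rank.
    wall_time: the total_time timer, used for the percent column.
    """
    total = max(s["total"] for s in per_rank_stats)  # global max of total time
    tmin = min(s["min"] for s in per_rank_stats)     # global min single-call time
    tmax = max(s["max"] for s in per_rank_stats)     # global max single-call time
    calls = per_rank_stats[0]["calls"]
    avg = total / calls            # recomputed from the new global max total
    percent = total / wall_time    # likewise recomputed
    return total, calls, tmin, tmax, avg, percent

ranks = [{"total": 10.0, "calls": 5, "min": 1.0, "max": 3.0},
         {"total": 8.0,  "calls": 5, "min": 0.5, "max": 2.0}]
print(reduce_timer(ranks, 100.0))  # (10.0, 5, 0.5, 3.0, 2.0, 0.1)
```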
>
> Let me know if there is any more information anyone wants in the timers.
>
> Doug
>
> On Fri, Dec 9, 2011 at 8:35 AM, Jones, Philip W <pwjones at lanl.gov> wrote:
>
>>
>> Doug,
>>
>> Max time is most relevant, and printing the min time is useful to get an
>> idea of load imbalance. I don't think the mean really tells you much.
>>
>> And the global reductions are only in the timer print, right?
>>
>> Thanks,
>>
>> Phil
>>
>> TSPA/Correspondence/DUSA EARTH
>> ------
>> Philip Jones (pwjones at lanl.gov)
>> Climate, Ocean and Sea Ice Modeling
>> Los Alamos National Laboratory
>> T-3 MS B216
>> P.O. Box 1663
>> Los Alamos, NM 87545
>> ------------------------------
>> *From:* mpas-developers-bounces at mailman.ucar.edu [mpas-developers-bounces at mailman.ucar.edu] on behalf of Doug Jacobsen [jacobsen.douglas at gmail.com]
>> *Sent:* Thursday, December 08, 2011 7:36 PM
>> *To:* mpas-developers at ucar.edu
>> *Subject:* Re: [mpas-developers] mpas_timer.F synchronization issue.
>>
>> Hi Everyone,
>>
>> Something else I would like input on regarding this: I currently have
>> two options for synchronizing the timers. The first is the current
>> version, which just uses the max over all of the processors' timers. The
>> other option would be to average the timers across processors. Each has
>> its own benefits, and they provide slightly different information, so if
>> anyone has any preferences it would be good to discuss them.
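The two options differ only in the reduction applied, and they can report quite different numbers for an imbalanced timer. A toy illustration (the times are invented; a rank stuck waiting in an MPI call inflates the max but barely moves the average):

```python
# Per-rank times for one timer; the last rank waits on an MPI call.
times = [1.0, 1.1, 1.0, 4.0]

sync_max = max(times)               # option 1: worst case over ranks
sync_avg = sum(times) / len(times)  # option 2: typical case over ranks

print(sync_max, sync_avg)  # 4.0 1.775
```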
>>
>> Thanks!
>> Doug
>>
>> On Thu, Dec 8, 2011 at 3:47 PM, Doug Jacobsen <jacobsen.douglas at gmail.com> wrote:
>>
>>> Hello Everyone,
>>>
>>> I recently noticed that when running an MPI job, processors would report
>>> different times for sub-timers, i.e. timers other than the total_time
>>> timer. This is mostly due to some processors having to wait for MPI
>>> calls to finish while others don't. None of the previous versions of
>>> mpas_timer.F made sure the timers report the same time across all of the
>>> processors, so I have attached a new version of mpas_timer.F that does.
>>> It essentially makes each timer's total time the maximum total time over
>>> all of the processors. It also gets the global max and min single-call
>>> times to print as well. I think this gives a better overall profile of
>>> the time spent in routines than having to go through each processor's
>>> log.*.out file to see how it behaved.
>>>
>>> To support this, the timer module now stores a pointer to domain %
>>> dminfo so you don't have to pass it in to print out the timers. Doing this
>>> allows the current timer implementation to stay the same, and allows the
>>> syncing of timers by adding a single line to mpas_*_mpas_core.F within each
>>> core, which is:
>>>
>>> call mpas_timer_init(domain)
>>>
>>> within mpas_core_init.
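The design here, caching a handle once at init so later calls need no extra argument, can be sketched like this (Python standing in for the Fortran module's saved pointer to domain % dminfo; all names are hypothetical):

```python
# Module-level cache, analogous to the timer module's pointer
# to domain % dminfo that mpas_timer_init(domain) stores.
_dminfo = None

def timer_init(domain):
    """Called once from the core's init phase; remembers dminfo."""
    global _dminfo
    _dminfo = domain["dminfo"]

def timer_write():
    """Later the timers can synchronize without being passed dminfo."""
    if _dminfo is None:
        raise RuntimeError("timer_init was never called; cannot synchronize")
    return _dminfo  # real code would use this handle for the MPI reductions

timer_init({"dminfo": "comm-world"})
print(timer_write())  # comm-world
```

The tradeoff is the usual one for module-level state: existing call sites stay unchanged, at the cost of requiring the one-time init call before any synchronized print.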
>>>
>>> I'm open to any comments or suggestions regarding this change, but I
>>> would like to propagate it to the trunk. I will also propagate the above
>>> addition to mpas_ocn_mpas_core.F but can add it to the other cores if
>>> requested.
>>>
>>> Thanks for your input.
>>> Doug
>>>
>>
>>
>
-------------- next part --------------
Using z-level coordinates
timer_name total calls min max avg percent efficiency
0 total time 454.93673 1 454.79560 454.93673 454.93673
1 initialize 5.82919 1 5.71109 5.82919 5.82919 0.01 0.99
2 compute_solve_diagnostics 90.65039 2077 0.01990 0.09795 0.00002 0.20 0.88
3 compute_solve_diagnostics-hEdge 2.55096 2077 0.00043 0.00486 0.00000 0.01 0.79
4 compute_solve_diagnostics-hEdge 2.54122 2077 0.00042 0.00486 0.00000 0.01 0.79
3 equation_of_state 13.09548 5192 0.00081 0.00656 0.00000 0.03 0.84
4 equation_of_state_jm 13.06267 5192 0.00080 0.00655 0.00000 0.03 0.84
3 wTop 5.86591 2077 0.00152 0.00577 0.00000 0.01 0.86
1 time integration 448.21937 519 0.79084 1.07320 0.00166 0.99 1.00
2 RK4-main loop 338.77847 519 0.56020 0.83179 0.00126 0.74 0.91
3 RK4-diagnostic halo update 96.94234 2076 0.00207 0.15908 0.00002 0.21 0.36
3 RK4-tendency computations 173.53435 2076 0.03780 0.11710 0.00004 0.38 0.86
4 compute_tend_h 3.43351 2076 0.00060 0.00428 0.00000 0.01 0.61
5 compute_tend_h-horiz adv 2.80887 2076 0.00043 0.00401 0.00000 0.01 0.55
5 compute_tend_h-vert adv 0.16503 2076 0.00003 0.00042 0.00000 0.00 0.63
4 compute_tend_u 80.70319 2076 0.01769 0.06730 0.00002 0.18 0.86
5 compute_tend_u-coriolis 32.85087 2076 0.00616 0.02676 0.00001 0.07 0.79
5 compute_tend_u-vert adv 4.82752 2076 0.00123 0.00521 0.00000 0.01 0.91
5 compute_tend_u-pressure grad 2.85730 2076 0.00066 0.00438 0.00000 0.01 0.88
5 compute_tend_u-horiz mix 38.31071 2076 0.00808 0.03199 0.00001 0.08 0.87
6 compute_tend_u-horiz mix-del2 5.46079 2076 0.00114 0.00552 0.00000 0.01 0.85
6 compute_tend_u-horiz mix-del4 33.12631 2076 0.00685 0.02934 0.00001 0.07 0.86
5 compute_tend_u-forcings 2.33122 2076 0.00040 0.00377 0.00000 0.01 0.61
4 compute_scalar_tend 89.02905 2076 0.01773 0.05285 0.00002 0.20 0.85
5 compute_scalar_tend-horiz adv 24.02053 2076 0.00434 0.02840 0.00001 0.05 0.84
6 compute_scalar_tend-horiz adv 23.99440 2076 0.00432 0.02839 0.00001 0.05 0.84
5 compute_scalar_tend-vert adv 12.47948 2076 0.00229 0.00788 0.00000 0.03 0.84
6 compute_scalar_tend-vert adv 4.06053 2076 0.00057 0.00456 0.00000 0.01 0.81
5 compute_scalar_tend-horiz diff 51.16401 2076 0.01013 0.03280 0.00001 0.11 0.83
6 compute_scalar_tend-horiz dif 51.13484 2076 0.01011 0.03279 0.00001 0.11 0.83
5 compute_scalar_tend-restoring 0.32410 2076 0.00008 0.00034 0.00000 0.00 0.71
4 enforce_boundaryEdge 2.95668 2076 0.00085 0.00425 0.00000 0.01 0.74
3 RK4-pronostic halo update 74.92456 2076 0.00322 0.15148 0.00002 0.16 0.40
3 RK4-update diagnostic variables 86.60230 2076 0.00001 0.08941 0.00002 0.19 0.87
3 RK4-RK4 accumulate update 23.28396 2076 0.00556 0.01527 0.00001 0.05 0.78
2 RK4-cleaup phase 103.28245 519 0.11111 0.27786 0.00038 0.23 0.89
3 RK4-implicit vert mix 76.76441 519 0.08610 0.16828 0.00028 0.17 0.90
4 compute_vertical_mix_coefficien 12.02765 519 0.01038 0.02630 0.00004 0.03 0.87
4 tridiagonal_solve 25.51430 1711143 0.00001 0.02489 0.00000 0.06 0.88
4 tridiagonal_solve_mult 8.16510 593736 0.00001 0.00225 0.00000 0.02 0.87
2 global diagnostics 5.29737 51 0.02467 0.18143 0.00204 0.01 0.51
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpas_timer.F
Type: text/x-fortran
Size: 11898 bytes
Desc: not available
Url : http://mailman.ucar.edu/pipermail/mpas-developers/attachments/20111210/b0e2721c/attachment-0001.bin