[cam-users] CGD Forums
phpbb at cgd.ucar.edu
Fri Jan 14 09:00:00 MST 2005
Dear cam-users,
As you requested, here is the latest digest of messages posted on the CGD Forums. Please come and join the discussion!
<< Why is CAM slower in CCSM3.0 than in CCSM2.0.1?, http://bb.cgd.ucar.edu//viewtopic.php?t=92 >>
Poster: guscorrea, Posted: Fri Jan 14, 2005 3:59 pm MST, http://bb.cgd.ucar.edu//viewtopic.php?p=272#272
Message: Hello CAM maintainers and user list,

I posted this on the CCSM list, but it may be appropriate to post it here as well,
since the questions relate to CAM's performance.

I installed CCSM2.0.1 and CCSM3.0 on our Beowulf cluster at LDEO.
Somehow the new version runs about 47% slower than the old one.
A 1-year t42_gx1v3 full dynamic run, using 32 CPUs,
takes about 21h 13min on CCSM2.0.1, but 31h 15min on CCSM3.0,
using the same CPU/component distribution.
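
The 47% figure can be rechecked from the two wall times quoted above. This is only a small arithmetic sketch; the helper names `to_minutes` and `percent_longer` are made up for illustration and are not part of CCSM.

```python
def to_minutes(hours, minutes):
    """Convert an 'Hh Mmin' wall time to whole minutes."""
    return hours * 60 + minutes


def percent_longer(new, old):
    """Percent by which `new` exceeds `old`."""
    return 100.0 * (new - old) / old


# Wall times quoted in the post for the 1-year t42_gx1v3 run on 32 CPUs.
ccsm2_min = to_minutes(21, 13)
ccsm3_min = to_minutes(31, 15)

slowdown_pct = percent_longer(ccsm3_min, ccsm2_min)
print(f"CCSM2.0.1: {ccsm2_min} min, CCSM3.0: {ccsm3_min} min")
print(f"CCSM3.0 takes about {slowdown_pct:.0f}% longer")
```

Note that "47% slower" here means 47% more wall time for the same simulated year, not a 47% drop in throughput.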

FYI, our cluster nodes are dual 1.2 GHz Athlon, 1 GB RAM, with Myrinet 2000,
Linux kernel 2.4.18, MPICH 1.2, GNU and PGI 5.2-4 compilers, and PBS.

The timing data provided by the coupler suggest that the atmosphere component
"cam" is much slower in CCSM3.0 than it was in CCSM2.0.1.
Here is a comparison of timers "t25" on CCSM3.0 and "t26" on CCSM2.0.1,
which correspond to the atm -> cpl communication:

CCSM3.0 cpl log file:   (shr_timer_print) timer 27: 8760 calls, 70403.943s, id: t25
CCSM2.0.1 cpl log file: (shr_timer_print) timer 27: 8760 calls, 47996.592s, id: t26
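
For anyone who wants to repeat this comparison on their own coupler logs, here is a minimal sketch that pulls the fields out of the two `(shr_timer_print)` lines quoted above. The regular expression is inferred only from those two lines, so it is an assumption about the log format rather than a documented one, and `parse_timer` is a hypothetical helper name.

```python
import re

# Pattern inferred from the two log lines in the post (an assumption, not a spec).
TIMER_RE = re.compile(
    r"\(shr_timer_print\)\s+timer\s+(\d+):\s+(\d+)\s+calls,\s+([\d.]+)s,\s+id:\s+(\S+)"
)


def parse_timer(line):
    """Return the timer fields as a dict, or None if the line does not match."""
    m = TIMER_RE.search(line)
    if m is None:
        return None
    return {
        "slot": int(m.group(1)),
        "calls": int(m.group(2)),
        "seconds": float(m.group(3)),
        "id": m.group(4),
    }


ccsm3 = parse_timer("(shr_timer_print) timer 27: 8760 calls, 70403.943s, id: t25")
ccsm2 = parse_timer("(shr_timer_print) timer 27: 8760 calls, 47996.592s, id: t26")

ratio = ccsm3["seconds"] / ccsm2["seconds"]
print(f"{ccsm3['id']} vs {ccsm2['id']}: ratio {ratio:.2f}")
print(f"seconds per call: {ccsm3['seconds'] / ccsm3['calls']:.2f} "
      f"vs {ccsm2['seconds'] / ccsm2['calls']:.2f}")
```

With the same number of calls (8760) in both runs, the ratio of the totals is also the ratio of the average time per coupling step.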

Cam's MPI communication was significantly modified in CCSM3.0.
Could this be the reason for the drop in speed?

Cam's timing files suggest that this is the case.
They show that, compared to CCSM2.0.1:

1. The total time "cam" takes between send/recv to/from the coupler increased by about 67%:

MODEL (cam component)        cam timer       No. of calls   wall time (s)
CCSM3.0 cam's timing.0       ccsm_rcvtosnd   8761           69598.148
CCSM3.0 cam's timing.0       ccsm_sndtorcv   8760           28763.375
CCSM3.0 cam total communication time w/ cpl:                98361.523

MODEL (cam component)        cam timer       No. of calls   wall time (s)
CCSM2.0.1 cam's timing.0     ccsm_rcvtosnd   8761           46653.520
CCSM2.0.1 cam's timing.0     ccsm_sndtorcv   8760           12143.051
CCSM2.0.1 cam total communication time w/ cpl:              58796.571
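
The totals and the roughly 67% increase can be recomputed from the four timer values listed above. A short sketch, with the numbers copied from the timing.0 excerpts and the dictionary layout chosen only for illustration:

```python
# Wall times (s) of the two coupler-related cam timers, copied from the post.
timers = {
    "CCSM3.0": {"ccsm_rcvtosnd": 69598.148, "ccsm_sndtorcv": 28763.375},
    "CCSM2.0.1": {"ccsm_rcvtosnd": 46653.520, "ccsm_sndtorcv": 12143.051},
}

# Sum the two timers per model instead of typing the totals in by hand.
totals = {model: sum(values.values()) for model, values in timers.items()}

increase_pct = 100.0 * (totals["CCSM3.0"] / totals["CCSM2.0.1"] - 1.0)

for model, total in totals.items():
    print(f"{model}: {total:.3f} s")
print(f"increase: about {increase_pct:.0f}%")
```

Recomputing the sums from the individual timers guards against transcription slips in the total lines.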

2. The total time spent on all MPI routines (i.e. communication time)
increased by about 33%:

MODEL       Total wall time spent on MPI calls (s)
CCSM3.0     18500
CCSM2.0.1   13894

Most of the difference appears to be
due to the replacement of "mpi_sendrecv" by "mpi_alltoallv":

MODEL (cam component)        MPI function    No. of calls   wall time (s)
CCSM3.0 cam's timing.0       mpi_alltoallv   52562          16671.400
CCSM2.0.1 cam's timing.0     mpi_sendrecv    762120         4625.800
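
Because the two MPI routines are called a very different number of times, it may help to look at the average cost per call as well as the totals. A rough sketch using only the call counts and wall times listed above (the variable names are illustrative):

```python
# (number of calls, total wall time in seconds), copied from cam's timing.0 excerpts.
mpi_stats = {
    "mpi_alltoallv (CCSM3.0)": (52562, 16671.400),
    "mpi_sendrecv (CCSM2.0.1)": (762120, 4625.800),
}

# Average wall time per call for each routine.
per_call = {name: seconds / calls for name, (calls, seconds) in mpi_stats.items()}

for name, avg in per_call.items():
    print(f"{name}: {avg * 1000:.1f} ms per call")

total_ratio = mpi_stats["mpi_alltoallv (CCSM3.0)"][1] / mpi_stats["mpi_sendrecv (CCSM2.0.1)"][1]
print(f"total time ratio: {total_ratio:.1f}x")
```

A single `mpi_alltoallv` call replaces many pairwise exchanges, so a higher per-call time is expected; the total time is the figure that matters for the comparison.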

3. Overall, some of cam's most computationally intensive routines became significantly slower:

(Wall time in seconds)
MODEL       phys_driver   radctl   dynpkg   realloc4(a)
CCSM3.0     72292         35701    38137    16870
CCSM2.0.1   54795         19286    20077    1827
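
To see which routine degraded the most, the per-routine ratios can be computed from the table above. This is just a sketch over the four values quoted in the post; nothing here comes from cam's source code.

```python
# Wall times (s) per routine, copied from the table in the post.
ccsm3 = {"phys_driver": 72292, "radctl": 35701, "dynpkg": 38137, "realloc4(a)": 16870}
ccsm2 = {"phys_driver": 54795, "radctl": 19286, "dynpkg": 20077, "realloc4(a)": 1827}

# Slowdown factor of CCSM3.0 relative to CCSM2.0.1 for each routine.
ratios = {name: ccsm3[name] / ccsm2[name] for name in ccsm3}

# List the routines from largest to smallest slowdown factor.
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
for name, ratio in ranked:
    print(f"{name:12s} {ratio:5.2f}x")
```

By these numbers realloc4(a) stands out by a wide margin, while phys_driver shows the smallest relative increase.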

___________________

Questions:

A) Is there a simple way to improve the performance of cam in CCSM3.0?

B) Was there a significant increase in the calculations performed by cam's physics/dynamics algorithms,
which might justify the 47% increase in wall time?
Or is the MPI framework of the new "cam" in CCSM3.0 tuned to NCAR's shared-memory machines,
but not optimized for distributed-memory Beowulf clusters?

(I tried increasing cam's CPUs from 6 to 8, while decreasing the land (clm) CPUs from 4 to 2.
However, CCSM3.0 (walltime 26h30min) is still 32% slower than CCSM2.0.1 (walltime 20h07min).
I guess the problem goes beyond load balance, and CCSM3 is in fact slower than its predecessor.)

C) Is there a simple option (namelist option, macro definition for compilation, or other)
that would restore the style of cam's MPI communication to what it was in CCSM2.0.1,
which seems to be more efficient on Beowulf clusters?

Thank you very much.

Gus Correa