[mpas-developers] 1/10 degree problems

Michael Duda duda at ucar.edu
Fri Apr 16 15:13:58 MDT 2010


Hi, Mat.

I'd agree with Xylar's assessment that the F is probably not an
indication of anything wrong; rather, it just indicates that none
of the timers were still running at the point where their times
were printed.
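
A minimal Python sketch of that behavior, assuming a simple
accumulating timer (the class and method names here are hypothetical
and are not taken from the MPAS Fortran timer code): the flag in the
report only records whether the timer is still running when the
report is written, not whether the timed code did any useful work.

```python
import time


class Timer:
    """Hypothetical accumulating timer with a 'running' flag."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.total_wall = 0.0
        self._start = 0.0

    def start(self):
        # Mark the timer as running and remember when it started.
        self.running = True
        self._start = time.perf_counter()

    def stop(self):
        # Add the elapsed interval to the total and clear the flag.
        if self.running:
            self.total_wall += time.perf_counter() - self._start
            self.running = False

    def flag(self):
        # 'T' if still running at report time, 'F' once stopped.
        return "T" if self.running else "F"

    def report_line(self, process=0):
        return f"{process} : {self.name} {self.flag()} {self.total_wall:.5f}"


t = Timer("time integration")
t.start()
t.stop()
print(t.report_line())  # the flag is F because stop() was already called
```

Printing the report between start() and stop() would show T instead,
which matches Xylar's reading below.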

I wonder whether some problem with the mesh decomposition file,
graph.info.part.64, is causing cells not to be assigned to any MPI
task. Have you checked whether the fields in the output.nc file
are garbage, or perhaps whether all time periods look identical to
the initial state?
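
One quick way to test the decomposition idea, sketched in Python
under the assumption that graph.info.part.64 is a METIS partition
file with one task number per line, one line per cell, in the same
order as the cells in grid.nc (the function name check_partition is
hypothetical): check that the file has exactly nCells entries, that
every entry is a valid task number, and that no task ends up with
zero cells.

```python
from collections import Counter


def check_partition(lines, n_cells, n_tasks):
    """Sanity-check a partition listing (assumed: one task id per line)."""
    # Parse one integer task id per non-blank line.
    task_ids = [int(s) for s in lines if s.strip()]
    problems = []

    # Every cell should have exactly one entry.
    if len(task_ids) != n_cells:
        problems.append(f"expected {n_cells} entries, found {len(task_ids)}")

    # Every entry should name a task in 0 .. n_tasks-1.
    bad = [t for t in task_ids if not 0 <= t < n_tasks]
    if bad:
        problems.append(f"{len(bad)} entries outside 0..{n_tasks - 1}")

    # Every task should own at least one cell.
    counts = Counter(task_ids)
    empty = [t for t in range(n_tasks) if counts[t] == 0]
    if empty:
        problems.append(f"tasks with no cells: {empty}")

    return problems, counts
```

For example, open("graph.info.part.64") could be passed as `lines`,
with n_cells read from the nCells dimension of grid.nc and
n_tasks=64. An empty problems list only means the file looks
consistent; it does not rule out other causes.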

Michael


On Fri, Apr 16, 2010 at 02:27:01PM -0600, Xylar Asay-Davis wrote:
> Mat,
> 
> Could you send the namelist.input file you're using, too?  Who knows, 
> maybe there's something useful there.
> 
> I don't think the F is an indication of the problem.  If I'm reading the 
> code correctly, it just indicates that the timer (not the code) is no 
> longer running.  If the code that prints the timing information were 
> called before timer_stop(), this flag would be T instead.
> 
> 
> -Xylar
> 
> On 4/16/10 2:16 PM, Mathew Maltrud wrote:
> > Hi Michael and Todd--
> >
> > I've been trying to run the 1/10-degree dipole POP grid in the sw
> > configuration and am getting something I haven't seen before.  All
> > appears normal (all MPI processes are running, etc.).  The *.err files say
> > it is looping over timesteps, though clearly nothing is being done
> > (it's happening too fast).  There's no output.nc file.  Here are examples
> > of the log.0000.* files (running on 64 cores):
> >
> > mm at cy-2.lanl.gov {10}% tail log.0000.err
> >    Doing timestep           11
> >    Doing timestep           12
> >    Doing timestep           13
> >    Doing timestep           14
> >    Doing timestep           15
> >    Doing timestep           16
> >    Doing timestep           17
> >    Doing timestep           18
> >    Doing timestep           19
> >    Doing timestep           20
> > mm at cy-2.lanl.gov {11}% tail log.0000.out
> >
> >     TIMINGS (process:event,running,cpu,wall,100*(wall/total wall))
> >        0 : total time          F        0.00000      196.55210
> >
> >        0 : initialize          F        0.00000       67.82460   34.51
> >        0 : time integration    F        0.00000       11.05870    5.63
> >
> > So the 'F' is a clue, but I don't know what it means.  Note that the
> > grid.nc file looks OK, and I successfully ran the 4/10-degree version of
> > this grid earlier this week.
> >
> > Any hints?  Maybe not enough memory?  There are about 6 million cells...
> >
> > thanks...
> > -mat
> > _______________________________________________
> > mpas-developers mailing list
> > mpas-developers at mailman.ucar.edu
> > http://mailman.ucar.edu/mailman/listinfo/mpas-developers
> >    
> 
> 
> -- 
> 
> ***********************
> Xylar S. Asay-Davis
> E-mail: xylar at lanl.gov
> Phone: (505) 606-0025
> Fax: (505) 665-2659
> CNLS, MS B258
> Los Alamos National Laboratory
> Los Alamos, NM 87545
> ***********************
> 
