[Wrf-users] real.exe failing on huge domains

Don Morton morton at arsc.edu
Thu Sep 3 13:53:32 MDT 2009


Howdy,

Just an update - after a lot of work, I got WRF to compile with the
PathScale compilers, and I am seeing the same problem described
below.  With a "huge" domain (the threshold is somewhere between
3038x3038 and 5000x5000 horizontal points, with 28 levels), real.exe
fails with the following:

> -------------- FATAL CALLED ---------------
> FATAL CALLED FROM FILE:  module_initialize_real.b  LINE:     526
> p_top_requested < grid%p_top possible from data
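
For reference, the check that fires here compares the model top requested
in the namelist (p_top_requested, 5000 Pa in my runs) against the topmost
pressure real.exe believes is available in the input data (grid%p_top,
which the failing runs report as 55000 Pa even though the met_em files
contain levels up to 1000 Pa).  A quick way to see what the met_em files
actually hold is something along these lines - my original check was an
NCL script, and this Python/netCDF4 version is only an equivalent sketch
(the d01 file glob is an assumption about a particular setup):

    import glob
    from netCDF4 import Dataset

    # Print per-level min/max/avg of the PRES field (Pa) in each met_em file.
    # The file glob is an assumption; adjust it for your own domain and dates.
    for fname in sorted(glob.glob("met_em.d01.*.nc")):
        with Dataset(fname) as nc:
            pres = nc.variables["PRES"][0]  # (num_metgrid_levels, south_north, west_east)
        for k in range(pres.shape[0]):
            lev = pres[k]
            print(fname, k, float(lev.min()), float(lev.max()), float(lev.mean()))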

So, I believe at this point I've made a reasonable case that this is  
not an issue with a specific architecture, or solely with the PGI  
compilers.

I believe it may be time to go in and operate on the real.exe code!

By the way, one person asked me to run this in serial and provide  
output, but this problem is much, much too big for serial execution!
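
Just to put rough numbers on the sizes involved (back-of-the-envelope
arithmetic only, assuming 4-byte reals and 28 levels, and saying nothing
about how real.exe actually lays out its arrays): a single full-domain 3D
field already runs from about 1 GByte at 3038x3038 to about 4 GBytes at
6075x6075.

    # Back-of-the-envelope size of one full-domain 3D field (4-byte reals, 28 levels).
    for nx in (3038, 5000, 6075):
        npoints = nx * nx * 28
        print(nx, npoints, round(npoints * 4 / 1e9, 2), "GB")
    # -> roughly 1.0 GB, 2.8 GB, and 4.1 GB respectively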




On Aug 31, 2009, at 2:41 PM, Don Morton wrote:

> First - the basic question - has anybody been successful in WPS'ing
> and real.exe'ing a large domain, on the order of 6075x6075x27 grid
> points (approximately 1 billion)?
>
> I've almost convinced myself (I say "almost" because I recognize that
> I, like others, am capable of making stupid mistakes) that there is an
> issue with real.exe which, for large grids, results in an error
> message of the form:
>
> =====================
> p_top_requested =     5000.000
> allowable grid%p_top in data   =     55000.00
> -------------- FATAL CALLED ---------------
> FATAL CALLED FROM FILE:  module_initialize_real.b  LINE:     526
> p_top_requested < grid%p_top possible from data
> =====================
>
> and I'm beginning to think that this is somehow related to memory
> allocation issues.  I'm currently working on a 1 km resolution case
> centered on Fairbanks, Alaska.  If I use a 3038x3038 horizontal grid,
> it all works fine, but with a 6075x6075 grid I get the above error.
> For both cases I used an NCL script to print the min/max/avg values
> of the PRES field in the met_em* files; at the top level they both
> come out to 1000 Pa, and at the next level down they both come out to
> 2000 Pa.  So I'm guessing my topmost pressure fields are fine, and
> that the met_em files being fed to real.exe are good.
>
> Further information:
>
> - I've tried these cases under a variety of conditions -
> different resolutions, different machines (a Sun Opteron cluster and a
> Cray XT5).  In all cases, however, I've been using the PGI compilers
> (though I may try PathScale on one of the machines to see if that
> makes a difference).  I feel pretty good about having ruled out
> resolution, physics, etc. as the problem, and I feel I've narrowed
> this down to a problem that is a function of domain size.
>
> - With some guidance from John Michalakes and folks at Cray, I feel
> pretty certain that I'm not running out of memory on the compute
> nodes, though I'll be probing this a little more.  In one case (which
> failed with the above error) I put MPI task 0 on a 32 GByte node all
> by itself, then spread the other 255 tasks 8 per 8-core node
> (two quad-core processors), each node with 32 GBytes of memory
> (4 GBytes per task).
>
> - I've tried this with both WRFV3.0.1.1 and WRFV3.1.
>
>
> I'll continue to probe, and I may need to start digging into the
> real.exe source, but I just wanted to know if anybody else has
> experienced success or failure with a problem of this size.  I'm aware
> that a Gordon Bell entry last year was run with about 2 billion grid
> points, but I seem to remember someone telling me that that run
> wasn't prepared with WPS.
>
> Thanks,
>
> Don Morton
> -- 
> Arctic Region Supercomputing Center
> http://www.arsc.edu/~morton/

---
Arctic Region Supercomputing Center
http://www.arsc.edu/~morton/






