[Wrf-users] da_wrfvar.exe failing on "large" domains

Steven G Decker decker at envsci.rutgers.edu
Wed Sep 23 12:49:25 MDT 2009


Don,

Assuming the test case works fine, my wild guess is that somewhere in 
the code the wrong integer kind is being used (either a bug in the 
source code or a bug in the compiler or improper compiler flags), and 
the large domain size is leading to integer overflow.
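
One rough way to check this guess against your log: the requested size,
18446744072020333888, equals 2**64 - 1689217728. That is what a negative
32-bit value (-1689217728) looks like once it is sign-extended to 64 bits
and printed as an unsigned byte count. The sketch below redoes that
arithmetic. Fortran has no unsigned 64-bit type, so 2**63 is subtracted
from the reported number by hand first; the literal 9223372035165558080
is my own intermediate value, not something from the log.

```fortran
program decode_request
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none

   ! Reported request minus 2**63, worked out by hand so that it fits
   ! in a signed 64-bit integer:
   !   18446744072020333888 - 9223372036854775808 = 9223372035165558080
   integer(int64), parameter :: reported_minus_2p63 = 9223372035165558080_int64
   integer(int64) :: as_signed

   ! Subtract the remaining 2**63 in two steps to stay within range:
   ! huge(0_int64) is 2**63 - 1.
   as_signed = (reported_minus_2p63 - huge(0_int64)) - 1_int64

   print *, 'request read as a signed 64-bit value:', as_signed
   print *, 'fits in a signed 32-bit integer:      ', &
            as_signed >= -huge(0_int32) - 1_int64 .and. as_signed <= huge(0_int32)
end program decode_request
```

This should report -1689217728, which does fit in 32 bits. That is
consistent with (though not proof of) a 32-bit size calculation going
negative before it reaches ALLOCATE.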

What you are seeing is similar to the results of the following Fortran 
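program:

program overflow
   implicit none

   ! Long is typically a 32-bit kind, LLong a 64-bit kind.
   integer, parameter :: Long  = selected_int_kind(8)
   integer, parameter :: LLong = selected_int_kind(16)

   integer(Long)  :: i
   integer(LLong) :: j

   i = 400000000
   print *, i
   ! With a 32-bit Long, 20*i = 8000000000 is evaluated in kind Long and
   ! overflows before its bits are copied into j.
   j = transfer(20*i,j)
   print *, j
end program overflow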

Change the kind of i to LLong and the "bug" goes away.
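
The same idea applies to a size calculation in real code: make sure the
product is evaluated in a wide enough kind before it is stored. Here is a
minimal sketch, not taken from the WRF-Var source. The grid dimensions
are the ones from your message, but nvar and the 8-byte element size are
hypothetical, chosen only so the product exceeds 2**31 - 1.

```fortran
program size_check
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none

   ! Grid dimensions from the original message; nvar and bytes_per_value
   ! are made-up values for illustration only.
   integer(int32), parameter :: nx = 1050, ny = 1050, nz = 75
   integer(int32), parameter :: nvar = 40, bytes_per_value = 8
   integer(int64) :: nbytes

   ! Converting the first operand to int64 makes every later
   ! multiplication in the chain a 64-bit operation.
   nbytes = int(nx, int64) * ny * nz * nvar * bytes_per_value

   print *, 'bytes needed:         ', nbytes
   print *, 'largest 32-bit value: ', huge(0_int32)
   if (nbytes > huge(0_int32)) then
      print *, 'a 32-bit byte count would overflow here'
   end if
end program size_check
```

Writing nx*ny*nz*nvar*bytes_per_value without the conversion would do the
whole product in 32 bits and only widen the (already wrong) result on
assignment.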

Try turning on all of the compiler's debugging flags (bounds checks, type
checking, interface checking, etc.) and cross your fingers. If a Fortran
77-style implicit interface is involved, be prepared to pull out your hair.
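
To illustrate why the interface style matters, here is a small sketch with
hypothetical names (alloc_utils and report_bytes are invented for this
example and are not WRF routines). Because the subroutine lives in a
module, the compiler sees its explicit interface and can reject a call
that passes the wrong integer kind. With a Fortran 77-style external
routine, the same mismatch may compile silently.

```fortran
module alloc_utils
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
contains
   ! Hypothetical helper: expects a 64-bit byte count.
   subroutine report_bytes(nbytes)
      integer(int64), intent(in) :: nbytes
      print *, 'bytes requested:', nbytes
   end subroutine report_bytes
end module alloc_utils

program explicit_interface
   use, intrinsic :: iso_fortran_env, only: int64
   use alloc_utils, only: report_bytes
   implicit none

   integer :: n   ! default integer kind, usually 32 bits

   n = 1000

   ! call report_bytes(n)
   ! The call above is left commented out: with the explicit interface
   ! from the module, a compiler should reject it as a kind mismatch.

   ! Converting explicitly matches the dummy argument's kind.
   call report_bytes(int(n, int64))
end program explicit_interface
```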

The negative values for the "m" (memory) indices are fine: they allow for a
halo of points around each core's portion of the domain.
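
As a sketch of what those bounds mean, the program below allocates an
array with the ims:ime and jms:jme values from your log and marks the
ips:ipe, jps:jpe patch inside it. The array name and the fill values are
made up for illustration; this is not how WRF itself lays out its fields.

```fortran
program halo_bounds
   implicit none

   ! Memory ("m") and patch ("p") bounds copied from the log output.
   integer, parameter :: ims = -4, ime = 73, jms = -4, jme = 73
   integer, parameter :: ips = 1,  ipe = 66, jps = 1,  jpe = 66

   real, allocatable :: field(:,:)   ! hypothetical 2-D field

   ! Fortran arrays may start at any index, including a negative one.
   allocate(field(ims:ime, jms:jme))

   field = 0.0                       ! everything, halo included
   field(ips:ipe, jps:jpe) = 1.0     ! this task's own patch

   print *, 'lower bounds:           ', lbound(field)
   print *, 'memory points per side: ', size(field, 1)
   print *, 'patch points:           ', count(field > 0.5)
   print *, 'halo points:            ', count(field < 0.5)

   deallocate(field)
end program halo_bounds
```

With these bounds the memory extent is 78 points per side, of which 66
belong to the patch; the rest is the surrounding halo.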

Hope this helps,
Steve

> Date: Tue, 22 Sep 2009 14:19:15 -0800
> From: Don Morton <morton at arsc.edu>
> Subject: [Wrf-users] da_wrfvar.exe failing on "large" domains
> To: wrf-users at ucar.edu
> Message-ID:
> 	<237e74280909221519x1187c3f8oa198c2cb506fbfb2 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> I'm trying to run da_wrfvar.exe on a 1050x1050x75 grid point domain,
> at 3km resolution.  This strikes me as a "large" domain, but not
> really unreasonably large.  Inevitably, even with 256 cores, I get the
> following ABEND message:
> 
> taskid: 0 hostname: nid00318
>   Ntasks in X            16 , ntasks in Y            16
>   *************************************
>   Parent domain
>   ids,ide,jds,jde             1         1050            1         1050
>   ims,ime,jms,jme            -4           73           -4           73
>   ips,ipe,jps,jpe             1           66            1           66
>   *************************************
>  DYNAMICS OPTION: Eulerian Mass Coordinate
>     alloc_space_field: domain             1 ,     454678064  bytes allocated
>  WRF NUMBER OF TILES =   1
> 0: ALLOCATE: 18446744072020333888 bytes requested; not enough memory
> 
> 
> Although I can believe the figure for the alloc_space_field, I'm just
> a little suspicious of the number of bytes requested for Task 0 - if
> I've read this correctly, it comes out to 18.4 Exabytes! :)
> 
> Although I'm not sure, I believe the ims, ime,jms,jme values are the
> start/stop dimensions of a subdomain in a given task, and if that's
> the case, I'm suspicious about the negative start value.
> 
> I'll look into the code, but I'd like to first pose the question of
> whether anybody has used da_wrfvar.exe for domains this big and/or if
> anybody knows of inherent limitations that might prevent me from doing
> so.
> 
> Thanks,
> 
> Don Morton


-- 
Steve Decker, Assistant Professor
Department of Environmental Sciences    Phone: (732) 932-9800 x 6203
Rutgers University                               Fax: (732) 932-8644
14 College Farm Rd                  Email: decker at envsci.rutgers.edu
New Brunswick, NJ  08901-8551

