[Wrf-users] seg fault WRFv3.5 on 2096 processes

preeti malakar malakar.preeti at gmail.com
Mon Aug 11 21:54:15 MDT 2014


Hi,

 I noticed a segmentation fault when running WRFv3.5 (compiled in dm+sm
mode on BG/Q using the XL compilers) on fewer processes. The run seg faults
on 128 BG/Q nodes with 16 processes per node and 2 OpenMP threads, but it
completes on 256 nodes with 16 processes per node and 2 OpenMP threads.
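
For reference, the two job sizes can be worked out as below. This is only a
sketch of the arithmetic in Python: the function name is hypothetical, and it
assumes the 2 OpenMP threads are per MPI process. Only the node, process, and
thread counts come from the runs described above.

```python
def job_size(nodes, ranks_per_node, threads_per_rank):
    """Return (total MPI ranks, total OpenMP threads) for a dm+sm run.

    Assumes threads_per_rank is the OpenMP thread count of each MPI process.
    """
    ranks = nodes * ranks_per_node
    return ranks, ranks * threads_per_rank


failing = job_size(128, 16, 2)  # the run that seg faults
passing = job_size(256, 16, 2)  # the run that completes
print("failing run (ranks, threads):", failing)
print("passing run (ranks, threads):", passing)
```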

I did a little debugging and found that the segmentation fault on fewer
cores is due to uninitialized variables. It happens during the radiation
call, which typically runs every 30 min (radt = 30 in the namelist). In
subroutine cldprmc (in phys/module_ra_rrtmg_lw.F), a few processes seg
fault on the line where abscoice(ig) is calculated, because some variables
on the RHS are uninitialized (debugging shows NaNs for some of them).
Digging further, cldprmc is called from subroutine rrtmg_lw, and before
that subroutine inatm is called, where variable reicmc is supposed to be
initialized. But this initialization does not happen correctly on all
processes. I do not yet have a clear idea why this is the case because I
do not know the code that well. I am using ra_lw_physics=4. Can anyone
help?
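
The kind of check that exposes this can be sketched as follows. This is a
Python illustration only, not WRF code (in WRF the equivalent test would be
written in Fortran ahead of the cldprmc call). The helper name and the sample
values are hypothetical; only the variable name reicmc comes from the
description above.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or Inf entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Hypothetical reicmc values for one column; the NaN stands in for an entry
# that was never assigned before being used on a right-hand side.
reicmc_column = [25.0, 30.5, float("nan"), 42.0]

bad = first_non_finite(reicmc_column)
if bad is not None:
    print(f"non-finite reicmc at index {bad}")
```

Running such a scan on every process just before the failing line would show
which processes and which array entries carry the bad values, which may help
narrow down where the initialization is skipped.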

 Thanks,
Preeti

