[Wrf-users] wrf.exe errors with RSL_LITE

Jyothi N millyjy77 at yahoo.com
Sat Jul 21 08:19:14 MDT 2007


Dear WRF users, I need some input on a problem with running 'wrf.exe' on an HP XC Opteron cluster, built with the PGI compilers and MPI.

I get a core dump when I configure WRF with option 3, that is, for RSL_LITE - 'PC Linux x86_64 (IA64 and Opteron), PGI 5.2 or higher DM-Parallel (RSL_LITE, MPICH, Allows nesting, No periodic LBCs)'.

When I use option 2, 'PC Linux x86_64 (IA64 and Opteron), PGI 5.2 or higher, DM-Parallel  (RSL,MPICH, Allows nesting)', it works fine and the run completes successfully.

I am concerned as the online tutorial seems to advise using RSL_LITE whenever possible for nested runs, which is what I am looking for.

Can someone help me resolve this issue with RSL_LITE, and also let me know whether it really matters if I use RSL in place of RSL_LITE? Will there be any issues later on if I use multiple nests? Has anyone faced something similar? And why is RSL_LITE preferable to RSL for nested runs?

==========================================================

The following are the details of the errors with option 3, the RSL_LITE option:

The test case is 'em_real'. Everything seems to work fine up to 'run.exe'. The final executable run gives a core dump. The following are the files I get:

-rw-r--r--  1 jy cl  7076584 Jul 21 08:13 wrfinput_d01
-rw-r--r--  1 jy cl  10056744 Jul 21 08:13 wrfbdy_d01
-rw-r--r--  1 jy cl   7666844 Jul 21  2007 wrfout_d01_2005-08-28_00:00:00
-rw-------  1 jy cl 157868032 Jul 21  2007 core.30123
-rw-------  1 jy cl 140431360 Jul 21  2007 core.15189
-rw-------  1 jy cl 139616256 Jul 21  2007 core.14273

Judging by the size of the 'wrfout*' file, the model stops writing output after a while and then crashes. I get the following error:

 starting wrf task             0  of             4
 starting wrf task             3  of             4
 starting wrf task             2  of             4
 starting wrf task             1  of             4
srun: error: n51: task2: Segmentation fault (core dumped)
srun: Terminating job
srun: error: n49: task0: Exited with exit code 1

My stack size is unlimited. I have also set the following environment variable:
export MP_STACK_SIZE=64000000
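
Since the limit that the MPI tasks see on the compute nodes may differ from the one in my login shell, a small check like the one below could be launched the same way as 'wrf.exe' to print what each task actually gets. This is only a sketch in Python using the standard `resource` module; the helper name `describe_stack_limit` is hypothetical and is not part of WRF.

```python
import resource


def describe_stack_limit():
    """Return the soft and hard stack limits of the current process as text."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # RLIM_INFINITY is how the OS reports an unlimited resource.
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return f"stack soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    print(describe_stack_limit())
```

If every task reports "unlimited", the stack limit is probably not the cause. A finite value on any node would be worth following up.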

I do not expect any CFL violations for this basic run, since the settings are exactly as given in the online tutorial and the same case works fine with option 2.

===========================================

The following is the 'tail -f' output of the first file, 'rsl.out.0000'. The other three 'rsl.out' files are included in their entirety below:
-------------------------------------------------

'rsl.out.0000':

------------------
 REAL_DATA_INIT_TYPE =            1
 /
 &GRIB2
 BACKGROUND_PROC_ID =          255,
 FORECAST_PROC_ID =          255,
 PRODUCTION_STATUS =          255,
 COMPRESSION =           40
 /

 Ntasks in X             2 , ntasks in Y             2
 WRF V2.2 MODEL
  *************************************
  Parent domain
  ids,ide,jds,jde             1           75            1           70
  ims,ime,jms,jme            -4           43           -4           41
  ips,ipe,jps,jpe             1           37            1           35
  *************************************
 DYNAMICS OPTION: Eulerian Mass Coordinate
   med_initialdata_input: calling input_model_input
 INPUT LANDUSE = USGS
 LANDUSE TYPE = USGS FOUND           33  CATEGORIES            2  SEASONS
  WATER CATEGORY =            16  SNOW CATEGORY =            24
  STEPRA,STEPCU,STEPBL           10            2            1
Timing for Writing wrfout_d01_2005-08-28_00:00:00 for domain        1:    0.24800 elapsed seconds.
Timing for processing lateral boundary for domain        1:    0.07000 elapsed seconds.
 WRF NUMBER OF TILES =   1
Timing for main: time 2005-08-28_00:03:00 on domain   1:    1.18500 elapsed seconds.
==================================================
The entire output from 'rsl.out.0001':

taskid: 1 hostname: n50
 Quilting with   1 groups of   0 I/O tasks.
  Ntasks in X             2 , ntasks in Y             2
 WRF V2.2 MODEL
  *************************************
  Parent domain
  ids,ide,jds,jde             1           75            1           70
  ims,ime,jms,jme            32           80           -4           41
  ips,ipe,jps,jpe            38           75            1           35
  *************************************
 DYNAMICS OPTION: Eulerian Mass Coordinate
   med_initialdata_input: calling input_model_input
 INPUT LANDUSE = USGS
 LANDUSE TYPE = USGS FOUND           33  CATEGORIES            2  SEASONS
  WATER CATEGORY =            16  SNOW CATEGORY =            24
  STEPRA,STEPCU,STEPBL           10            2            1
 WRF NUMBER OF TILES =   1
====================================================
The entire output from 'rsl.out.0002':

taskid: 2 hostname: n51
 Quilting with   1 groups of   0 I/O tasks.
  Ntasks in X             2 , ntasks in Y             2
 WRF V2.2 MODEL
  *************************************
  Parent domain
  ids,ide,jds,jde             1           75            1           70
  ims,ime,jms,jme            -4           43           30           75
  ips,ipe,jps,jpe             1           37           36           70
  *************************************
 DYNAMICS OPTION: Eulerian Mass Coordinate
   med_initialdata_input: calling input_model_input
 INPUT LANDUSE = USGS
 LANDUSE TYPE = USGS FOUND           33  CATEGORIES            2  SEASONS
  WATER CATEGORY =            16  SNOW CATEGORY =            24
  STEPRA,STEPCU,STEPBL           10            2            1
 WRF NUMBER OF TILES =   1
====================================================
The entire output from 'rsl.out.0003':

taskid: 3 hostname: n53
 Quilting with   1 groups of   0 I/O tasks.
  Ntasks in X             2 , ntasks in Y             2
 WRF V2.2 MODEL
  *************************************
  Parent domain
  ids,ide,jds,jde             1           75            1           70
  ims,ime,jms,jme            32           80           30           75
  ips,ipe,jps,jpe            38           75           36           70
  *************************************
 DYNAMICS OPTION: Eulerian Mass Coordinate
   med_initialdata_input: calling input_model_input
 INPUT LANDUSE = USGS
 LANDUSE TYPE = USGS FOUND           33  CATEGORIES            2  SEASONS
  WATER CATEGORY =            16  SNOW CATEGORY =            24
  STEPRA,STEPCU,STEPBL           10            2            1
 WRF NUMBER OF TILES =   1
======================================================

The 'rsl.err*' files are empty.
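
As a sanity check on the 2 x 2 decomposition reported in the four 'rsl.out' files above, here is a minimal sketch that tests whether the patch bounds (ips,ipe,jps,jpe) cover the parent domain (ids,ide,jds,jde) exactly once. The function name `patches_tile_domain` is hypothetical. It only does arithmetic on the numbers copied from the logs and says nothing about how RSL_LITE computes them.

```python
from itertools import product


def patches_tile_domain(domain, patches):
    """Check that the patches cover every (i, j) of the domain exactly once.

    domain  -- (ids, ide, jds, jde), inclusive bounds
    patches -- list of (ips, ipe, jps, jpe), inclusive bounds, one per task
    """
    ids, ide, jds, jde = domain
    counts = {cell: 0 for cell in product(range(ids, ide + 1), range(jds, jde + 1))}
    for ips, ipe, jps, jpe in patches:
        for cell in product(range(ips, ipe + 1), range(jps, jpe + 1)):
            if cell not in counts:
                return False  # patch extends outside the domain
            counts[cell] += 1
    return all(n == 1 for n in counts.values())


# Values copied from the four rsl.out files in this message.
domain = (1, 75, 1, 70)
patches = [
    (1, 37, 1, 35),    # task 0
    (38, 75, 1, 35),   # task 1
    (1, 37, 36, 70),   # task 2
    (38, 75, 36, 70),  # task 3
]

if __name__ == "__main__":
    print(patches_tile_domain(domain, patches))  # prints True
```

So the patches themselves look consistent: no gaps and no overlaps across the 75 x 70 domain.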

With option 2, though, I get "d01 2005-08-29_00:00:00 wrf: SUCCESS COMPLETE WRF".
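
To compare the two builds more systematically, the text of each 'rsl.out' file could be scanned for the last "Timing for main" line and for the success message. The following is a rough sketch. The function name `summarize_rsl` and the regular expression are my own assumptions, based only on the log lines shown in this message.

```python
import re

# Pattern inferred from the "Timing for main" line in rsl.out.0000.
TIMING_RE = re.compile(
    r"Timing for main: time (\S+) on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def summarize_rsl(text):
    """Return (last_model_time, completed) for the text of one rsl.out file."""
    last_time = None
    for match in TIMING_RE.finditer(text):
        last_time = match.group(1)  # keep overwriting, so the last match wins
    completed = "SUCCESS COMPLETE WRF" in text
    return last_time, completed


# Two lines copied from the end of rsl.out.0000 in the failing run.
sample = """\
 WRF NUMBER OF TILES =   1
Timing for main: time 2005-08-28_00:03:00 on domain   1:    1.18500 elapsed seconds.
"""

if __name__ == "__main__":
    print(summarize_rsl(sample))  # ('2005-08-28_00:03:00', False)
```

Applied to the failing run, this shows that the model completed only the first time step (00:03:00) before the crash.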

I will greatly appreciate any help or suggestions. Thank you for your time.

Jyothi





       