[Wrf-users] wrf.exe errors with RSL_LITE
Jyothi N
millyjy77 at yahoo.com
Sat Jul 21 08:19:14 MDT 2007
Dear WRF users, I need some input regarding issues with running 'wrf.exe' on an HP XC Opteron cluster with the PGI compilers and MPI.
I get a core dump when I configure WRF with option 3, i.e. the RSL_LITE option - 'PC Linux x86_64 (IA64 and Opteron), PGI 5.2 or higher, DM-Parallel (RSL_LITE, MPICH, Allows nesting, No periodic LBCs)'.
When I use option 2, 'PC Linux x86_64 (IA64 and Opteron), PGI 5.2 or higher, DM-Parallel (RSL, MPICH, Allows nesting)', it works fine and the run completes successfully.
I am concerned because the online tutorial seems to advise using RSL_LITE whenever possible for nested runs, which is what I am planning to do.
Can someone help me resolve this issue with RSL_LITE, or let me know whether it really matters if I use RSL in place of RSL_LITE? Will there be any issues later on if I use multiple nesting? Has anyone faced something similar? And why is RSL_LITE preferable to RSL for nested runs?
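For reference, this is roughly how I build and launch the model in both cases (the option numbers are the ones quoted above; the srun line is only indicative of how jobs are launched on this cluster with 4 tasks):

  # configure and compile WRF V2.2 for the real-data case
  cd WRFV2
  ./configure            # pick option 2 (RSL) or option 3 (RSL_LITE)
  ./compile em_real >& compile.log

  # run from the real-data test directory; on this HP XC system the
  # executables are launched through srun, e.g. with 4 MPI tasks
  cd test/em_real
  srun -n 4 ./real.exe
  srun -n 4 ./wrf.exe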
==========================================================
The following are the details of the error with option 3, the RSL_LITE option:
The test case is 'em_real'. Everything seems to work fine up to and including 'real.exe'; the final 'wrf.exe' run gives a core dump. These are the files I get:
-rw-r--r-- 1 jy cl 7076584 Jul 21 08:13 wrfinput_d01
-rw-r--r-- 1 jy cl 10056744 Jul 21 08:13 wrfbdy_d01
-rw-r--r-- 1 jy cl 7666844 Jul 21 2007 wrfout_d01_2005-08-28_00:00:00
-rw------- 1 jy cl 157868032 Jul 21 2007 core.30123
-rw------- 1 jy cl 140431360 Jul 21 2007 core.15189
-rw------- 1 jy cl 139616256 Jul 21 2007 core.14273
The model seems to stop writing to the 'wrfout*' file after a while and then crashes, as the file size indicates. I get the following error:
starting wrf task 0 of 4
starting wrf task 3 of 4
starting wrf task 2 of 4
starting wrf task 1 of 4
srun: error: n51: task2: Segmentation fault (core dumped)
srun: Terminating job
srun: error: n49: task0: Exited with exit code 1
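In case it helps with the diagnosis, one of the core files can presumably be examined along these lines (assuming gdb is available on the compute nodes and the executable keeps enough symbols; 'core.30123' is taken from the listing above):

  # open a core file against the wrf.exe that produced it
  gdb ./wrf.exe core.30123
  # at the gdb prompt, 'bt' prints the backtrace of the crashed task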
My stack size is unlimited, and I have also set the following environment variable:
export MP_STACK_SIZE=64000000
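For completeness, these are roughly the relevant settings in my job environment before launching (the ulimit line is just the bash way of expressing the unlimited stack size mentioned above):

  # remove the shell stack limit before launching wrf.exe
  ulimit -s unlimited
  # per-process stack size hint, as set above
  export MP_STACK_SIZE=64000000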
I am not expecting any CFL violations for this basic run, since the specifications are exactly as given in the online tutorial, and since the same case worked fine with option 2.
===========================================
The following is the tail of the 'rsl.out.0000' file; the other three 'rsl.out' files are included in their entirety below:
-------------------------------------------------
'rsl.out.0000':
------------------
REAL_DATA_INIT_TYPE = 1
/
&GRIB2
BACKGROUND_PROC_ID = 255,
FORECAST_PROC_ID = 255,
PRODUCTION_STATUS = 255,
COMPRESSION = 40
/
Ntasks in X 2 , ntasks in Y 2
WRF V2.2 MODEL
*************************************
Parent domain
ids,ide,jds,jde 1 75 1 70
ims,ime,jms,jme -4 43 -4 41
ips,ipe,jps,jpe 1 37 1 35
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
med_initialdata_input: calling input_model_input
INPUT LANDUSE = USGS
LANDUSE TYPE = USGS FOUND 33 CATEGORIES 2 SEASONS
WATER CATEGORY = 16 SNOW CATEGORY = 24
STEPRA,STEPCU,STEPBL 10 2 1
Timing for Writing wrfout_d01_2005-08-28_00:00:00 for domain 1: 0.24800 elapsed seconds.
Timing for processing lateral boundary for domain 1: 0.07000 elapsed seconds.
WRF NUMBER OF TILES = 1
Timing for main: time 2005-08-28_00:03:00 on domain 1: 1.18500 elapsed seconds.
==================================================
The entire output from 'rsl.out.0001':
taskid: 1 hostname: n50
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 2 , ntasks in Y 2
WRF V2.2 MODEL
*************************************
Parent domain
ids,ide,jds,jde 1 75 1 70
ims,ime,jms,jme 32 80 -4 41
ips,ipe,jps,jpe 38 75 1 35
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
med_initialdata_input: calling input_model_input
INPUT LANDUSE = USGS
LANDUSE TYPE = USGS FOUND 33 CATEGORIES 2 SEASONS
WATER CATEGORY = 16 SNOW CATEGORY = 24
STEPRA,STEPCU,STEPBL 10 2 1
WRF NUMBER OF TILES = 1
====================================================
The entire output from 'rsl.out.0002':
taskid: 2 hostname: n51
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 2 , ntasks in Y 2
WRF V2.2 MODEL
*************************************
Parent domain
ids,ide,jds,jde 1 75 1 70
ims,ime,jms,jme -4 43 30 75
ips,ipe,jps,jpe 1 37 36 70
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
med_initialdata_input: calling input_model_input
INPUT LANDUSE = USGS
LANDUSE TYPE = USGS FOUND 33 CATEGORIES 2 SEASONS
WATER CATEGORY = 16 SNOW CATEGORY = 24
STEPRA,STEPCU,STEPBL 10 2 1
WRF NUMBER OF TILES = 1
====================================================
The entire output from 'rsl.out.0003':
taskid: 3 hostname: n53
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 2 , ntasks in Y 2
WRF V2.2 MODEL
*************************************
Parent domain
ids,ide,jds,jde 1 75 1 70
ims,ime,jms,jme 32 80 30 75
ips,ipe,jps,jpe 38 75 36 70
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
med_initialdata_input: calling input_model_input
INPUT LANDUSE = USGS
LANDUSE TYPE = USGS FOUND 33 CATEGORIES 2 SEASONS
WATER CATEGORY = 16 SNOW CATEGORY = 24
STEPRA,STEPCU,STEPBL 10 2 1
WRF NUMBER OF TILES = 1
======================================================
The 'rsl.err*' files are empty.
With option 2, though, I get "d01 2005-08-29_00:00:00 wrf: SUCCESS COMPLETE WRF".
I would greatly appreciate any help or suggestions. Thank you for your time.
Jyothi