[Wrf-users] WRF jobs stopping suddenly at 14:00 on some random days, even though it works for other pseudo-identical runs?
Maxime Colin
m.colin at unsw.edu.au
Tue Jun 6 00:05:17 MDT 2017
Dear WRF users,
I am running a set of experiments that are all identical (exactly the same wrf.exe, the same namelist, the same time frame) except that each one restarts from a different restart file (which I modify myself from the original wrfrst files). Of the 12 experiments in total, 2 stopped suddenly after 2007-11-25_14:00:00 with this error:
----------------------------------------------
d01 2007-11-25_14:00:00 MMINLU error on input
2 input_wrf: wrf_get_next_time current_date: 2007-11-25_14:00:00 Status = -11
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 895
... Could not find matching time in input file
-------------------------------------------
So I ran these 2 experiments again. One finished without any problem. The second stopped suddenly while writing wrfout_d01_2007-11-19, for the same reason as above, just on a different day.
So I ran this last experiment once more. It stopped after 2007-11-22_14:00:00 with the same error again:
---------------------------------------------
d01 2007-11-22_14:00:00 MMINLU error on input
2 input_wrf: wrf_get_next_time current_date: 2007-11-22_14:00:00 Status = -11
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 895
... Could not find matching time in input file
-------------------------------------------
According to Google, other people have solved this issue by adding "ulimit -s unlimited" or "limit stacksize unlimited". But this is already in the run_mpi script that launches the model. Others suggested that it is related to the maximum number of output times allowed (10000 in my simulation). I doubt this second explanation, because I have run exactly the same kind of experiment about 12 independent times, and only the last 2 failed.
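One way to confirm that the stack limit is really in effect is to print it from the Python launcher itself. The snippet below is only a sketch: the function name stack_limit_report is made up for illustration, and it reports the limit of the Python process (which child processes started from it normally inherit), not necessarily what each MPI rank sees on a remote node.

```python
import resource  # standard library, Unix only


def stack_limit_report():
    """Return the soft and hard stack limits of this process as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # RLIM_INFINITY is what "ulimit -s unlimited" produces.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} kB"

    return {"soft": fmt(soft), "hard": fmt(hard)}


if __name__ == "__main__":
    # Hypothetical usage: call this just before the model is launched.
    print(stack_limit_report())
```

If the soft limit is not reported as "unlimited" here, the ulimit line in the launch script is probably not reaching the process that starts wrf.exe.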
Do you have any thoughts on why this might be happening?
At the moment, I am running it again... We'll see how random the whole thing is. Even though the failure does not always occur on the same day, it always seems to occur at 14:00.
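To check the 14:00 pattern over many runs, the failure timestamps can be pulled out of the log text and counted by time of day. This is a minimal sketch that assumes the failing line always has the "wrf_get_next_time current_date: <date>_<time>" form shown above; the function names are hypothetical, and reading the actual rsl.error.* files is left out.

```python
import re
from collections import Counter

# Matches the date and time in lines such as
# "2 input_wrf: wrf_get_next_time current_date: 2007-11-25_14:00:00 Status = -11"
FAIL_RE = re.compile(
    r"wrf_get_next_time current_date:\s*(\d{4}-\d{2}-\d{2})_(\d{2}:\d{2}:\d{2})"
)


def failure_times(log_text):
    """Return a list of (date, time_of_day) pairs, one per failure line."""
    return FAIL_RE.findall(log_text)


def count_by_time_of_day(log_texts):
    """Count failures by time of day over several log texts."""
    counts = Counter()
    for text in log_texts:
        for _date, time_of_day in failure_times(text):
            counts[time_of_day] += 1
    return counts


if __name__ == "__main__":
    # The two failure lines quoted in this message.
    logs = [
        "2 input_wrf: wrf_get_next_time current_date: 2007-11-25_14:00:00 Status = -11",
        "2 input_wrf: wrf_get_next_time current_date: 2007-11-22_14:00:00 Status = -11",
    ]
    print(count_by_time_of_day(logs))
```

If every failure lands on the same time of day, that would point towards whatever input is read at that model time rather than towards a purely random cause.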
I attach a few relevant files.
Thank you,
Maxime
Maxime Colin
---------------------------------------------
PhD candidate
Climate Change Research Centre & ARC Centre of Excellence for Climate System Science, UNSW, Australia
and Laboratoire de Météorologie Dynamique, UPMC, France
http://www.ccrc.unsw.edu.au/ccrc-team/students/maxime-colin
http://www.climatescience.org.au/staff/profile/mcolin
---------------------------------------------
+61 (0)421 620 779 / +33 (0)6 25 57 81 93
m.colin at unsw.edu.au / colinmaxime at hotmail.fr
-------------- next part --------------
A non-text attachment was scrubbed...
Name: namelist.input
Type: application/octet-stream
Size: 5390 bytes
Desc: namelist.input
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0006.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run_wrf.py
Type: text/x-python
Size: 2216 bytes
Desc: run_wrf.py
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0001.py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run_mpi
Type: application/octet-stream
Size: 455 bytes
Desc: run_mpi
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0007.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run_mpi.o5466016
Type: application/octet-stream
Size: 878 bytes
Desc: run_mpi.o5466016
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0008.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wrf_mpi.out
Type: application/octet-stream
Size: 266429 bytes
Desc: wrf_mpi.out
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0009.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsl.error.0000
Type: application/octet-stream
Size: 459194 bytes
Desc: rsl.error.0000
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0010.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsl.out.0000
Type: application/octet-stream
Size: 1459387 bytes
Desc: rsl.out.0000
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0011.obj