[Wrf-users] WRF jobs stopping suddenly at 14:00 on some random days, even though it works for other pseudo-identical runs?

Maxime Colin m.colin at unsw.edu.au
Tue Jun 6 00:05:17 MDT 2017


Dear WRF users,


I am running a set of experiments that are all identical (exactly the same wrf.exe, the same namelist, the same time frame) except that each one restarts from a different restart file (which I modify myself from the original wrfrst files). Out of 12 such experiments in total, 2 stopped suddenly after 2007-11-25_14:00:00 with this error:


----------------------------------------------

 d01 2007-11-25_14:00:00  MMINLU error on input
            2 input_wrf: wrf_get_next_time current_date: 2007-11-25_14:00:00 Status =          -11
 -------------- FATAL CALLED ---------------
 FATAL CALLED FROM FILE:  <stdin>  LINE:     895
  ... Could not find matching time in input file
 -------------------------------------------


So I decided to run these 2 experiments again. One finished without any problem. The second stopped suddenly during wrfout_d01_2007-11-19, for the same reason as above, just on a different day.

So I ran the last experiment once more. It stopped after 2007-11-22_14:00:00 with the same error again:

---------------------------------------------
 d01 2007-11-22_14:00:00  MMINLU error on input
            2 input_wrf: wrf_get_next_time current_date: 2007-11-22_14:00:00 Status =          -11
 -------------- FATAL CALLED ---------------
 FATAL CALLED FROM FILE:  <stdin>  LINE:     895
  ... Could not find matching time in input file
 -------------------------------------------

According to Google, other people have solved this issue by adding "ulimit -s unlimited" or "limit stacksize unlimited". But this is already in my run_mpi script, which launches the model. Others have suggested that it is related to the maximum number of output times allowed (10000 in my simulation). I don't believe this second explanation, because I have run exactly the same kind of experiment about 12 independent times, and only the last 2 failed.
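
To confirm that the stack-size setting is actually in effect for the launching process, a minimal Python sketch along these lines could be used. It relies only on the standard `resource` module; the function name `describe_stack_limit` is hypothetical, and whether the limit seen here also applies to the MPI ranks on the compute nodes depends on the batch system, so this is only a first check, not a diagnosis.

```python
import resource


def describe_stack_limit():
    """Return the (soft, hard) stack limits of this process as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # RLIM_INFINITY is what "ulimit -s unlimited" sets.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} kB"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_stack_limit()
    print(f"stack soft limit: {soft}, hard limit: {hard}")
```

Running this from the same shell or job script that starts wrf.exe shows which limit that environment really has.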

Do you have any thoughts on why that might be?

At the moment, I'm re-running it again... We'll see how random the whole thing is. Even though it doesn't always fail on the same day, 14:00 seems to be the time of day when it fails.
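
To check the 14:00 pattern across runs, the failure times can be collected from the log text. The sketch below is an illustration only: it assumes the error lines look like the ones quoted above (domain, timestamp, then "error on input"), and the names `find_input_errors` and `failures_by_hour` are hypothetical helpers, not part of WRF.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as " d01 2007-11-25_14:00:00  MMINLU error on input".
ERROR_RE = re.compile(
    r"(d\d{2})\s+(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})\s+.*error on input"
)


def find_input_errors(log_text):
    """Return a list of (domain, datetime) for each 'error on input' line."""
    hits = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(2), "%Y-%m-%d_%H:%M:%S")
            hits.append((match.group(1), stamp))
    return hits


def failures_by_hour(hits):
    """Count failures per hour of day to expose a time-of-day pattern."""
    return Counter(stamp.hour for _, stamp in hits)
```

Feeding it the contents of the rsl.error.0000 file from each failed run and printing `failures_by_hour(...)` would show whether every failure really falls at 14:00.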

I attach a few relevant files.

Thank you,

Maxime



Maxime Colin
---------------------------------------------
PhD candidate
Climate Change Research Centre & ARC Centre of Excellence for Climate System Science, UNSW, Australia
and Laboratoire de Météorologie Dynamique, UPMC, France
http://www.ccrc.unsw.edu.au/ccrc-team/students/maxime-colin
http://www.climatescience.org.au/staff/profile/mcolin
---------------------------------------------
+61 (0)421 620 779    /    +33 (0)6 25 57 81 93
m.colin at unsw.edu.au    /     colinmaxime at hotmail.fr

-------------- next part --------------
A non-text attachment was scrubbed...
Name: namelist.input
Type: application/octet-stream
Size: 5390 bytes
Desc: namelist.input
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0006.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run_wrf.py
Type: text/x-python
Size: 2216 bytes
Desc: run_wrf.py
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0001.py 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run_mpi
Type: application/octet-stream
Size: 455 bytes
Desc: run_mpi
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0007.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run_mpi.o5466016
Type: application/octet-stream
Size: 878 bytes
Desc: run_mpi.o5466016
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0008.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wrf_mpi.out
Type: application/octet-stream
Size: 266429 bytes
Desc: wrf_mpi.out
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0009.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsl.error.0000
Type: application/octet-stream
Size: 459194 bytes
Desc: rsl.error.0000
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0010.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsl.out.0000
Type: application/octet-stream
Size: 1459387 bytes
Desc: rsl.out.0000
Url : http://mailman.ucar.edu/pipermail/wrf-users/attachments/20170606/b241a380/attachment-0011.obj 

