[Wrf-users] WRF is "hanging"
Vassiliki Kotroni
kotroni at meteo.noa.gr
Mon Mar 28 13:07:07 MDT 2011
Dear all,
We recently had the same problem.
We had compiled MPI and WRF with the latest available version of the Intel
compilers, and the model hung when we tried to run it.
We found that the problem was that we had installed only the 64-bit Intel
package (our system is 64-bit, an AMD Phenom), but the
32-bit installation was also needed on the same system.
Once we installed the 32-bit package,
the model ran OK without any recompilation.
Bizarre, but that is what happened to us.
Best,
Vasso
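
When 32-bit and 64-bit installations are mixed as described above, it can help to check which kind a given executable or shared library is. The following is a minimal sketch, assuming a Linux system where binaries are ELF files; `elf_class` is a hypothetical helper name, not part of WRF or the Intel tools. It reads only the ELF identification bytes and does not tell you which component needed the 32-bit runtime in our case.

```python
ELF_MAGIC = b"\x7fELF"
# Byte 4 of an ELF file (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(path):
    """Return '32-bit', '64-bit', 'unknown ELF class', or 'not ELF' for a file."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not ELF"
    return ELF_CLASSES.get(header[4], "unknown ELF class")
```

Calling `elf_class` on wrf.exe, on the MPI launcher, and on the compiler runtime libraries shows whether any of them is a 32-bit object. The standard `file` command reports the same information.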
------------------------------------------------------------------------------------
Dr. Vassiliki KOTRONI
Institute of Environmental Research
National Observatory of Athens
Lofos Koufou, P. Pendeli, GR-15236
Athens, Greece
Tel: +30 2 10 8109126
Fax: +30 2 10 8103236
Daily weather forecasts at:
http://www.noa.gr/forecast (in English)
http://www.meteo.gr (in Greek)
http://www.eurometeo.gr
From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On
Behalf Of Don Morton
Sent: 28 March 2011 21:55
To: Jatin Kala
Cc: wrf-users at ucar.edu
Subject: Re: [Wrf-users] WRF is "hanging"
I have run into these kinds of issues a number of times. In one case, it
was a buggy implementation of MPI, in the scatterv() call, and switching to
Open MPI fixed the problem. In other cases, there were simply bad nodes on
the machine. My own theory (which may be completely wrong) is that these
hangs very frequently occur while the master task is scattering data to all
the slaves. This seems to be a good operation for stressing MPI and/or
node communications. I have found that these kinds of problems are often
(but not always) intermittent, and sometimes reducing the number of tasks
will get it running (presumably because you're not stressing the underlying
software and hardware infrastructure as much).
To date, I've never found these to be "WRF" problems.
Good luck!
Don
On Fri, Mar 25, 2011 at 11:19 PM, Jatin Kala <J.Kala at murdoch.edu.au> wrote:
Thanks for the suggestion Feng, but this is not related to namelist inputs.
The namelist I am running worked fine on a different machine.
The issue here is that WRF simply hangs and does nothing at the initialisation
of grid 2, i.e., the rsl.out and rsl.error files print out:
d01 2009-10-01_00:00:00 alloc_space_field: domain 2, 84045408 bytes allocated
d01 2009-10-01_00:00:00 alloc_space_field: domain 2, 3084672 bytes allocated
d01 2009-10-01_00:00:00 *** Initializing nest domain # 2 from an input file. ***
d01 2009-10-01_00:00:00 med_initialdata_input: calling input_input
and that's it. The rsl.error and rsl.out files do not keep growing in size
and there are no more prints; they just stop. The job, however, is
still in the queue and does NOT error out until the walltime has elapsed. No
wrfout_d0* files are created.
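
The symptom above (rsl files that stop growing while the job stays in the queue) can be checked for automatically instead of waiting for the walltime to run out. The following is a minimal sketch, assuming the rsl.* files are in the run directory; the function names and the 600-second threshold are hypothetical choices, not anything WRF provides.

```python
import glob
import os
import time


def stalled_logs(pattern="rsl.*", max_idle_seconds=600, now=None):
    """Return (path, idle_seconds) for each matching log not modified recently."""
    now = time.time() if now is None else now
    stalled = []
    for path in sorted(glob.glob(pattern)):
        idle = now - os.path.getmtime(path)
        if idle > max_idle_seconds:
            stalled.append((path, idle))
    return stalled


def last_line(path):
    """Return the last non-empty line of a text file, or '' if there is none."""
    with open(path, "r", errors="replace") as handle:
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    return lines[-1] if lines else ""


if __name__ == "__main__":
    # Report every rsl file that has been idle too long, with its last message.
    for path, idle in stalled_logs():
        print(f"{path}: idle {idle:.0f}s, last message: {last_line(path)}")
```

Run periodically from the run directory, this would report each idle rsl file together with the last message it printed. A long idle period is only a hint that the run has hung, since some steps legitimately print nothing for a while.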
Other people seem to have had this issue before:
http://mailman.ucar.edu/pipermail/wrf-users/2010/001749.html
http://mailman.ucar.edu/pipermail/wrf-users/2010/001747.html
Any help more than welcome.
Regards,
Jatin
From: Feng Liu [mailto:FLiu at azmag.gov]
Sent: Saturday, 26 March 2011 9:04 AM
To: Jatin Kala; wrf-users at ucar.edu
Subject: RE: WRF is "hanging"
Hi Jatin,
I do not know exactly what is wrong in your case, but one thing you can try
is to reduce time_step in namelist.input by a factor of 3. Good luck.
Feng
From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On
Behalf Of Jatin Kala
Sent: Thursday, March 24, 2011 7:29 PM
To: wrf-users at ucar.edu
Subject: [Wrf-users] WRF is "hanging"
Dear WRF-users,
I have compiled WRF3.2 on our new supercomputing facility, and am having some
trouble. Namely, WRF is just "hanging" at:
d01 2009-10-01_00:00:00 alloc_space_field: domain 2, 84045408 bytes allocated
d01 2009-10-01_00:00:00 alloc_space_field: domain 2, 3084672 bytes allocated
d01 2009-10-01_00:00:00 *** Initializing nest domain # 2 from an input file. ***
d01 2009-10-01_00:00:00 med_initialdata_input: calling input_input
The job remains in the queue, i.e., it does not error out until the walltime
has elapsed.
I have compiled with -O0 but that did not help. I have also compiled with
the updated "gen_allocs.c" from the WRF website, but that has not helped
either. I did do a "clean -a" before.
I have compiled WRF with the following libraries:
intel-compilers/2011.1.107
jasper/1.900.1
ncarg/5.2.1
mpi/intel/openmpi/1.4.2-qlc
netcdf/4.0.1/intel-2011.1.107
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
Any help would be greatly appreciated!
Kind regards,
Jatin
--
Voice: 907 450 8679
Arctic Region Supercomputing Center
http://weather.arsc.edu/
http://www.arsc.edu/~morton/