[Wrf-users] WRF is "hanging"

Vassiliki Kotroni kotroni at meteo.noa.gr
Mon Mar 28 13:07:07 MDT 2011


Dear all,

We recently had the same problem.

We had compiled MPI and WRF with the latest available version of the Intel
compilers, and when we tried to run the model it was hanging.

We found out that the problem was that we had installed only the 64-bit Intel
package (as our system is 64-bit, an AMD Phenom), but the 32-bit package
also needed to be installed on the same system.

Once we installed the 32-bit package, the model ran OK without any
recompilation.

Bizarre, but that is what happened to us.
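
A minimal sketch of how one might look for a missing runtime library of this kind, assuming a Linux system with the standard `ldd` tool. The helper names `missing_libs` and `check_binary` and the path `./wrf.exe` are placeholders for illustration, not something taken from the message above. It may well not reveal the 32-bit dependency described here, which could sit in a helper binary of the compiler or MPI installation rather than in the WRF executable itself.

```python
import subprocess


def missing_libs(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved entries as: "libfoo.so.1 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on the given binary and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)


# Example call (placeholder path, adjust to your installation):
#   print(check_binary("./wrf.exe"))
```

Running the same check on the MPI launcher and on any compiler runtime tools it calls would be the natural next step if the model executable itself shows nothing unresolved.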

 

best

Vasso

 

----------------------------------------------------------------------------

Dr. Vassiliki KOTRONI
Institute of Environmental Research
National Observatory of Athens
Lofos Koufou, P. Pendeli, GR-15236
Athens, Greece
Tel: +30 2 10 8109126
Fax: +30 2 10 8103236

Daily weather forecasts at:
 http://www.noa.gr/forecast (in English)
 http://www.meteo.gr (in Greek)
 http://www.eurometeo.gr

 

From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On
Behalf Of Don Morton
Sent: 28 March 2011 21:55
To: Jatin Kala
Cc: wrf-users at ucar.edu
Subject: Re: [Wrf-users] WRF is "hanging"

 

I have run into these kinds of issues a number of times.  In one case, it
was a buggy implementation of MPI, in the scatterv() call, and switching to
openmpi fixed the problem.  In other cases, there were simply bad nodes on
the machine.  My own theory (may be completely wrong) is that these
hangs very frequently occur while the master task is scattering stuff to all
the slaves.  This seems to be a good operation for stressing MPI and/or
node communications.  I have found that these kinds of problems are often
(but not always) intermittent, and sometimes reducing the number of tasks
will get it running (presumably because you're not stressing the underlying
software and hardware infrastructure).

 

To date, I've never found these to be "WRF" problems.

 

Good luck!

 

Don

On Fri, Mar 25, 2011 at 11:19 PM, Jatin Kala <J.Kala at murdoch.edu.au> wrote:

Thanks for the suggestion Feng, but this is not related to namelist inputs.
The namelist I am running worked fine on  a different machine.

The issue here is that WRF simply hangs and does nothing at initialisation
of Grid 2, i.e., the rsl.out and rsl.error files print out:

 

 d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 84045408 bytes allocated
 d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 3084672 bytes allocated
 d01 2009-10-01_00:00:00 *** Initializing nest domain # 2 from an input file. ***
 d01 2009-10-01_00:00:00 med_initialdata_input: calling input_input

 

and that's it. The rsl.error and rsl.out files stop growing in size and
nothing more is printed. The job, however, is still in the queue and does
NOT error out until the walltime has elapsed. No wrfout_d0* files are
created.
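
A hang of this kind (log files that stop growing while the job stays in the queue) can be detected automatically. The following is only a sketch under stated assumptions: the function names, the `rsl.*` glob pattern and the 30-minute threshold are all illustrative choices, and it checks only file modification times, so it cannot tell a true hang from a legitimately long, silent step.

```python
import glob
import os
import time


def seconds_since_last_write(paths, now=None):
    """Return seconds since the newest file in paths was modified.

    Returns None if paths is empty.
    """
    if not paths:
        return None
    now = time.time() if now is None else now
    newest = max(os.path.getmtime(p) for p in paths)
    return now - newest


def looks_stalled(pattern="rsl.*", limit_s=1800, now=None):
    """True if no file matching pattern was modified within limit_s seconds.

    The default pattern and the 1800 s limit are arbitrary assumptions.
    """
    age = seconds_since_last_write(glob.glob(pattern), now=now)
    return age is not None and age > limit_s
```

A job script could call `looks_stalled()` periodically from the run directory and cancel the job instead of waiting for the walltime to expire. How to cancel depends on the scheduler and is not shown here.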

 

Other people seem to have had this issue before:

 

http://mailman.ucar.edu/pipermail/wrf-users/2010/001749.html 

 

http://mailman.ucar.edu/pipermail/wrf-users/2010/001747.html 

 

 

Any help more than welcome.

 

Regards,

 

Jatin 

 

 

 

From: Feng Liu [mailto:FLiu at azmag.gov] 
Sent: Saturday, 26 March 2011 9:04 AM
To: Jatin Kala; wrf-users at ucar.edu
Subject: RE: WRF is "hanging"

 

Hi Jatin,

I do not know exactly what is wrong in your case, but one thing you can try
is to reduce time_step in namelist.input by a factor of 3. Good luck.

Feng

 

 

From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On
Behalf Of Jatin Kala
Sent: Thursday, March 24, 2011 7:29 PM
To: wrf-users at ucar.edu
Subject: [Wrf-users] WRF is "hanging"

 

Dear WRF-users,

 

I have compiled WRF3.2 on our new supercomputing facility, and am having some
trouble. Namely, WRF is just "hanging" at:

 

 d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 84045408 bytes allocated
 d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 3084672 bytes allocated
 d01 2009-10-01_00:00:00 *** Initializing nest domain # 2 from an input file. ***
 d01 2009-10-01_00:00:00 med_initialdata_input: calling input_input

 

 

The job remains in the queue, i.e., it does not error out until the walltime
has elapsed.

 

I have compiled with -O0 but that did not help. I have also compiled with
the updated "gen_allocs.c" from the WRF website, but that has not helped
either. I did do a "clean -a" before.

 

I have compiled WRF with the following libs:

 

intel-compilers/2011.1.107

jasper/1.900.1

ncarg/5.2.1

mpi/intel/openmpi/1.4.2-qlc

netcdf/4.0.1/intel-2011.1.107

export WRFIO_NCD_LARGE_FILE_SUPPORT=1

 

Any help would be greatly appreciated!

 

Kind regards,

 

Jatin 


_______________________________________________
Wrf-users mailing list
Wrf-users at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/wrf-users




-- 

Voice:  907 450 8679

Arctic Region Supercomputing Center
http://weather.arsc.edu/

http://www.arsc.edu/~morton/

 
