[Wrf-users] WRF is "hanging"

Don Morton Don.Morton at alaska.edu
Mon Mar 28 12:55:27 MDT 2011


I have run into these kinds of issues a number of times.  In one case, it
was a buggy implementation of MPI, in the scatterv() call, and switching to
openmpi fixed the problem.  In other cases, there were simply bad nodes on
the machine.  My own theory (may be completely wrong) is that these
hangs very frequently occur while the master task is scattering stuff to all
the slaves.  This seems to be a good operation for stressing MPI and/or
node communications.  I have found that these kinds of problems are often
(but not always) intermittent, and sometimes reducing the number of tasks
will get it running (presumably because you're not stressing the underlying
software and hardware infrastructure as much).

To date, I've never found these to be "WRF" problems.
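
One way to confirm that a run is really hung rather than just slow is to
check whether the rsl.* files have stopped growing, which is the symptom
described in the quoted message below.  A minimal Python sketch of that
check follows.  The function name stalled_files and its parameters are made
up for illustration and are not part of WRF; it only assumes the log files
already exist and can be read.

```python
import time
from pathlib import Path


def stalled_files(paths, interval=60.0, checks=5, sleep=time.sleep):
    """Return the files whose size never changed across `checks` polls.

    `interval` is the wait between polls in seconds. `sleep` is injectable
    so the function can be exercised without actually waiting.
    """
    paths = [Path(p) for p in paths]
    # Record the starting size of every log file.
    last = {p: p.stat().st_size for p in paths}
    stalled = set(paths)
    for _ in range(checks):
        sleep(interval)
        for p in paths:
            size = p.stat().st_size
            if size != last[p]:
                # The file grew (or was rewritten), so this task is alive.
                stalled.discard(p)
                last[p] = size
    return sorted(stalled)


# Hypothetical usage from the WRF run directory:
#   import glob
#   print(stalled_files(glob.glob("rsl.out.*") + glob.glob("rsl.error.*")))
```

If every rsl file is reported as stalled for several minutes while the job
is still in the queue, that matches the hang described here.  It does not
say why the run is stuck, so the MPI library and the nodes still need to be
checked separately.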

Good luck!

Don

On Fri, Mar 25, 2011 at 11:19 PM, Jatin Kala <J.Kala at murdoch.edu.au> wrote:

>  Thanks for the suggestion Feng, but this is not related to namelist
> inputs. The namelist I am running worked fine on  a different machine.
>
> The issue here is that WRF simply hangs and does nothing at initialisation
> of Grid 2, i.e., the rsl.out and rsl.error files print out:
>
>
>
> d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 84045408 bytes allocated
> d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 3084672 bytes allocated
> d01 2009-10-01_00:00:00 *** Initializing nest domain # 2 from an input file. ***
> d01 2009-10-01_00:00:00 med_initialdata_input: calling input_input
>
>
>
> and that’s it. The rsl.error and rsl.out files do not keep growing in size,
> there are no more prints, they just stop printing stuff. The job however is
> still in the queue and does NOT error out, until the walltime is elapsed. No
> wrfout_d0* files are created.
>
>
>
> Other people seem to have had this issue before:
>
>
>
> http://mailman.ucar.edu/pipermail/wrf-users/2010/001749.html
>
>
>
> http://mailman.ucar.edu/pipermail/wrf-users/2010/001747.html
>
>
>
>
>
> Any help more than welcome.
>
>
>
> Regards,
>
>
>
> Jatin
>
>
>
>
>
>
>
> *From:* Feng Liu [mailto:FLiu at azmag.gov]
> *Sent:* Saturday, 26 March 2011 9:04 AM
> *To:* Jatin Kala; wrf-users at ucar.edu
> *Subject:* RE: WRF is "hanging"
>
>
>
> Hi Jatin,
>
> I do not know exactly what is wrong in your case, but one thing you can
> try is to reduce time_step in namelist.input by a factor of 3. Good luck.
>
> Feng
>
>
>
>
>
> *From:* wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] *On
> Behalf Of *Jatin Kala
> *Sent:* Thursday, March 24, 2011 7:29 PM
> *To:* wrf-users at ucar.edu
> *Subject:* [Wrf-users] WRF is "hanging"
>
>
>
> Dear WRF-users,
>
>
>
> I have compiled WRF3.2 on our new supercomputing facility, and am having some
> trouble. Namely, WRF is just “hanging” at:
>
>
>
> d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 84045408 bytes allocated
> d01 2009-10-01_00:00:00  alloc_space_field: domain            2, 3084672 bytes allocated
> d01 2009-10-01_00:00:00 *** Initializing nest domain # 2 from an input file. ***
> d01 2009-10-01_00:00:00 med_initialdata_input: calling input_input
>
>
>
>
>
> The job remains in the queue, i.e., does not error out until walltime is
> elapsed.
>
>
>
> I have compiled with -O0 but that did not help. I have also compiled with
> the updated “gen_allocs.c” from the WRF website, but that has not helped
> either. I did do a “clean -a” before.
>
>
>
> I have compiled WRF with the following libs:
>
>
>
> intel-compilers/2011.1.107
>
> jasper/1.900.1
>
> ncarg/5.2.1
>
> mpi/intel/openmpi/1.4.2-qlc
>
> netcdf/4.0.1/intel-2011.1.107
>
> export WRFIO_NCD_LARGE_FILE_SUPPORT=1
>
>
>
> Any help would be greatly appreciated!
>
>
>
> Kind regards,
>
>
>
> Jatin
>


-- 
Voice:  907 450 8679
Arctic Region Supercomputing Center
http://weather.arsc.edu/
http://www.arsc.edu/~morton/