[Wrf-users] WRF 3.2 jobs hanging up sporadically on wrfout output

Zulauf, Michael Michael.Zulauf at iberdrolausa.com
Thu Apr 15 15:49:40 MDT 2010


Hi all,

I'm trying to get WRF V3.2 running by utilizing a setup that I've
successfully run with V3.1.1 (and earlier).  The configure/compile
seemed to go fine using the same basic configuration details that have
worked in the past.  When I look over the Updates in V3.2, I don't see
anything problematic for me.

We're running with four grids, nesting from 27 km to 1 km, initialized
and forced with GFS output.  The nest initializations are delayed from
the outer grid initialization by 3, 6, and 9 hours, respectively.  The
1 km grid has wrfout (netCDF) output every 20 minutes, the other grids
every hour.
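
For anyone trying to reproduce this, the output schedule described above
can be enumerated with a short script.  This is only a sketch: the
function name and the start/end times passed to it are hypothetical (no
run start time is given here), it assumes one frame per file, and it
assumes the default wrfout_d<domain>_<date> file naming with a first
output at each nest's start time.

```python
from datetime import datetime, timedelta

# Nest start delays (hours) and wrfout intervals (minutes) per domain,
# taken from the description above.
DELAY_HOURS = {1: 0, 2: 3, 3: 6, 4: 9}
INTERVAL_MINUTES = {1: 60, 2: 60, 3: 60, 4: 20}


def expected_wrfout_names(outer_start, domain, run_end):
    """List the wrfout file names a domain should produce up to run_end.

    Assumes default file naming, one frame per file, and a first output
    at the moment the domain starts.
    """
    t = outer_start + timedelta(hours=DELAY_HOURS[domain])
    step = timedelta(minutes=INTERVAL_MINUTES[domain])
    names = []
    while t <= run_end:
        names.append(f"wrfout_d{domain:02d}_{t:%Y-%m-%d_%H:%M:%S}")
        t += step
    return names
```

Comparing this list against the files actually on disk shows which
output step a run stopped at.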

What I'm seeing is that the job appears to be running fine for some
time, but eventually the job hangs up during wrfout output - usually on
the finest grid - but not exclusively.  Changing small details (such as
changing restart_interval) can make it run longer or shorter.  Sometimes
even with no changes it will run a different length of time.

I've got debug_level set to 300, so I get tons of output.  When it
hangs, the wrf processes don't die, but all output stops.  There are no
error messages or anything else that indicates a problem (at least none
that I can find).  What I do get is a truncated (always 32 byte) wrfout
file.  For example:

-rw-r--r--  1 p20457 staff 32 Apr 15 13:02 wrfout_d04_2009-12-14_09:00:00
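
A quick way to spot stub files like the one above across a run directory
is sketched below.  The function name and the 32-byte cutoff are just
illustrative choices based on what I'm seeing here, not anything WRF
itself defines.

```python
from pathlib import Path

STUB_SIZE = 32  # bytes; size of the truncated files seen in this case


def find_truncated_wrfout(run_dir, max_size=STUB_SIZE):
    """Return wrfout files in run_dir that are at most max_size bytes."""
    return sorted(
        p for p in Path(run_dir).glob("wrfout_d*")
        if p.is_file() and p.stat().st_size <= max_size
    )
```

Calling find_truncated_wrfout(".") from the run directory returns the
suspect files in name order, so the last healthy output is easy to
identify.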

The wrfout files that get written before it hangs appear to be fine,
with valid data.  frames_per_outfile is set to 1, so the files never get
excessively large - maybe on the order of 175 MB.  All of the previous
versions of WRF that I've used continue to work fine on this hardware/OS
combination (a cluster of dual dual-core Opterons, running CentOS) -
just V3.2 has issues.

Like I said, the wrf processes don't die, but all output ceases, even
with the massive amount of debug info.  The last lines in the rsl.error
and rsl.out files are always something of this type:

  date 2009-12-14_09:00:00
  ds             1            1            1
  de             1            1            1
  ps             1            1            1
  pe             1            1            1
  ms             1            1            1
  me             1            1            1
  output_wrf.b writing 0d real

The specific times and variables being written vary, depending on
when the job hangs.
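
To compare where each MPI task stopped, the tail of every per-task rsl
file can be collected in one pass.  The sketch below assumes the usual
rsl.error.NNNN / rsl.out.NNNN file names in the run directory; the
function names are made up for illustration.

```python
from collections import deque
from pathlib import Path


def last_lines(path, n=3):
    """Return the last n non-empty lines of a text file."""
    with open(path, errors="replace") as fh:
        kept = deque((ln.rstrip("\n") for ln in fh if ln.strip()), maxlen=n)
    return list(kept)


def summarize_rsl(run_dir, pattern="rsl.error.*", n=3):
    """Map each per-task rsl file name to its last n non-empty lines."""
    return {
        p.name: last_lines(p, n)
        for p in sorted(Path(run_dir).glob(pattern))
    }
```

If most tasks end on the same message and one or two end somewhere else,
the odd ones out are a reasonable place to start looking.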

I haven't dug deeply into what's going on, but it seems like possibly
some sort of race condition or communications deadlock or something.
Does anybody have ideas of where I should go from here?  It seems to me
like maybe something basic has changed with V3.2, and perhaps I need to
adjust something in my configuration or setup.
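
Since the processes stay alive, a hang has to be detected from the
outside.  One rough approach is to watch for the rsl files going quiet;
the sketch below does that.  The function names and the 600-second
threshold are arbitrary assumptions, and a quiet period is only a hint
of a hang, not proof of one.

```python
import time
from pathlib import Path


def seconds_since_last_write(run_dir, pattern="rsl.*", now=None):
    """Seconds since the newest matching file was modified, or None."""
    mtimes = [p.stat().st_mtime for p in Path(run_dir).glob(pattern)]
    if not mtimes:
        return None
    now = time.time() if now is None else now
    return now - max(mtimes)


def looks_hung(run_dir, threshold_s=600, now=None):
    """True if no rsl file has been modified for at least threshold_s."""
    idle = seconds_since_last_write(run_dir, now=now)
    return idle is not None and idle >= threshold_s
```

Once a run is flagged this way, attaching a debugger to the stalled wrf
processes (for example with gdb -p <pid>) and taking a backtrace on each
may show whether the tasks are sitting in an MPI call or in the netCDF
write.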

Thanks,
Mike

-- 
Mike Zulauf
Meteorologist
Wind Asset Management 
Iberdrola Renewables
1125 NW Couch, Suite 700
Portland, OR 97209
Office: 503-478-6304  Cell: 503-913-0403







