[Wrf-users] WRF 3.2 jobs hanging up sporadically on wrfout output

Zulauf, Michael Michael.Zulauf at iberdrolausa.com
Fri Apr 30 13:01:29 MDT 2010


Hi again, all. . .

I'm reviving my plea for help from a couple weeks ago.  I'm still having
issues with WRF 3.2 - and _only_ 3.2.

I've tried different versions of the PGI compilers, different versions
of support libraries, different optimization levels (all the way down to
none), etc.  My jobs sporadically (but usually eventually) hang up, most
often after a new wrfout file is opened.  No error messages, no crashes
- the processes continue, but _all_ output stops.  I eventually just
have to kill the job.  The wrfouts are small, and all output looks good
up until the failed wrfout.
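
Editorial sketch, not part of the original post: because the hang produces no error and the processes stay alive, one way to notice it is to check whether any output file is still being modified. The Python below is a minimal sketch under assumptions: the helper names (`newest_mtime`, `looks_hung`), the file patterns, and the 30-minute threshold in the usage note are illustrative, and a suitable threshold depends on how long a history interval takes in wall-clock time.

```python
import glob
import os
import time


def newest_mtime(patterns, run_dir="."):
    """Return the latest modification time among matching files, or None."""
    mtimes = [
        os.path.getmtime(path)
        for pattern in patterns
        for path in glob.glob(os.path.join(run_dir, pattern))
    ]
    return max(mtimes) if mtimes else None


def looks_hung(patterns, stall_seconds, run_dir=".", now=None):
    """True if matching files exist but none changed within stall_seconds.

    An empty run directory is reported as not hung, since the job may
    simply not have written anything yet.
    """
    now = time.time() if now is None else now
    latest = newest_mtime(patterns, run_dir)
    return latest is not None and (now - latest) > stall_seconds
```

A call such as `looks_hung(["wrfout_d0*", "rsl.out.*", "rsl.error.*"], stall_seconds=1800)` could be run periodically from the job script; it only reports the stall and leaves the decision to kill the job to the caller.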

The exact same hardware, OS, compilers, libraries, etc. work with
previous versions of WRF.

Below is an example namelist.input (WPS seems to be running fine).  Any
thoughts?

Thanks,
Mike

------------------------------------------------------------------------
&time_control
 run_days                            = 0,
 run_hours                           = 24,
 run_minutes                         = 0,
 run_seconds                         = 0,
 start_year                          = 2009,2009,2009,2009,
 start_month                         = 12,12,12,12,
 start_day                           = 14,14,14,14,
 start_hour                          = 00,03,06,09,
 start_minute                        = 00,   00,   00,   00,   00,   00,
 start_second                        = 00,   00,   00,   00,   00,   00,

 end_year                            = 2009,2009,2009,2009,
 end_month                           = 12,12,12,12,
 end_day                             = 15,15,15,14,
 end_hour                            = 00,00,00,12,
 end_minute                          = 00,   00,   00,   00,   00,   00,
 end_second                          = 00,   00,   00,   00,   00,   00,
 interval_seconds                    = 10800,
 input_from_file                     = .true.,.true.,.true.,.true.,.true.,
 fine_input_stream                   = 0, 2, 2, 2, 
 io_form_auxinput2                   = 2
 history_interval                    = 60,60,60,20,
 frames_per_outfile                  =  1,  1,  1,  1,  1,  1, 
 restart                             = .false.,
 restart_interval                    = 1440,
 io_form_history                     = 2
 io_form_restart                     = 2
 io_form_input                       = 2
 io_form_boundary                    = 2
 debug_level                         = 0
 adjust_output_times                 = .true.
 /

 &domains
 time_step                           = 163,
 time_step_fract_num                 = 7,
 time_step_fract_den                 = 11,
 max_dom                             = 4,
 s_we                                = 1,  1,  1,  1,  1, 1,
 e_we                                =   142,244,280,382,
 s_sn                                =  1,  1,  1,  1,  1, 1,
 e_sn                                =   154,268,250,196,
 s_vert                              =  1,  1,  1,  1,  1, 1,
 e_vert                              = 31,  31,  31,  31,  31, 31,
 num_metgrid_levels                  =  27 ,
 eta_levels                          = 1.000, 0.993, 0.980, 0.966,
0.950, 0.933, 0.913, 0.892, 0.869, 0.844, 0.816, 0.786, 0.753, 0.718,
0.680, 0.639, 0.596, 0.550, 0.501, 0.451, 0.398, 0.345, 0.290, 0.236,
0.188, 0.145, 0.108, 0.075, 0.046, 0.021, 0.000,

 p_top_requested                     = 5000,
 dx                                  = 27000,9000,3000,1000,
 dy                                  = 27000,9000,3000,1000,
 grid_id        = 1,  2,  3,  4,  5,  6,
 parent_id      = 1,  1,  2,  3,  4,  5,
 i_parent_start                      =   1,31,91,92,
 j_parent_start                      =   1,33,93,93,
 parent_grid_ratio = 1,  3,  3,  3,  3,  3,
 parent_time_step_ratio = 1,  3,  3,  3,  3, 3,
 feedback                            = 0,
 smooth_option                       = 2
 use_adaptive_time_step              = .false.
 step_to_output_time                 = .true.
 target_cfl                          = 1.1,1.1,1.1,1.1,
 max_step_increase_pct               = 5, 51, 51, 51, 51, 51
 starting_time_step                  = 162, 54, 18, 6
 max_time_step                       = 202.5, 67.5, 22.5, 7.5
 min_time_step                       = 27, 9, 3, 1
 adaptation_domain                   = 4
 /

 &physics
 mp_physics                          = 5, 5, 5, 5, 
 ra_lw_physics                       = 1, 1, 1, 1, 
 ra_sw_physics                       = 1, 1, 1, 1, 
 radt                                = 30,    30,    30,    30,    30,    30,
 sf_sfclay_physics                   = 1, 1, 1, 1, 
 sf_surface_physics                  = 1, 1, 1, 1, 
 bl_pbl_physics                      = 1, 1, 1, 1, 
 bldt                                = 0,     0,     0,     0,     0,     0,
 cu_physics                          = 1,     1,     0,     0,     0,     0,
 cudt                                = 5,     5,     5,     0,     0,     0,
 cam_abs_freq_s                      = 21600,
 levsiz                              = 59,
 paerlev                             = 29,
 cam_abs_dim1                        = 4,
 cam_abs_dim2                        = 31,
 isfflx                              = 1,
 ifsnow                              = 0,
 icloud                              = 1,
 surface_input_source                = 1,
 num_soil_layers                     = 5,
 sf_urban_physics                    = 0,     0,     0,     0,
 mp_zero_out                         = 0,
 maxiens                             = 1,
 maxens                              = 3,
 maxens2                             = 3,
 maxens3                             = 16,
 ensdim                              = 144,
 slope_rad                           = 0,
 topo_shading                        = 0,
 /

 &fdda
 grid_fdda                           = 1,     0,     0,
 gfdda_inname                        = "wrffdda_d<domain>",
 gfdda_interval_m                    = 180,   0,     0,
 gfdda_end_h                         = 12,    0,     0,
 io_form_gfdda                       = 2,
 fgdt                                = 0,     0,     0,
 if_no_pbl_nudging_uv                = 0,     0,     0,
 if_no_pbl_nudging_t                 = 1,     0,     0,
 if_no_pbl_nudging_q                 = 1,     0,     0,
 if_zfac_uv                          = 0,     0,     0,
  k_zfac_uv                          = 10,   10,    10,
 if_zfac_t                           = 1,     0,     0,
  k_zfac_t                           = 10,   10,    10,
 if_zfac_q                           = 1,     0,     0,
  k_zfac_q                           = 10,   10,    10,
 guv                                 = 0.0001,     0.0001,     0.0001,
 gt                                  = 0.0001,     0.0001,     0.0001,
 gq                                  = 0.000001,   0.000001,   0.000001,
 if_ramping                          = 0,
 dtramp_min                          = 0.0,
/

 &dynamics
 w_damping                           = 1,
 diff_opt                            = 1,
 km_opt                              = 4,
 diff_6th_opt                        = 0,
 diff_6th_factor                     = 0.12,
 base_temp                           = 290.
 damp_opt                            = 0,
 zdamp                               = 5000.,  5000.,  5000.,
 dampcoef                            = 0.01,   0.01,   0.01
 khdif                               = 0,      0,      0,
 kvdif                               = 0,      0,      0,
 non_hydrostatic                     = .true., .true., .true.,
 moist_adv_opt                       = 1,      1,      1,     1
 scalar_adv_opt                      = 1,      1,      1,     1
 use_baseparam_fr_nml                = .true.
 /

 &bdy_control
 spec_bdy_width                      = 5,
 spec_zone                           = 1,
 relax_zone                          = 4,
 specified                           = .true., .false.,.false.,.false.,.false., .false.,
 nested                              = .false., .true., .true.,.true., .true., .true.,
 /

 &grib2
 /

 &namelist_quilt
 nio_tasks_per_group = 0,
 nio_groups = 1,
 /
------------------------------------------------------------------------
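
Editorial sketch, not part of the original post: since the hangs tend to follow the opening of a new wrfout file, it may help to see how many history files this namelist implies and what time step each domain uses. The Python below is a rough calculation under assumptions: each nest's time step is taken as its parent's divided by `parent_time_step_ratio` (the nests chain d01, d02, d03, d04 per `parent_id`), a history frame is assumed to be written at each domain's start time as well as at every interval, and the function names are illustrative.

```python
from datetime import datetime
from fractions import Fraction

# Values copied from the namelist above (first four columns only).
time_step = Fraction(163) + Fraction(7, 11)  # time_step + fract_num/fract_den, s
parent_time_step_ratio = [1, 3, 3, 3]
starts = [datetime(2009, 12, 14, h) for h in (0, 3, 6, 9)]
ends = [datetime(2009, 12, 15, 0)] * 3 + [datetime(2009, 12, 14, 12)]
history_interval_min = [60, 60, 60, 20]


def domain_time_steps(base, ratios):
    """Divide the coarse time step down the nest chain, one ratio per domain."""
    steps, dt = [], base
    for ratio in ratios:
        dt = dt / ratio
        steps.append(dt)
    return steps


def history_file_counts(starts, ends, intervals_min, include_initial=True):
    """Count history frames per domain (one file each, frames_per_outfile = 1)."""
    counts = []
    for start, end, minutes in zip(starts, ends, intervals_min):
        n = int((end - start).total_seconds() // (minutes * 60))
        counts.append(n + (1 if include_initial else 0))
    return counts


steps = domain_time_steps(time_step, parent_time_step_ratio)
counts = history_file_counts(starts, ends, history_interval_min)
print([round(float(s), 2) for s in steps], counts, sum(counts))
```

Under these assumptions the time steps come to roughly 163.64, 54.55, 18.18 and 6.06 s, and the run opens 25, 22, 19 and 10 wrfout files for domains 1 to 4, or 76 file opens in total.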

-----Original Message-----
Date: Fri, 16 Apr 2010 10:11:22 -0700
From: "Zulauf, Michael" <Michael.Zulauf at iberdrolausa.com>
Subject: Re: [Wrf-users] WRF 3.2 jobs hanging up sporadically on wrfout output
To: "Don Morton" <Don.Morton at alaska.edu>
Cc: wrf-users at ucar.edu
Message-ID: <B2A259FAA3CF26469FF9A7C7402C49970913EFE0 at POREXUW03.ppmenergy.us>
Content-Type: text/plain; charset="us-ascii"

Thanks for the response, Don.  

The specific RDMA suggestion isn't relevant to our case (our hardware
doesn't support it), but you may be right that this is an
optimization-related issue.  I'll probably try playing with optimizations
next.  I've got the same settings that have worked for previous versions -
but perhaps something in the new code has made one of the settings
problematic.

Regarding the suggestions I've been getting relating to
WRFIO_NCD_LARGE_FILE_SUPPORT - I don't think that's the problem.  I'm
splitting my output into single frame files to keep the file size small.
I may try that also, just for the heck of it.
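
Editorial sketch, not part of the original message: the claim that the single-frame files are small can be checked directly against the roughly 2 GiB limit of the classic netCDF format, which is the case WRFIO_NCD_LARGE_FILE_SUPPORT addresses. The helper name `files_near_limit`, the constant name, and the 50% warning fraction are assumptions for illustration.

```python
import glob
import os

# Approximate size limit of a classic-format netCDF file (about 2 GiB).
CLASSIC_LIMIT_BYTES = 2 * 1024**3


def files_near_limit(pattern, limit=CLASSIC_LIMIT_BYTES, fraction=0.5):
    """Return (path, size) pairs for files larger than fraction * limit."""
    result = []
    for path in sorted(glob.glob(pattern)):
        size = os.path.getsize(path)
        if size > fraction * limit:
            result.append((path, size))
    return result
```

Calling `files_near_limit("wrfout_d0*")` in the run directory and getting an empty list would support the view that file size is not the cause here.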

Based on the sporadic nature of this (sometimes it happens, sometimes it
doesn't, when it hangs seems fairly random), I suspect it's some type of
timing issue like a race condition.  If I can't get it working, I may
just drop back to 3.1.1, at least until 3.2.1 comes out.  ;-)

Thanks all,

Mike




