[Wrf-users] WRF 3.2 jobs hanging up sporadically on wrfout output

Zulauf, Michael Michael.Zulauf at iberdrolausa.com
Wed May 5 11:06:34 MDT 2010


Thanks to all who have responded.  I received an answer off-list from
wrfhelp, and I'm testing a code modification they sent.  So far it
appears to be working, but I need to do some more testing (as the
problem seems to be sporadic on my system).

Once I know more, I'll report back more completely to the list and
wrfhelp.

BTW, Feng, I was about to test your suggestion when I got the new code
from wrfhelp.  I may give your suggestion a try if their code doesn't
work.

Thanks again. . .


-----Original Message-----
From: Feng Liu [mailto:fliu at mag.maricopa.gov] 
Sent: Tuesday, May 04, 2010 9:30 AM
To: Zulauf, Michael; wrf-users at ucar.edu
Subject: RE: [Wrf-users] WRF 3.2 jobs hanging up sporadically on
wrfout output

Mike,
Remove the line "fine_input_stream  = 0, 2, 2, 2," from your namelist and
then try again. I had the same experience: a namelist that worked well
with WRF 3.1.1 does not work with WRF 3.2. Honestly, I do not know why.
Feng
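
Feng's suggestion can be tried on a copy of the namelist without editing by
hand. The following is only a minimal sketch: the helper name
drop_namelist_entry is hypothetical, and it assumes the entry sits on a
single line (values wrapped onto a continuation line are not handled).

```python
def drop_namelist_entry(text, key):
    """Return namelist text with every single-line 'key = ...' entry removed."""
    kept = []
    for line in text.splitlines():
        # The variable name is whatever precedes the first '=' on the line.
        name = line.split("=", 1)[0].strip().lower()
        if "=" in line and name == key.lower():
            continue  # skip the assignment for this key
        kept.append(line)
    return "\n".join(kept) + "\n"


# Small excerpt modelled on the &time_control block in this thread.
sample = """&time_control
 input_from_file                     = .true.,.true.,.true.,.true.,
 fine_input_stream                   = 0, 2, 2, 2,
 io_form_auxinput2                   = 2
 /
"""
cleaned = drop_namelist_entry(sample, "fine_input_stream")
print(cleaned)
```

Working on a copy keeps the original namelist.input available for comparison
if the hang turns out to be unrelated to this entry.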
 

-----Original Message-----
From: wrf-users-bounces at ucar.edu [mailto:wrf-users-bounces at ucar.edu] On
Behalf Of Zulauf, Michael
Sent: Friday, April 30, 2010 12:01 PM
To: wrf-users at ucar.edu
Subject: Re: [Wrf-users] WRF 3.2 jobs hanging up sporadically on
wrfout output

Hi again, all. . .

I'm reviving my plea for help from a couple weeks ago.  I'm still having
issues with WRF 3.2 - and _only_ 3.2.

I've tried different versions of the PGI compilers, different versions
of support libraries, different optimization levels (all the way down to
none), etc.  My jobs sporadically (but usually eventually) hang up, most
often after a new wrfout file is opened.  No error messages, no crashes
- the processes continue, but _all_ output stops.  I eventually just
have to kill the job.  The wrfouts are small, and all output looks good
up until the failed wrfout.
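
Because the processes stay alive while all output stops, a hang of this kind
can only be noticed by watching the files. A minimal sketch of such a check
follows; the function names and the timeout are hypothetical, and which
files to watch (for example the wrfout files and a log file such as
rsl.out.0000) is an assumption about the run setup, not something taken
from this report.

```python
import os
import time


def newest_mtime(paths):
    """Return the most recent modification time among existing paths, or None."""
    times = [os.path.getmtime(p) for p in paths if os.path.exists(p)]
    return max(times) if times else None


def is_stalled(paths, timeout_s, now=None):
    """True if none of the files has been modified within timeout_s seconds."""
    now = time.time() if now is None else now
    latest = newest_mtime(paths)
    if latest is None:
        return False  # nothing written yet, so a stall cannot be judged
    return (now - latest) > timeout_s
```

A wrapper script could call is_stalled(glob.glob("wrfout_d0*"), 1800) every
few minutes and kill or flag the job when it returns True. The timeout needs
to be longer than the longest normal gap between writes.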

The exact same hardware, OS, compilers, libraries, etc work for previous
versions of WRF.

Below is an example namelist.input (WPS seems to be running fine).  Any
thoughts?

Thanks,
Mike

------------------------------------------------------------------------
&time_control
 run_days                            = 0,
 run_hours                           = 24,
 run_minutes                         = 0,
 run_seconds                         = 0,
 start_year                          = 2009,2009,2009,2009,
 start_month                         = 12,12,12,12,
 start_day                           = 14,14,14,14,
 start_hour                          = 00,03,06,09,
 start_minute                        = 00,   00,   00,   00,   00,   00,
 start_second                        = 00,   00,   00,   00,   00,   00,

 end_year                            = 2009,2009,2009,2009,
 end_month                           = 12,12,12,12,
 end_day                             = 15,15,15,14,
 end_hour                            = 00,00,00,12,
 end_minute                          = 00,   00,   00,   00,   00,   00,
 end_second                          = 00,   00,   00,   00,   00,   00,
 interval_seconds                    = 10800,
 input_from_file                     = .true.,.true.,.true.,.true.,.true.,
 fine_input_stream                   = 0, 2, 2, 2, 
 io_form_auxinput2                   = 2
 history_interval                    = 60,60,60,20,
 frames_per_outfile                  =  1,  1,  1,  1,  1,  1, 
 restart                             = .false.,
 restart_interval                    = 1440,
 io_form_history                     = 2
 io_form_restart                     = 2
 io_form_input                       = 2
 io_form_boundary                    = 2
 debug_level                         = 0
 adjust_output_times                 = .true.
 /

 &domains
 time_step                           = 163,
 time_step_fract_num                 = 7,
 time_step_fract_den                 = 11,
 max_dom                             = 4,
 s_we                                = 1,  1,  1,  1,  1, 1,
 e_we                                =   142,244,280,382,
 s_sn                                =  1,  1,  1,  1,  1, 1,
 e_sn                                =   154,268,250,196,
 s_vert                              =  1,  1,  1,  1,  1, 1,
 e_vert                              = 31,  31,  31,  31,  31, 31,
 num_metgrid_levels                  =  27 ,
 eta_levels                          = 1.000, 0.993, 0.980, 0.966,
0.950, 0.933, 0.913, 0.892, 0.869, 0.844, 0.816, 0.786, 0.753, 0.718,
0.680, 0.639, 0.596, 0.550, 0.501, 0.451, 0.398, 0.345, 0.290, 0.236,
0.188, 0.145, 0.108, 0.075, 0.046, 0.021, 0.000,

 p_top_requested                     = 5000,
 dx                                  = 27000,9000,3000,1000,
 dy                                  = 27000,9000,3000,1000,
 grid_id        = 1,  2,  3,  4,  5,  6,
 parent_id      = 1,  1,  2,  3,  4,  5,
 i_parent_start                      =   1,31,91,92,
 j_parent_start                      =   1,33,93,93,
 parent_grid_ratio = 1,  3,  3,  3,  3,  3,
 parent_time_step_ratio = 1,  3,  3,  3,  3, 3,
 feedback                            = 0,
 smooth_option                       = 2
 use_adaptive_time_step              = .false.
 step_to_output_time                 = .true.
 target_cfl                          = 1.1,1.1,1.1,1.1,
 max_step_increase_pct               = 5, 51, 51, 51, 51, 51
 starting_time_step                  = 162, 54, 18, 6
 max_time_step                       = 202.5, 67.5, 22.5, 7.5
 min_time_step                       = 27, 9, 3, 1
 adaptation_domain                   = 4
 /

 &physics
 mp_physics                          = 5, 5, 5, 5, 
 ra_lw_physics                       = 1, 1, 1, 1, 
 ra_sw_physics                       = 1, 1, 1, 1, 
 radt                                = 30,    30,    30,    30,    30,    30,
 sf_sfclay_physics                   = 1, 1, 1, 1, 
 sf_surface_physics                  = 1, 1, 1, 1, 
 bl_pbl_physics                      = 1, 1, 1, 1, 
 bldt                                = 0,     0,     0,     0,     0,     0,
 cu_physics                          = 1,     1,     0,     0,     0,     0,
 cudt                                = 5,     5,     5,     0,     0,     0,
 cam_abs_freq_s                      = 21600,
 levsiz                              = 59,
 paerlev                             = 29,
 cam_abs_dim1                        = 4,
 cam_abs_dim2                        = 31,
 isfflx                              = 1,
 ifsnow                              = 0,
 icloud                              = 1,
 surface_input_source                = 1,
 num_soil_layers                     = 5,
 sf_urban_physics                    = 0,     0,     0,     0,
 mp_zero_out                         = 0,
 maxiens                             = 1,
 maxens                              = 3,
 maxens2                             = 3,
 maxens3                             = 16,
 ensdim                              = 144,
 slope_rad                           = 0,
 topo_shading                        = 0,
 /

 &fdda
 grid_fdda                           = 1,     0,     0,
 gfdda_inname                        = "wrffdda_d<domain>",
 gfdda_interval_m                    = 180,   0,     0,
 gfdda_end_h                         = 12,    0,     0,
 io_form_gfdda                       = 2,
 fgdt                                = 0,     0,     0,
 if_no_pbl_nudging_uv                = 0,     0,     0,
 if_no_pbl_nudging_t                 = 1,     0,     0,
 if_no_pbl_nudging_q                 = 1,     0,     0,
 if_zfac_uv                          = 0,     0,     0,
  k_zfac_uv                          = 10,   10,    10,
 if_zfac_t                           = 1,     0,     0,
  k_zfac_t                           = 10,   10,    10,
 if_zfac_q                           = 1,     0,     0,
  k_zfac_q                           = 10,   10,    10,
 guv                                 = 0.0001,     0.0001,     0.0001,
 gt                                  = 0.0001,     0.0001,     0.0001,
 gq                                  = 0.000001,   0.000001,   0.000001,
 if_ramping                          = 0,
 dtramp_min                          = 0.0,
/

 &dynamics
 w_damping                           = 1,
 diff_opt                            = 1,
 km_opt                              = 4,
 diff_6th_opt                        = 0,
 diff_6th_factor                     = 0.12,
 base_temp                           = 290.
 damp_opt                            = 0,
 zdamp                               = 5000.,  5000.,  5000.,
 dampcoef                            = 0.01,   0.01,   0.01
 khdif                               = 0,      0,      0,
 kvdif                               = 0,      0,      0,
 non_hydrostatic                     = .true., .true., .true.,
 moist_adv_opt                       = 1,      1,      1,     1
 scalar_adv_opt                      = 1,      1,      1,     1
 use_baseparam_fr_nml                = .true.
 /

 &bdy_control
 spec_bdy_width                      = 5,
 spec_zone                           = 1,
 relax_zone                          = 4,
 specified                           = .true., .false.,.false.,.false.,.false., .false.,
 nested                              = .false., .true., .true.,.true., .true., .true.,
 /

 &grib2
 /

 &namelist_quilt
 nio_tasks_per_group = 0,
 nio_groups = 1,
 /
------------------------------------------------------------------------

-----Original Message-----
Date: Fri, 16 Apr 2010 10:11:22 -0700
From: "Zulauf, Michael" <Michael.Zulauf at iberdrolausa.com>
Subject: Re: [Wrf-users] WRF 3.2 jobs hanging up sporadically on
	wrfout output
To: "Don Morton" <Don.Morton at alaska.edu>
Cc: wrf-users at ucar.edu
Message-ID: <B2A259FAA3CF26469FF9A7C7402C49970913EFE0 at POREXUW03.ppmenergy.us>
Content-Type: text/plain; charset="us-ascii"

Thanks for the response, Don.  

The specific RDMA suggestion isn't relevant to our case (our hardware
doesn't support it), but you may be right that this is an
optimization-related issue.  I'll probably try playing with optimizations
next.  I've got the same settings that worked for previous versions - but
perhaps something in the new code has made one of the settings problematic.

Regarding the suggestions I've been getting relating to
WRFIO_NCD_LARGE_FILE_SUPPORT - I don't think that's the problem.  I'm
splitting my output into single frame files to keep the file size small.
I may try that also, just for the heck of it.
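
The file-size reasoning above can be checked directly. The sketch below
assumes that large-file support only matters once a file approaches the
roughly 2 GiB limit of the classic netCDF format; the constant, the function
name, and the 90% threshold are all hypothetical choices for illustration.

```python
import os

# Assumed limit of the classic netCDF format (about 2 GiB).
CLASSIC_LIMIT_BYTES = 2 * 1024**3


def near_classic_limit(paths, limit=CLASSIC_LIMIT_BYTES, fraction=0.9):
    """Return (path, size) pairs whose size exceeds fraction * limit."""
    threshold = fraction * limit
    flagged = []
    for p in paths:
        size = os.path.getsize(p)
        if size > threshold:
            flagged.append((p, size))
    return flagged
```

An empty result for all wrfout, wrfinput, and wrfbdy files would support the
view that file size is not the cause here.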

Based on the sporadic nature of this (sometimes it happens, sometimes it
doesn't, when it hangs seems fairly random), I suspect it's some type of
timing issue like a race condition.  If I can't get it working, I may
just drop back to 3.1.1, at least until 3.2.1 comes out.  ;-)

Thanks all,

Mike






_______________________________________________
Wrf-users mailing list
Wrf-users at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/wrf-users


