[Wrf-users] Max number of CPUs for WRF
Don Morton
Don.Morton at alaska.edu
Mon Mar 26 06:22:58 MDT 2012
Howdy,
I suspect you have over-decomposed your Nest 2.
Your Nest 2 has 151 x 196 = 29,596 horizontal grid points. With 1120 tasks,
each task gets only about 26 grid points, roughly a 5x5 patch. At this
level of decomposition, I believe you run into problems of not having
enough grid points per patch for the halo regions, etc.
In fact, even with 224 cores each task gets only about 132 grid points,
roughly an 11x12 patch. It has been suggested in the past that once you
drop below about 15x15 grid points per task, your scalability starts to
suffer.
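As a rough sanity check, here is a back-of-the-envelope sketch (it assumes
WRF picks a near-square nx x ny task layout, which is approximately but not
exactly what it does; patch_size is just an illustrative helper, not a WRF
routine):

    import math

    def patch_size(we, sn, ntasks):
        # Most nearly square factorization nx * ny = ntasks, with nx <= ny.
        nx = max(d for d in range(1, math.isqrt(ntasks) + 1) if ntasks % d == 0)
        ny = ntasks // nx
        return we / nx, sn / ny

    for ntasks in (224, 1120):
        pw, ps = patch_size(151, 196, ntasks)  # your 151 x 196 nest
        print(f"{ntasks:5d} tasks -> ~{pw:.0f} x {ps:.0f} grid points per task")

The ~5 x 6 patch this predicts for 1120 tasks is exactly what your
rsl.out.0000 below prints for the nest (ips,ipe,jps,jpe 1 5 1 6).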
So, to re-answer your earlier question: WRF will run with tens to hundreds
of thousands of tasks, but you need a sizable problem to do it. You can
only decompose a given problem size so far before:
a) it simply stops scaling well, and
b) it is over-decomposed to the point that it won't run at all. I suspect
this is your problem with the 1120 tasks.
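To put a number on "so far," here is the same kind of sketch using that
~15x15 points-per-task heuristic (max_tasks is an illustrative helper, not
a limit WRF enforces):

    def max_tasks(we, sn, min_pts=15):
        # Largest task count that still leaves ~min_pts x min_pts points per patch.
        return (we // min_pts) * (sn // min_pts)

    print(max_tasks(441, 369))  # d01: 29 * 24 = 696 tasks
    print(max_tasks(151, 196))  # d02: 10 * 13 = 130 tasks

Since every domain is decomposed across the same set of tasks, the small
nest is the binding constraint: by this yardstick even 224 tasks is already
past the comfortable range for d02, and 1120 is far beyond it.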
Best Regards,
Don Morton
--
Voice: +1 907 450 8679
Arctic Region Supercomputing Center
http://weather.arsc.edu/
http://people.arsc.edu/~morton/
On Mon, Mar 26, 2012 at 8:42 AM, brick <brickflying at gmail.com> wrote:
> Hi
>
> Thanks for help.
> Today I tested WRF 3.3 with 224 cores and it ran well. But when I
> increased the core count to 1120, wrf.exe stopped integrating after 6
> simulated hours; it did not exit or return any error message.
> rsl.out.0000 shows that wrf.exe stalls while dealing with domain 2. The
> last 20 lines of rsl.out.0000 are shown here.
>
> Timing for main: time 2012-03-22_05:57:30 on domain 1: 0.10120 elapsed seconds.
> Timing for main: time 2012-03-22_05:58:00 on domain 1: 0.10280 elapsed seconds.
> Timing for main: time 2012-03-22_05:58:30 on domain 1: 0.10070 elapsed seconds.
> Timing for main: time 2012-03-22_05:59:00 on domain 1: 0.10150 elapsed seconds.
> Timing for main: time 2012-03-22_05:59:30 on domain 1: 0.10080 elapsed seconds.
> Timing for main: time 2012-03-22_06:00:00 on domain 1: 0.09900 elapsed seconds.
> *************************************
> Nesting domain
> ids,ide,jds,jde 1 151 1 196
> ims,ime,jms,jme -4 15 -4 20
> ips,ipe,jps,jpe 1 5 1 6
> INTERMEDIATE domain
> ids,ide,jds,jde 243 278 150 194
> ims,ime,jms,jme 238 255 145 162
> ips,ipe,jps,jpe 241 245 148 152
> *************************************
> d01 2012-03-22_06:00:00 alloc_space_field: domain 2, 18001632 bytes allocated
> d01 2012-03-22_06:00:00 alloc_space_field: domain 2, 1941408 bytes allocated
> d01 2012-03-22_06:00:00 *** Initializing nest domain # 2 from an input file. ***
> d01 2012-03-22_06:00:00 med_initialdata_input: calling input_input
>
> The namelist is:
> &time_control
>  run_days = 0,
>  run_hours = 72,
>  run_minutes = 0,
>  run_seconds = 0,
>  start_year = 2012, 2012,
>  start_month = 03, 03,
>  start_day = 22, 22,
>  start_hour = 00, 06,
>  start_minute = 00, 00,
>  start_second = 00, 00,
>  end_year = 2012, 2012,
>  end_month = 03, 03,
>  end_day = 25, 23,
>  end_hour = 00, 06,
>  end_minute = 00, 00,
>  end_second = 00, 00,
>  interval_seconds = 21600,
>  input_from_file = .true., .true.,
>  history_interval = 60, 60,
>  frames_per_outfile = 13, 13,
>  restart = .false.,
>  restart_interval = 36000,
>  io_form_history = 2,
>  io_form_restart = 2,
>  io_form_input = 2,
>  io_form_boundary = 2,
>  debug_level = 0,
> /
>
> &domains
>  time_step = 30,
>  time_step_fract_num = 0,
>  time_step_fract_den = 1,
>  max_dom = 2,
>  s_we = 1, 1, 1,
>  e_we = 441, 151,
>  s_sn = 1, 1, 1,
>  e_sn = 369, 196,
>  s_vert = 1, 1, 1,
>  e_vert = 51, 51,
>  p_top_requested = 5000,
>  num_metgrid_levels = 27,
>  num_metgrid_soil_levels = 4,
>  dx = 5000, 1000,
>  dy = 5000, 1000,
>  grid_id = 1, 2, 3,
>  parent_id = 0, 1, 2,
>  i_parent_start = 0, 245,
>  j_parent_start = 0, 152,
>  parent_grid_ratio = 1, 5,
>  parent_time_step_ratio = 1, 5,
>  feedback = 0,
>  smooth_option = 0,
> /
>
> &physics
>  mp_physics = 6, 6,
>  ra_lw_physics = 1, 1, 1,
>  ra_sw_physics = 1, 1, 1,
>  radt = 5, 1,
>  sf_sfclay_physics = 1, 1,
>  sf_surface_physics = 2, 2, 2,
>  bl_pbl_physics = 1, 1,
>  bldt = 0, 0, 0,
>  cu_physics = 0, 0,
>  cudt = 5, 5, 5,
>  isfflx = 1,
>  ifsnow = 0,
>  icloud = 1,
>  surface_input_source = 1,
>  num_soil_layers = 4,
>  sf_urban_physics = 0, 0, 0,
> /
>
> &fdda
> /
>
> &dynamics
>  w_damping = 0,
>  diff_opt = 1,
>  km_opt = 4,
>  diff_6th_opt = 0, 0, 0,
>  diff_6th_factor = 0.12, 0.12, 0.12,
>  base_temp = 290.,
>  damp_opt = 1,
>  zdamp = 5000,
>  dampcoef = 0.01,
>  khdif = 0, 0, 0,
>  kvdif = 0, 0, 0,
>  non_hydrostatic = .true., .true., .true.,
>  moist_adv_opt = 1, 1, 1,
>  scalar_adv_opt = 1, 1, 1,
> /
>
> &bdy_control
>  spec_bdy_width = 5,
>  spec_zone = 1,
>  relax_zone = 4,
>  specified = .true., .false., .false.,
>  nested = .false., .true., .true.,
> /
>
> &grib2
> /
>
> &namelist_quilt
>  nio_tasks_per_group = 0,
>  nio_groups = 1,
> /
>
> Thanks a lot.
>
> brick
>
>
>
>
> On Sat, Mar 24, 2012 at 12:39 AM, Welsh, Patrick T <pat.welsh at unf.edu> wrote:
>
>> It runs fine with hundreds, OK with thousands.
>>
>> Pat
>>
>>
>>
>> On 3/23/12 4:12 AM, "brick" <brickflying at gmail.com> wrote:
>>
>> Hi All
>>
>> Is there a limit on the number of cores WRF can use? I plan to test WRF
>> with 2048 cores or more next week. Can WRF run on that many?
>> Thanks a lot.
>>
>> brick
>>
>>