[Wrf-users] Max number of CPUs for WRF
Alex Fierro
alexandre.o.fierro at gmail.com
Mon Mar 26 16:00:09 MDT 2012
Greetings:
I ran WRF on the Oak Ridge supercomputer (jaguarpf, a Cray XT5) on 2000 cores
two years ago and ran into a similar problem, which in my case was related to
I/O quilting.
I had to select:
&namelist_quilt
 nio_tasks_per_group = 2,
 nio_groups = 1,
/
and then everything went fine (for that particular case).
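(For what it's worth, my recollection of the quilting bookkeeping is that the
I/O server ranks come out of the total MPI task count, i.e. compute tasks =
total tasks - nio_groups x nio_tasks_per_group. So on that 2000-core run, with
nio_tasks_per_group = 2 and nio_groups = 1, two ranks were dedicated to writing
output and the remaining 1998 did the computation. Please double-check that
against the WRF documentation; I am going from memory here.)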
The simulation was run for a 24-h period on a 4-km convection-permitting grid
over CONUS with a grid size of 1200 x 800 x 35. It scaled well up to 8000
cores, where, again, I/O caused some issues. I also believe that, in the past,
NetCDF files had a hard-wired file size limit of about 2 GB (?), similar to
Vis5D files. Have you also tried using native (raw) binaries for the output
format?
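As a rough illustration of why that 2 GB ceiling bites at this grid size, here
is my own back-of-the-envelope in Python. It assumes single-precision output
and something like 30 three-dimensional fields per history frame, which is
only a guess at a typical WRF history stream, not a measured number:

    # Rough size of one history frame on a 1200 x 800 x 35 grid,
    # assuming ~30 3-D fields stored as 4-byte floats (an assumption).
    nx, ny, nz = 1200, 800, 35
    per_3d_field = nx * ny * nz * 4          # ~134 MB per field
    frame = 30 * per_3d_field                # ~4 GB per output time
    print("%.2f GiB per frame" % (frame / 2.0**30))

If memory serves, NetCDF builds with large-file (64-bit offset) support get
around the 2 GB limit, but that has to be enabled when the I/O layer is built,
so it is worth checking how your NetCDF was compiled.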
Cheers and hope this helps,
Alexandre-
--
-------------------------------------------------------------
Alexandre Fierro, PhD
Research Scientist-
National Severe Storms Laboratory (NSSL/NOAA)
The Cooperative Institute for Mesoscale Meteorological Studies (OU/NOAA)
Los Alamos National Laboratory, Los Alamos, NM (LANL)
"Yesterday is History, Tomorrow is a Mystery and Today is a Gift; That is
why it is called the Present"
"There are only 10 types of people in the world:
Those who understand binary, and those who don't"
"My opinions are my own and not representative of OU, NSSL,
AOML, HRD, LANL or any affiliates."
^.^
(o o)
/( V )\
---m---m----
On Mon, Mar 26, 2012 at 7:22 AM, Don Morton <Don.Morton at alaska.edu> wrote:
> Howdy,
>
> I suspect you have over-decomposed your Nest 2.
>
> Your Nest 2 has 151x196 = 29,596 horizontal grid points. With 1152 tasks,
> each task only has about 26 grid points, or a 5x5 grid. At this level of
> refinement, I believe you're getting into issues of not having enough grid
> points for halos, etc.
>
> Actually, even with 224 cores, you only have about 132 grid points, or an
> 11x11 grid in each task. Some have suggested in the past that maybe once
> you get below about 15x15 grid points per task, your scalability starts to
> suffer.
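> A quick back-of-the-envelope in Python, in case it helps (my own sketch, not
> WRF's actual decomposition code, which splits the domain into an
> nproc_x x nproc_y grid of rectangular patches; the average patch size works
> out the same):
>
>     import math
>
>     # Rough horizontal grid points per MPI task for Nest 2 (151 x 196),
>     # ignoring halo points.
>     def points_per_task(e_we, e_sn, ntasks):
>         pts = float(e_we * e_sn) / ntasks
>         return pts, math.sqrt(pts)   # total points, side of equivalent square patch
>
>     for ntasks in (224, 1152):
>         pts, side = points_per_task(151, 196, ntasks)
>         print("%4d tasks: ~%3d points, ~%dx%d patch"
>               % (ntasks, round(pts), round(side), round(side)))
>
>     #  224 tasks: ~132 points, ~11x11 patch
>     # 1152 tasks: ~ 26 points, ~5x5 patch
>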
>
> So, to re-answer your previous question, WRF will work with tens to
> hundreds of thousands of tasks, but you need to do this with sizable
> problems. You can only decompose a given problem size so much until
>
> a) It just doesn't scale well anymore, and
> b) You over-refine it so much that it won't even run. I suspect this is
> your problem with the 1152 tasks.
>
> Best Regards,
>
> Don Morton
>
>
> --
> Voice: +1 907 450 8679
> Arctic Region Supercomputing Center
> http://weather.arsc.edu/
> http://people.arsc.edu/~morton/
>
>
> On Mon, Mar 26, 2012 at 8:42 AM, brick <brickflying at gmail.com> wrote:
>
>> Hi
>>
>> Thanks for the help.
>> Today I tested WRF 3.3 with 224 cores and it ran well. But when I increased
>> the core count to 1120, wrf.exe stopped integrating after 6 hours, and it
>> did not abort or return any error message. rsl.out.0000 shows that wrf.exe
>> stops while dealing with domain 2. The last 20 lines of rsl.out.0000 are
>> shown here.
>>
>> Timing for main: time 2012-03-22_05:57:30 on domain 1: 0.10120 elapsed seconds.
>> Timing for main: time 2012-03-22_05:58:00 on domain 1: 0.10280 elapsed seconds.
>> Timing for main: time 2012-03-22_05:58:30 on domain 1: 0.10070 elapsed seconds.
>> Timing for main: time 2012-03-22_05:59:00 on domain 1: 0.10150 elapsed seconds.
>> Timing for main: time 2012-03-22_05:59:30 on domain 1: 0.10080 elapsed seconds.
>> Timing for main: time 2012-03-22_06:00:00 on domain 1: 0.09900 elapsed seconds.
>> *************************************
>> Nesting domain
>> ids,ide,jds,jde 1 151 1 196
>> ims,ime,jms,jme -4 15 -4 20
>> ips,ipe,jps,jpe 1 5 1 6
>> INTERMEDIATE domain
>> ids,ide,jds,jde 243 278 150 194
>> ims,ime,jms,jme 238 255 145 162
>> ips,ipe,jps,jpe 241 245 148 152
>> *************************************
>> d01 2012-03-22_06:00:00 alloc_space_field: domain 2, 18001632 bytes allocated
>> d01 2012-03-22_06:00:00 alloc_space_field: domain 2, 1941408 bytes allocated
>> d01 2012-03-22_06:00:00 *** Initializing nest domain # 2 from an input file. ***
>> d01 2012-03-22_06:00:00 med_initialdata_input: calling input_input
>>
>> The namelist is:
>> &time_control
>>  run_days = 0,
>>  run_hours = 72,
>>  run_minutes = 0,
>>  run_seconds = 0,
>>  start_year = 2012, 2012,
>>  start_month = 03, 03,
>>  start_day = 22, 22,
>>  start_hour = 00, 06,
>>  start_minute = 00, 00,
>>  start_second = 00, 00,
>>  end_year = 2012, 2012,
>>  end_month = 03, 03,
>>  end_day = 25, 23,
>>  end_hour = 00, 06,
>>  end_minute = 00, 00,
>>  end_second = 00, 00,
>>  interval_seconds = 21600,
>>  input_from_file = .true., .true.,
>>  history_interval = 60, 60,
>>  frames_per_outfile = 13, 13,
>>  restart = .false.,
>>  restart_interval = 36000,
>>  io_form_history = 2,
>>  io_form_restart = 2,
>>  io_form_input = 2,
>>  io_form_boundary = 2,
>>  debug_level = 0,
>> /
>>
>> &domains
>>  time_step = 30,
>>  time_step_fract_num = 0,
>>  time_step_fract_den = 1,
>>  max_dom = 2,
>>  s_we = 1, 1, 1,
>>  e_we = 441, 151,
>>  s_sn = 1, 1, 1,
>>  e_sn = 369, 196,
>>  s_vert = 1, 1, 1,
>>  e_vert = 51, 51,
>>  p_top_requested = 5000,
>>  num_metgrid_levels = 27,
>>  num_metgrid_soil_levels = 4,
>>  dx = 5000, 1000,
>>  dy = 5000, 1000,
>>  grid_id = 1, 2, 3,
>>  parent_id = 0, 1, 2,
>>  i_parent_start = 0, 245,
>>  j_parent_start = 0, 152,
>>  parent_grid_ratio = 1, 5,
>>  parent_time_step_ratio = 1, 5,
>>  feedback = 0,
>>  smooth_option = 0,
>> /
>>
>> &physics
>>  mp_physics = 6, 6,
>>  ra_lw_physics = 1, 1, 1,
>>  ra_sw_physics = 1, 1, 1,
>>  radt = 5, 1,
>>  sf_sfclay_physics = 1, 1,
>>  sf_surface_physics = 2, 2, 2,
>>  bl_pbl_physics = 1, 1,
>>  bldt = 0, 0, 0,
>>  cu_physics = 0, 0,
>>  cudt = 5, 5, 5,
>>  isfflx = 1,
>>  ifsnow = 0,
>>  icloud = 1,
>>  surface_input_source = 1,
>>  num_soil_layers = 4,
>>  sf_urban_physics = 0, 0, 0,
>> /
>>
>> &fdda
>> /
>>
>> &dynamics
>>  w_damping = 0,
>>  diff_opt = 1,
>>  km_opt = 4,
>>  diff_6th_opt = 0, 0, 0,
>>  diff_6th_factor = 0.12, 0.12, 0.12,
>>  base_temp = 290.,
>>  damp_opt = 1,
>>  zdamp = 5000,
>>  dampcoef = 0.01,
>>  khdif = 0, 0, 0,
>>  kvdif = 0, 0, 0,
>>  non_hydrostatic = .true., .true., .true.,
>>  moist_adv_opt = 1, 1, 1,
>>  scalar_adv_opt = 1, 1, 1,
>> /
>>
>> &bdy_control
>>  spec_bdy_width = 5,
>>  spec_zone = 1,
>>  relax_zone = 4,
>>  specified = .true., .false., .false.,
>>  nested = .false., .true., .true.,
>> /
>>
>> &grib2
>> /
>>
>> &namelist_quilt
>>  nio_tasks_per_group = 0,
>>  nio_groups = 1,
>> /
>>
>> Thanks a lot.
>>
>> brick
>>
>>
>>
>>
>> On Sat, Mar 24, 2012 at 12:39 AM, Welsh, Patrick T <pat.welsh at unf.edu> wrote:
>>
>>> It runs fine with hundreds, ok with thousands.
>>>
>>> Pat
>>>
>>>
>>>
>>> On 3/23/12 4:12 AM, "brick" <brickflying at gmail.com> wrote:
>>>
>>> Hi All
>>>
>>> Is there a limit on the number of cores that WRF can use? I plan to test WRF
>>> with 2048 cores or more next week. Can WRF run with such a huge number?
>>> Thanks a lot.
>>>
>>> brick
>>>
>>>
>>> --
>>>
>>>
>>
>
>
>
>
>