[ncl-talk] A Fatal Error Question: is this a bug in NCL?

Mary Haley haley at ucar.edu
Tue Jul 18 09:28:36 MDT 2017


Hi Barry,

We could provide you with a profiling version of NCL, in which you run your
script like normal, but then ncl produces a supplemental file with
information about how long certain code segments are taking. It breaks it
down by percentage of time spent in various sections.


This profiling information is a bit clunky because you have to run a script
on it and then view it from a browser, but these extra steps may be worth
it in order to find out where the code is slow.

To see a sample of how the profiler works, go to:

http://www.ncl.ucar.edu/Document/Tools/Profiler/multiple_plots/

and click on any one of the *.xml (the "
multiple_plots_ncl__multiple_plots.xml
<http://www.ncl.ucar.edu/Document/Tools/Profiler/multiple_plots/multiple_plots_ncl__multiple_plots.xml>"
is the main one and a good place to start)
​.  Look for lines that are colored
​
​in red, as these indicate where more time is being spent.
​
You can also use "get_cpu_time()" to track the timings yourself. For
example, if you want to time how long it takes to create each set of panel
plots, then you would have code that looks like this:

  start_time = get_cpu_time()
  plot(0) = gsn_csm_contour_map(wks,t_ave,res1)
  plot(1) = gsn_csm_contour_map(wks,rh_ave,res2)
  plotB   = gsn_csm_vector(wks,u_ave_new(::4,::4), \
                            v_ave_new(::4,::4),vecres)
  plotD   = gsn_csm_contour(wks,z_ave_new,res3)
  overlay(plot(0),plotB)
  overlay(plot(1),plotD)
  gsn_panel(wks,plot,(/2,1/),resP)    ; now draw as one plot
  end_time = get_cpu_time()
  print("Elapsed time for plotting = " + (end_time-start_time)

If you plan to do a lot of these timings, then I recommend a function.  :-)

procedure print_elapsed_time(stime,etime,title)
begin
  print("==================================================")
  print("Elapsed time for " + title + ": " + (etime-stime))
  print("==================================================")
end

Then your code would be:

  start_plot_time = get_cpu_time()
  plot(0) = gsn_csm_contour_map(wks,t_ave,res1)
  plot(1) = gsn_csm_contour_map(wks,rh_ave,res2)
  plotB   = gsn_csm_vector(wks,u_ave_new(::4,::4), \
                            v_ave_new(::4,::4),vecres)
  plotD   = gsn_csm_contour(wks,z_ave_new,res3)
  overlay(plot(0),plotB)
  overlay(plot(1),plotD)
  gsn_panel(wks,plot,(/2,1/),resP)    ; now draw as one plot
  end_plot_time = get_cpu_time()
  print_elapsed time(start_plot_time,end_plot_time,"Plotting")

Note that I changed the name from "start_time/end_time to
start_plot_time/end_plot_time. This is in case you do a bunch of these,
then you can print all the timings at the end, and it also helps keep track
of what timings these are associated with.

  start_read_time = get_cpu_time()
  .... Code that calls addfile and reads data
  end_read_time = get_cpu_time()
  ...
  start_ave_time = get_cpu_time()
  .... code that calls average_subset function a bunch of times
  end_ave_time = get_cpu_time()
  ...
  start_plot_time = get_cpu_time()
  ,,, code that creates and draws plots ...
  end_plot_time = get_cpu_time()

  print_elapsed time(start_read_time,end_read_time,"Reading data")
  print_elapsed time(start_ave_time,end_ave_time,"Averages")
  print_elapsed time(start_plot_time,end_plot_time,"Plotting")

Of course, you don't have to put all the elapsed timing prints at the end.
You can print them out right when the code is done, so you immediately have
the information.  It just depends on what you are interested in.

As a side, and Dave or Rick may want to correct me on my understanding on
this, but I think one thing that can make your code slow is that you have a
lot of print statements. Every time you do something like this:

   print("x = " + x)

versus this:

print(x)

I believe a string gets stored somewhere in memory. If you have a lot of
these, then NCL will get noticeably slower and slower, especially if you
are doing this inside do loops with a lot of iterations. I don't know if
you really have enough print statements for this to be a problem, but it's
worth making a mental note of this issue.

--Mary


On Mon, Jul 17, 2017 at 8:06 PM, Barry Lynn <barry.h.lynn at gmail.com> wrote:

> Dear Mary:
>
> Thank you. Very helpful!
>
> I had noticed that my code was running very, very slowly, but being
> relatively "new" to NCL programming was not sure why.  (Also, it was, as
> you noted, running out of memory since I "dialed back" the RAM on my
> account.)
>
> I will try your code and compare, and get back to the list about the
> timing differences.
>
> Barry
>
> On Tue, Jul 18, 2017 at 12:26 AM, Mary Haley <haley at ucar.edu> wrote:
>
>> Hi Barry,
>>
>> Rick is out this afternoon.
>>
>> Part of what Rick is suggesting is that if you are done with a variable,
>> then you should delete it in order to free up memory for later
>> calculations.
>>
>> If I may provide some honest comments about your script overall: if you
>> use some simple "clean coding practices", this will make it *much* easier
>> for you to debug your own code, and it will certainly make it easier for
>> other people to debug it. It will also help you see where you might have
>> some memory issues.
>>
>>
>> Here are some suggestions for improving your script:
>>
>> *[1] Use "delete" to remove variables you don't need any more:*
>>
>> t_ave=t_ave-273.16  ; convert to C
>>
>>         z_ave=z_ave/10.  ; convert to dam
>>
>>         printMinMax(z_ave,False)
>>         u = u * 3.6
>>         v = v * 3.6
>>         copy_VarCoords(u,u_ave)                         ; copy coord vars
>> to speed
>>         copy_VarCoords(v,v_ave)                         ; copy coord vars
>> to speed
>>         copy_VarCoords(z,z_ave)                         ; copy coord vars
>> to speed
>>         copy_VarCoords(t,t_ave)                         ; copy coord vars
>> to speed
>>         copy_VarCoords(rh,rh_ave)                         ; copy coord
>> vars to speed
>>
>> If you no longer need the "u", "v", "z", etc variables, then delete them:
>>
>> t_ave=t_ave-273.16  ; convert to C
>>
>>         z_ave=z_ave/10.  ; convert to dam
>>
>>         printMinMax(z_ave,False)
>>         u = u * 3.6
>>         v = v * 3.6
>>         copy_VarCoords(u,u_ave)                         ; copy coord vars
>> to speed
>>         copy_VarCoords(v,v_ave)                         ; copy coord vars
>> to speed
>>         copy_VarCoords(z,z_ave)                         ; copy coord vars
>> to speed
>>         copy_VarCoords(t,t_ave)
>>         copy_VarCoords(rh,rh_ave)
>>         delete([/u,v,z,t,rh/])                          ;  no longer
>> needed
>>
>> *[2] You are reading several variables and not doing anything with them
>> but printing them out. Reading them in uses memory. *
>>
>> If all you need to do is print them, then print them directly. An example:
>>
>>       lv0 = fin->lv_ISBL0
>>       print("lv0 = " + lv0)
>>       lv8 = fin->lv_ISBL8
>>       print("lv8 = " + lv8)
>>       lv10 = fin->lv_ISBL10
>>       print("lv10 = " + lv10)
>>
>> Do this instead, to save memory:
>>
>>       print("lv0 = " + fin->lv_ISBL0)
>>       print("lv8 = " + fin->lv_ISBL8)
>>       print("lv10 = " + fin->lv_ISBL10)
>>
>>
>> *[3] Avoid putting unchanged code inside do loops.*
>>
>> Your script has two do loops with a bunch of code inside of them that
>> doesn't change.
>>
>> Do loops can be slow, so it's important to keep as much code *outside* of
>> a do loop as possible.
>>
>> As an example, these are your two outside do loops:
>>
>>   do n=0,n_files -1,1
>>     do i_dir = 0,n_dirWRF-1
>>
>> Inside these two loops, you have code that looks like this:
>>
>>        res1                     = True
>>         res1 at mpDataBaseVersion = "Ncarg4_1"           ; choose more
>> recent database
>>         . . .
>>         res1 at mpNationalLineThicknessF    = 3.0
>>         res1 at mpGeophysicalLineThicknessF = 2.0
>>       . . .
>>
>>         res1 at tiMainString         = time_var
>>         res1 at gsnDraw             = False           ; don't draw
>>
>> res1 at cnLevelSelectionMode = "ExplicitLevels"    ; set explicit contour
>> levels
>>         . . .
>>         res1 at cnInfoLabelOn = False                  ; turn off contour
>> info label                                             res1 at lbAutoManage
>>          = False          ; control label bar
>>                         res1 at lbOrientation         = "Vertical"
>>         . . .
>>
>>         res1 at lbTopMarginF = 0.001
>>         res2                     = True
>>         res2                     = True
>>         .  . .
>>         res2 at mpNationalLineThicknessF    = 3.0
>>         res2 at mpGeophysicalLineThicknessF = 2.0
>>         res2 at tiMainString         = time_var
>>         res2 at gsnDraw             = False           ; don't draw
>>
>>         . . .
>>         res2 at lbOrientation         = "Vertical"
>>         res2 at pmLabelBarSide        = "Right"
>>         res2 at lbLabelAutoStride =   True
>>         res2 at pmLabelBarWidthF      = 0.20
>>         res2 at pmLabelBarHeightF     = 0.7
>>         . . .
>>
>> None of this code changes inside the do loops, but yet it is getting set
>> again and again for every iteration of the two do loops. In order to speed
>> things up, move as much code as you can to outside the do loop, and only
>> keep things that actually change inside the do loop.
>>
>> I recommend creating functions to set the various resource lists:
>> set_res1_list, set_res2_list, etc. This may not necessarily save memory,
>> but it makes your code much cleaner.
>>
>> *[4] Use functions for cleaner code.*
>>
>> You have code that is repeated in multiple locations.  This can cause
>> extra memory usage.  If you turn these repeated code segments into NCL
>> functions, then this can save some memory.
>>
>> An example, you have:
>>
>>   i_loc = 1
>>   j_loc = 1
>>   GET_IJ::get_ij(lat2d,lon2d,lat_beg,lon_beg,i_loc,j_loc,nj,ni)
>>   x_start = i_loc
>>   y_start = j_loc
>>
>> Granted, this doesn't use a lot of memory, but every time you copy these
>> 5 lines, it makes your script harder to read, and you will slowly use more
>> memory if you do this kind of thing frequently.
>>
>> Turn the above five lines into a function:
>>
>> function get_ij_ncl(lat2d,lon2d,lat_beg,lon_beg)
>> local nj, ni
>> begin
>>   i_loc = 1
>>   j_loc = 1
>>   nj = dimsizes(lat2d(:,0))
>>   ni = dimsizes(lat2d(0,:))
>>   GET_IJ::get_ij(lat2d,lon2d,lat_beg,lon_beg,i_loc,j_loc,nj,ni)
>>  return([/i_loc,j_loc/])
>> end
>>
>> and then call the function with:
>>
>> xy_start = get_ij_ncl(lat2d,lon2d,lat_beg,lon_beg)
>>
>> You will then need to change "x_start" to xy_start(0) and "y_start" to
>> xy_start(1)
>>
>> *[5] Another place where a function would be helpful:*
>>
>>   u_ave_new = u_ave(xy_start(1):xy_end(1),xy_start(0):xy_end(0))
>>   v_ave_new = v_ave(xy_start(1):xy_end(1),xy_start(0):xy_end(0))
>>   lat2d_new = lat2d(xy_start(1):xy_end(1),xy_start(0):xy_end(0))
>>   lon2d_new = lon2d(xy_start(1):xy_end(1),xy_start(0):xy_end(0))
>> ; plotB   = gsn_csm_vector(wks,u_ave_new,v_ave_new,vecres)
>>
>> ;;;opy_VarCoords(dumb_ave,u_ave_new)
>>
>> ; copy_VarCoords(dumb_ave,v_ave_new)
>>
>>    u_ave_new!0 = "lat"
>>    u_ave_new!1 = "lon"
>>    u_ave_new&lat= lat(xy_start(1):xy_end(1))
>>    u_ave_new&lon= lon(xy_start(0):xy_end(0))
>>    v_ave_new!0 = "lat"
>>    v_ave_new!1 = "lon"
>>    v_ave_new&lat= lat(xy_start(1):xy_end(1))
>>    v_ave_new&lon= lon(xy_start(0):xy_end(0))
>>
>> Create a function to do the subsetting and attaching of the metadata:
>>
>> function average_subset(x,lat,lon,xy_start,xy_end)
>> begin
>>   x_ave_new = x(xy_start(1):xy_end(1),xy_start(0):xy_end(0))
>>   x_ave_new!0 = "lat"
>>   x_ave_new!1 = "lon"
>>   x_ave_new&lat= lat(xy_start(1):xy_end(1))
>>   x_ave_new&lon= lon(xy_start(0):xy_end(0))
>>   return(x_ave_new)
>> end
>>
>> so now you can replace the above code with two lines:
>>
>>   u_ave_new = average_subset(u_ave,lat,lon,xy_start,xy_end)
>>   v_ave_new = average_subset(v_ave,lat,lon,xy_start,xy_end)
>>
>> *[6] You created "lat2d_new" and "lon2d_new" but never use them. Again,
>> this is using up memory for no purpose. Remove those two lines.*
>>
>> *[7] If you are propagating a smaller array to a larger array, as is the
>> case here:*
>>
>>   lat2d = z
>>   lon2d = z
>>   do j = 0,nj-1
>>   do i = 0,ni-1
>>    lat2d(j,i)=lat(j)
>>    lon2d(j,i)=lon(i)
>>   end do
>>   end do
>>
>> then use conform_dims instead of a do loop:
>>
>>   lat2d = conform_dims(dims2d,lat,0)
>>   lon2d = conform_dims(dims2d,lon,1)
>>
>> Note that I no longer need the "lat2d = z" or "lon2d = z" code.
>>
>> [8] You have this code, and I'm not sure why:
>>
>>       a = addfile(filename,"r")
>>       fin = a
>>
>> This seems like an unnecessary copy of "a".  Why not simply do:
>>
>>   fin = addfile(filename,"r")
>>
>> *[9] The "gsn_define_colormap" call should not be inside the do loop.
>> Call this right after you call gsn_open_wks.*
>>
>> *[10] There's some random code I don't understand. For example:*
>>
>>   copy_VarCoords(z,lat2d)
>>   copy_VarCoords(z,lon2d)
>>
>> lat2d and lon2d don't need to have z's coordinate arrays attached to them
>> for any reason that I can see, so I think you can remove those two lines.
>>
>> *[11] Every time you have a do loop or an if statement, indent all the
>> code inside *consistently* so you can more easily see what part of the code
>> you're in.*
>>
>> *[12] Don't make extra copies of variables unless you really need to.*
>>
>> For example, you have:
>>
>>       filename = all_files(n)
>>
>> There's really no need to create "filename" since you can just use
>> "all_files(n)".
>> I've tried to clean up your code to show you what I'm talking about. The
>> attached script likely won't run because I don't have your data to test
>> it.  But, hopefully you can see what I'm doing and mimic this in your code.
>>
>> --Mary
>>
>>
>>
>> On Mon, Jul 17, 2017 at 11:40 AM, Barry Lynn <barry.h.lynn at gmail.com>
>> wrote:
>>
>>> Hi Rick:
>>>
>>> Thank you for your suggestion.
>>>
>>> Could you please give me some guidelines on how to delete resources.
>>>
>>> I have included the script.  If you can provide an example, I can take
>>> it from there.
>>>
>>> Thank you,
>>>
>>> Barry
>>>
>>> On Mon, Jul 17, 2017 at 5:22 PM, Rick Brownrigg <brownrig at ucar.edu>
>>> wrote:
>>>
>>>> Hi Barry,
>>>>
>>>> On Linux, an errno=12 is an "out of memory" error.  I doubt that its
>>>> related directly to the systemfunc() call, but rather due to memory being
>>>> consumed by your script in previous iterations of the loop.  You might
>>>> check if there are variables that can be freed (i.e., deleted()) after each
>>>> iteration.
>>>>
>>>> HTH...
>>>> Rick
>>>>
>>>> On Sun, Jul 16, 2017 at 8:56 PM, Barry Lynn <barry.h.lynn at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi:
>>>>>
>>>>> I have a program that does multiple loops.
>>>>>
>>>>> I had this error:
>>>>>
>>>>> 0) i_dir = 6
>>>>>
>>>>> fatal:systemfunc: cannot create child process:[errno=12]
>>>>>
>>>>> fatal:["Execute.c":8640]:Execute: Error occurred at or near line 84
>>>>> in file ./plot_loop_700mb.ncl
>>>>>
>>>>>
>>>>> This occurred when reading the file from the sixth directory of 20.
>>>>>
>>>>>
>>>>> Here was the read from the 5th directory:
>>>>>
>>>>> (0) i_dir = 5
>>>>>
>>>>> (0) all_files(n) = /home/cust100021_vol1/barry/cu
>>>>> st100021_vol2/GEFS/GEFS_05/gep05.t00z.pgrb2b.0p50.f084.grb
>>>>>
>>>>> Here is a listing of both files:
>>>>>
>>>>> They are both present, but the second attempted allocation/definition
>>>>> of the file fails with the read that follows:
>>>>>
>>>>>
>>>>> [barry at cust100021-login1 GEFS]$ ls -ltr /home/cust100021_vol1/barry/cu
>>>>> st100021_vol2/GEFS/GEFS_05/gep05.t00z.pgrb2b.0p50.f084.grb
>>>>>
>>>>> -rw-r--r-- 1 barry cust100021 78985320 Jul 16 04:58
>>>>> /home/cust100021_vol1/barry/cust100021_vol2/GEFS/GEFS_05/gep
>>>>> 05.t00z.pgrb2b.0p50.f084.grb
>>>>>
>>>>> [barry at cust100021-login1 GEFS]$ ls -ltr /home/cust100021_vol1/barry/cu
>>>>> st100021_vol2/GEFS/GEFS_06/gep06.t00z.pgrb2b.0p50.f084.grb
>>>>>
>>>>> -rw-r--r-- 1 barry cust100021 80466702 Jul 16 04:58
>>>>> /home/cust100021_vol1/barry/cust100021_vol2/GEFS/GEFS_06/gep
>>>>> 06.t00z.pgrb2b.0p50.f084.grb
>>>>>
>>>>> Here is the read:
>>>>>
>>>>>
>>>>>   print("i_dir = " + i_dir)
>>>>>
>>>>>   diri = dirWRF(i_dir) + "/"
>>>>>
>>>>> ; define individual file read
>>>>>
>>>>>    all_files = systemfunc("ls " + diri + "ge*pgrb2b*grb")
>>>>>
>>>>>    print("all_files(n) = " + all_files(n))
>>>>>
>>>>>   filename = all_files(n)
>>>>>
>>>>>
>>>>> Is the problem that I make this allocation multiple times, each
>>>>> directory, each file? Perhaps I should read all files at once into a two
>>>>> dimensional filename array?
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Barry H. Lynn, Ph.D
>>>>> Senior Lecturer,
>>>>> The Institute of the Earth Science,
>>>>> The Hebrew University of Jerusalem,
>>>>> Givat Ram, Jerusalem 91904, Israel
>>>>> Tel: 972 547 231 170
>>>>> Fax: (972)-25662581
>>>>>
>>>>> C.E.O, Weather It Is, LTD
>>>>> Weather and Climate Focus
>>>>> http://weather-it-is.com
>>>>> Jerusalem, Israel
>>>>> Local: 02 930 9525
>>>>> Cell: 054 7 231 170
>>>>> Int-IS: x972 2 930 9525
>>>>> US 914 432 3108 <(914)%20432-3108>
>>>>>
>>>>> _______________________________________________
>>>>> ncl-talk mailing list
>>>>> ncl-talk at ucar.edu
>>>>> List instructions, subscriber options, unsubscribe:
>>>>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Barry H. Lynn, Ph.D
>>> Senior Lecturer,
>>> The Institute of the Earth Science,
>>> The Hebrew University of Jerusalem,
>>> Givat Ram, Jerusalem 91904, Israel
>>> Tel: 972 547 231 170
>>> Fax: (972)-25662581
>>>
>>> C.E.O, Weather It Is, LTD
>>> Weather and Climate Focus
>>> http://weather-it-is.com
>>> Jerusalem, Israel
>>> Local: 02 930 9525
>>> Cell: 054 7 231 170
>>> Int-IS: x972 2 930 9525
>>> US 914 432 3108 <(914)%20432-3108>
>>>
>>> _______________________________________________
>>> ncl-talk mailing list
>>> ncl-talk at ucar.edu
>>> List instructions, subscriber options, unsubscribe:
>>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>>>
>>>
>>
>
>
> --
> Barry H. Lynn, Ph.D
> Senior Lecturer,
> The Institute of the Earth Science,
> The Hebrew University of Jerusalem,
> Givat Ram, Jerusalem 91904, Israel
> Tel: 972 547 231 170
> Fax: (972)-25662581
>
> C.E.O, Weather It Is, LTD
> Weather and Climate Focus
> http://weather-it-is.com
> Jerusalem, Israel
> Local: 02 930 9525
> Cell: 054 7 231 170
> Int-IS: x972 2 930 9525
> US 914 432 3108 <(914)%20432-3108>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20170718/60849337/attachment.html 


More information about the ncl-talk mailing list