[ncl-talk] memory intensive script

Marston Johnston shejo284 at gmail.com
Sat Jul 23 17:20:46 MDT 2016


It's pretty straightforward, but you have to watch your FillValues and zero
values. The example below assumes 0 is not a valid data value; otherwise you
need a third array to track valid zeros (see step 2).

Let's say you have a do loop in which you read, one at a time, each time
step you want to average over.

Given the full data array X(time,nlat,nlon), you could read in all the data
and then

   xavg = dim_avg_n_Wrap(X,(/0/))

but this is slow and memory intensive. The cumulative approach below also
lets you track the sample size at each data point.

My suggestion is faster and requires far less memory. !!CAVEAT!! Make sure
you look at your data and know the valid ranges and invalid values; all of
these must be accounted for before doing the cumulative averaging.

a = addfile(datafile,"r")

1.) Create an array to carry the cumulative sum and another to carry the
counts:

   do timestep = 0, ntime-1
     mh := a->X(timestep,:,:)     ; := lets mh be reassigned each iteration
     if (timestep.eq.0) then      ; do only once
       dims = dimsizes(mh)
       out  = new(dims,"float")
       outc = out
       out  = 0.0
       outc = 0.0
     end if
     ; do the cumulative averaging
     out  = out  + where(.not.ismissing(mh), mh, 0.0)  ; make sure the _FillValue is set properly
     outc = outc + where(.not.ismissing(mh), 1.0, 0.0)
   end do

2.) Set counts of 0 to FillValue to avoid dividing by 0. If 0 is a valid
data value it will not add to the cumulative sum, but some locations might
be 0 all the time and should not end up as a fillvalue as they do in the
case below. In such cases, valid locations with 0 must be tracked with a
separate array, e.g. out_zero, as sketched after the code.

outc@_FillValue = default_fillvalue("float")
outc = where(outc.lt.1.0,outc@_FillValue,outc) ; here 0 is not a valid data value
ret = out/outc ; no divide by 0
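
For the valid-zero case, a minimal sketch (it assumes out_zero is created
and set to 0.0 alongside out and outc, and that the first line is added
inside the step 1 loop):

   ; inside the step 1 loop: count points that are present and exactly 0
   out_zero = out_zero + where(.not.ismissing(mh), where(mh.eq.0.0,1.0,0.0), 0.0)

   ; after computing ret: restore a valid mean of 0 where only zeros occurred
   ret = where(ismissing(ret).and.out_zero.gt.0.0, 0.0, ret)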

3.) Replace the FillValue with 0 if needed (see the sketch below), and copy
the coordinate metadata from one of the input slices onto the result:

  copy_VarMeta(mh,ret)
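
A one-line sketch of the optional zero replacement:

  ret = where(ismissing(ret), 0.0, ret)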

Hope this helps,
/M


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Marston S. Johnston, PhD
Department of Earth Sciences
University of Gothenburg, Sweden
Email: marston.johnston at gu.se
Phone: +46-31-7864901
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Only the fruitful thing is true!

On Sat, Jul 23, 2016 at 6:30 PM, adamrhster . <adamrhster at gmail.com> wrote:

> Thanks for the suggestions. I get your 2nd point, but am having trouble
> following your 1st point about manually computing the average. Possible to
> give a quick example?
>
> Adam
>
> On Sat, Jul 23, 2016 at 10:18 AM, Marston Johnston <shejo284 at gmail.com>
> wrote:
>
>> Hi Adam,
>>
>> There are a couple of things I can see right off the bat.
>>
>> 1.) I always average large files by reading the dimension I want to
>> average over one step at a time, summing as I go, keeping counts, and
>> then dividing. It is a bit tricky to get right, but it is faster than
>> dim_avg. Use 0 to hold places where the data is not valid, so you will
>> need a value array and a count array. Before dividing, set the count
>> positions that are 0 to a fillvalue. This will save memory and time.
>> This method would replace parts of the code similar to:
>>
>>   in = addfiles(files0,"r")
>>   ListSetType(in,"cat")
>>
>>   lat0 = in[0]->lat
>>   lon0 = in[0]->lon
>>   hyam = in[0]->hyam ; read from a file the mid-layer coef
>>   hybm = in[0]->hybm ; read from a file
>>   hyai = in[0]->hyai ; read from a file the interface-layer coef
>>   hybi = in[0]->hybi ; read from a file
>>   nlat = dimsizes(lat0)
>>   nlon = dimsizes(lon0)
>>   nlevs = dimsizes(hyam)
>>
>>   pc = in[:]->PRECC
>>   pl = in[:]->PRECL
>>
>>   delete(in)
>>
>>   LHl0 = pl*L*rhofw
>>   LHc0 = pc*L*rhofw
>>   delete(pc)
>>   delete(pl)
>>   zlhl0 = dim_avg(dim_avg_n(LHl0,0))
>>   zlhc0 = dim_avg(dim_avg_n(LHc0,0))
>>
>>
>> 2.) You only need to call addfile(file,"r") once and then loop over the
>> history records, reading them in one at a time, so:
>>
>> in  = addfile(files0(np),"r")
>> ps0 = in->PS   ; surface pressure [Pa]
>>
>> becomes
>>
>> in  = addfile(ifile,"r")
>>
>> do np = 0, nh-1
>>     ps0 = in->PS(np,:,:)   ; surface pressure [Pa], one record at a time
>>     ; ... process or accumulate this record here ...
>> end do
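>>
>> Applying the same pattern to one of the averages above (a sketch: it
>> assumes L, rhofw, and the record count nh are defined as in your script,
>> and for brevity it skips the where()/count logic from point 1, which you
>> would add if PRECL can contain missing values):
>>
>> in   = addfile(ifile,"r")
>> dims = dimsizes(in->PS(0,:,:))     ; (nlat,nlon)
>> suml = new(dims,"float")
>> suml = 0.0
>> do np = 0, nh-1
>>     pl  := in->PRECL(np,:,:)       ; one time step at a time
>>     suml = suml + pl*L*rhofw       ; accumulate the latent heat sum
>> end do
>> zlhl0 = dim_avg(suml/nh)           ; time mean, then average over lon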
>>
>> Hope this helps,
>> /Marston
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Marston S. Johnston, PhD
>> Department of Earth Sciences
>> University of Gothenburg, Sweden
>> Email: marston.johnston at gu.se
>> Phone: +46-31-7864901
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Only the fruitful thing is true!
>>
>> On Sat, Jul 23, 2016 at 1:11 AM, Adam Herrington <
>> adam.herrington at stonybrook.edu> wrote:
>>
>>> Hi all,
>>>
>>> I'm getting a core-dump from an NCL script that I'm running on
>>> yellowstone. This is probably due to memory issues. The core-dump occurs
>>> after I load the very costly 28km global simulations. My usual remedy is to
>>> loop through each history file individually (there are ~60 history files
>>> for each model run) instead of using "addfiles", which reduces the size of
>>> the variables.
>>>
>>> Unfortunately, I'm still getting the core-dump. I've attached my script,
>>> and would appreciate any suggestions on ways to cut down on memory.
>>>
>>> I've never tried splitting the variables into smaller chunks, but I
>>> guess I will start trying to do this. Unless someone has a better
>>> suggestion?
>>>
>>> Thanks!
>>>
>>> Adam
>>>