[ncl-talk] Slow code

Dennis Shea shea at ucar.edu
Fri Jan 15 15:42:07 MST 2016


This is a long and rather detailed post that I have worked on, on and off,
over the past day. The basic message is:  Don't use tools (operators)
without looking at your data.
-----
Appolinaire's suggestion follows the general approach I mentioned in a
previous ncl-talk response on this thread:

  [1] Process each year-month file containing hourly values:
        ACCESS_SRF.1980010100.nc; ACCESS_SRF.1980020100.nc, .....
  [2] Write a netCDF file containing daily means (with the time dimension
written as 'unlimited') for each year-month file
  [3] After [2], concatenate the daily mean files ('cdo mergetime' or  the
NCO 'ncrcat' operator)
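
In essence, such a driver is just a loop of 'system' calls to CDO. A minimal
sketch (directory and output names below are placeholders; the attached
tst.cdo_daymean.ncl does essentially this, plus the printing seen in the
output further down):

   diri = "./"                                   ; input directory (placeholder)
   diro = "./daily/"                             ; output directory (placeholder)
   fili = (/ "ACCESS_SRF.1980010100.nc" \
           , "ACCESS_SRF.1980020100.nc" /)       ; hourly year-month files

   do nf=0,dimsizes(fili)-1                      ; [1]+[2]: daily means per file
      filo = str_sub_str(fili(nf), "0100.nc", ".daymean.nc")
      system("cdo daymean "+diri+fili(nf)+" "+diro+filo)
   end do
                                                 ; [3]: merge the daily-mean files
   system("cdo mergetime "+diro+"ACCESS_SRF.*.daymean.nc "+diro+"ACCESS_SRF.1980.daily.nc")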

I wrote a sample NCL driver script to do this sequence (see attached:
tst.cdo_daymean.ncl). Prior to running the NCL driver script, I looked at
the  number of time steps on the two sample year-month files that Mike sent.

The ACCESS_SRF.1980010100.nc file has 744 time (hourly) values:
       744/24= 31
which is expected for January.

The ACCESS_SRF.1980020100.nc file has 696 time (hourly) values:
       696/24= 29
which is expected for February in a leap year.

The total number of daily means should be 60 (=31+29). However, when I ran
the tst.cdo_daymean.ncl script  on my two sample files I got:

========================
%> ncl tst.cdo_daymean.ncl

 Copyright (C) 1995-2015 - All Rights Reserved
 University Corporation for Atmospheric Research
 NCAR Command Language Version 6.3.0
 The use of this software is governed by a License Agreement.
 See http://www.ncl.ucar.edu/ for more details.


Variable: fili
Type: string
Total Size: 16 bytes
            2 values
Number of Dimensions: 1
Dimensions and sizes:    [2]
Coordinates:
(0)    ACCESS_SRF.1980010100.nc
(1)    ACCESS_SRF.1980020100.nc
(0)    ---------
(0)    cdo daymean ./ACCESS_SRF.1980010100.nc ./daily/ACCESS_SRF.19800101.nc
Warning (cdfSetVar) : Inconsistent variable definition for xlat!
Warning (cdfSetVar) : Inconsistent variable definition for xlon!
cdo daymean: Processed 614693730 values from 27 variables over 744
timesteps ( 8.13s )

(0)    cdo daymean ./ACCESS_SRF.1980020100.nc ./daily/ACCESS_SRF.19800201.nc
Warning (cdfSetVar) : Inconsistent variable definition for xlat!
Warning (cdfSetVar) : Inconsistent variable definition for xlon!
cdo daymean: Processed 575040018 values from 27 variables over 696
timesteps ( 7.57s )

(0)    =========
(0)    PATHO=./daily/ACCESS_SRF.19800101-19800201.nc
(0)    cdo_mergetim: cdo mergetime ./daily/ACCESS_SRF*nc ./daily/ACCESS_SRF.19800101-19800201.nc
cdo mergetime: Processed 51341766 values from 54 variables over 62
timesteps ( 3.19s )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
======
Note: ** 62 ** time steps when 60 are expected.
(FYI: NCL 6.3.1's 'calculate_daily_values' function also returns 62!)
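
For reference, the NCL route is essentially one call. A sketch, assuming the
hourly variable is 'snv' with time as the leftmost dimension (the function is
in 6.3.1's contributed.ncl; see the links further down the thread):

   f    = addfile("ACCESS_SRF.1980010100.nc","r")
   x    = f->snv                                       ; hourly: (time,iy,jx)
   xday = calculate_daily_values(x, "avg", 0, False)   ; daily means
   printVarSummary(xday)                               ; 32 'days' here, not 31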

Are both CDO *and* NCL 'wrong'? The short answer is "no"!  Pourquoi? Why?

Unfortunately, the way the source files were created presents a problem for
this generic one-month-at-a-time approach.  :-(
The approach assumes that each day on the file has 24 hourly values.
=================
As is frequently mentioned on ncl-talk, the golden rule of data processing
is: *look at your data*.

   diri = "./"
   fili = "ACCESS_SRF.1980010100.nc"   ; hourly
   f    = addfile(diri+fili, "r")
   time = f->time
   date = cd_calendar(time, -3)        ; yyyymmddhh
   print(date)

===
Variable: date
Type: integer
Total Size: 2976 bytes
            744 values
Number of Dimensions: 1
Dimensions and sizes:   [744]
Coordinates:
Number Of Attributes: 1
  calendar :    gregorian

(0)     1980010101            <=== starts at hour 1 (hh=1) ... not hour 00 as one might expect
(1)     1980010102
(2)     1980010103
[snip]
(21)    1980010122
(22)    1980010123            <=== day 01 ends here ... *** 23 hours *** for day 1
(23)    1980010200            <=== day 02 starts at 00
(24)    1980010201
[snip]
(45)    1980010222
(46)    1980010223            <=== day 02 ends at 23 ... *** 24 hours *** for day 2
(47)    1980010300            <=== day 03 starts at 00
(48)    1980010301
[SNIP]
     days 2-31 each have 24 values
[SNIP]
(719)   1980013100
(720)   1980013101
(721)   1980013102
[snip]
(741)   1980013122
(742)   1980013123            <=== last hour of January
(743)   1980020100            *** This is where the 32nd day comes from!!  One hour: February 1 at 00Z
^^^^^^^^^^^^^^^^^^^

So the January file yields 32 'daily' means: the mean for Jan 1 is
calculated from only 23 values, the last mean (Feb 1 at 00Z) is calculated
from a single value, and all other days use 24 hourly values.
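
A quick way to confirm this without eyeballing the whole listing is to count
the hourly samples per calendar day. A sketch ('get_unique_values' requires
6.3.0 or newer):

   ymd  = date/100                          ; yyyymmdd for each hourly step
   days = get_unique_values(ymd)            ; unique calendar days on the file
   do nd=0,dimsizes(days)-1
      print(days(nd)+"  nhours="+num(ymd.eq.days(nd)))
   end do
                                            ; 19800101 -> 23; 19800102-19800131 -> 24; 19800201 -> 1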

===========
The resultant merged file will have ***two***  Feb 1 means

[snip]
(29)    1980013023
(30)    1980013123
(31)    1980020100 <========  one value at 00Z from the 1st file
(32)    1980020123 <======== derived from 23 hourly values on the 2nd file
[snip]


Punch line: This will likely create issues with subsequent post-processing.
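
If the hour stamps mark the *end* of each hourly interval, one possible
workaround (untested on these files; check how the model actually time-stamps
its output) is to shift the time axis back one hour before averaging, so that
each 00Z value is grouped with the preceding day and the January file yields
31 complete days:

   system("cdo -daymean -shifttime,-1hour ./ACCESS_SRF.1980010100.nc ./daily/ACCESS_SRF.19800101.nc")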




On Tue, Jan 12, 2016 at 11:44 PM, Appo derbetini <appopson4 at gmail.com>
wrote:

> Hi,
> To convert hourly to daily data, I'm using CDO
>
> For january 1980 for example,
>
> cdo daymean  hourly_01.1980.nc     daily_01.1980.nc
>
> Then I merge monthly files to have yearly data
>
>
> with
>
>
> cdo mergetime  daily_*.1980.nc    daily.1980.nc
>
>
>
> Merci
>
> Appolinaire
>
> 2016-01-12 21:47 GMT+01:00 Dennis Shea <shea at ucar.edu>:
>
>> At my request, Mike sent me two sample files. I will look when I get a
>> chance.
>>
>> 'My' strategy when dealing with multiple files is
>>
>> [1]
>> Will use of 'addfiles' be advantageous for the problem? If hourly arrays
>> are 20GB ... the answer is likely 'no'. Memory allocation (especially on
>> multi-user systems) can be slow.
>>
>> [2]
>> Most commonly
>>
>>   [a] I will process one year at a time, converting year, month, day, hr
>> to (say) 'hours since ...'.
>>   [b] Compute the daily averages; write each year or month to netCDF with
>> 'time' unlimited
>>   [c] use the 'ncks' or 'ncrcat' operators to combine the files if that
>> is desirable.
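>>
>>       e.g., from NCL (or directly from the command line; the file names
>>       here are just placeholders, and 'time' must be the unlimited/record
>>       dimension for ncrcat):
>>         system("ncrcat daily_1980_*.nc daily_1980.nc")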
>>
>> ---
>> I am attaching the 'contributed.ncl' functions that are used to calculate
>> monthly or daily quantities: avg, sum, min, max
>>
>>
>>
>>
>> http://www.ncl.ucar.edu/Document/Functions/Contributed/calculate_daily_values.shtml
>>
>> http://www.ncl.ucar.edu/Document/Functions/Contributed/calculate_monthly_values.shtml
>>
>> D
>>
>> On Tue, Jan 12, 2016 at 12:48 PM, Michael Notaro <mnotaro at wisc.edu>
>> wrote:
>>
>>> Thanks, Mary, for your further suggestions.
>>>
>>>
>>> After I got Alan's first email, which helped me reassess my code, I
>>> modified my code to remove the year dimension from most variables to make
>>> them more manageable.  Now the code runs in 1 hour, rather than 1 day+.
>>>
>>>
>>> Michael
>>>
>>>
>>> load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
>>> load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"
>>> load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
>>> load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/shea_util.ncl"
>>> begin
>>>
>>> mns=(/"01","02","03","04","05","06","07","08","09","10","11","12"/)
>>> ndays=(/31,28,31,30,31,30,31,31,30,31,30,31/)
>>>
>>> daily365=new((/365,20,141,217/),float) ; 365 day calendar data
>>> daily365@_FillValue=1e+35
>>> daily365=1e+35
>>>
>>> do iyr=0,19
>>>
>>>   data=new((/141,217,12,744/),float) ; hourly data
>>>   data@_FillValue=1e+35
>>>   data=1e+35
>>>
>>>   cnt=0
>>>   do im=0,11
>>>     prefix=(1980+iyr)+""+mns(im)
>>>
>>>     b=addfile("/volumes/data1/yafang/Downscaling/ACCESS1-0/historical/output/ACCESS_SRF."+(1980+iyr)+""+mns(im)+"0100.nc","r") ; read hourly SRF data
>>>     iy=b->iy
>>>     jx=b->jx
>>>     xlat=b->xlat
>>>     xlon=b->xlon
>>>     snow=b->snv ; liquid equiv of snow on ground
>>>     dims=dimsizes(snow)
>>>     nt=dims(0)
>>>     data(:,:,im,0:nt-1)=snow(iy|:,jx|:,time|:)
>>>     delete(snow)
>>>     delete(b)
>>>     delete(dims)
>>>     cnt=cnt+1
>>>   end do
>>>   data@_FillValue=1e+20
>>>
>>>   daily=new((/141,217,12,31/),float) ; daily data per month
>>>   daily@_FillValue=1e+35
>>>   daily=1e+35
>>>
>>>   cnt=0
>>>   do id=0,30
>>>     daily(:,:,:,id)=dim_avg(data(:,:,:,cnt:cnt+23)) ; convert hourly to daily data
>>>     cnt=cnt+24
>>>   end do
>>>
>>>   delete(data)
>>>
>>>
>>>
>>>   cnt=0
>>>   do im=0,11
>>>     do id=0,ndays(im)-1
>>>       daily365(cnt,iyr,:,:)=daily(:,:,im,id) ; convert daily data per month to 365-day calendar
>>>       cnt=cnt+1
>>>     end do
>>>   end do
>>>
>>>   delete(daily)
>>>
>>> end do
>>>
>>> daily212=new((/19,212,141,217/),float) ; 212 day calendar data for Sep-Mar
>>> daily212@_FillValue=1e+35
>>> daily212=1e+35
>>>
>>> do iyr=0,18
>>>   daily212(iyr,0:121,:,:)=daily365(243:364,iyr,:,:) ; retrieve Sep-Mar
>>>   daily212(iyr,122:211,:,:)=daily365(0:89,iyr+1,:,:)
>>> end do
>>>
>>> delete(daily365)
>>>
>>> year=ispan(0,18,1)
>>> year!0="year"
>>> year&year=year
>>>
>>> time=ispan(0,211,1)
>>> time!0="time"
>>> time&time=time
>>>
>>> daily212!0="year"
>>> daily212!1="time"
>>> daily212!2="iy"
>>> daily212!3="jx"
>>> daily212&year=year
>>> daily212&time=time
>>> daily212&iy=iy
>>> daily212&jx=jx
>>>
>>> daily212@long_name = "liquid snow water on ground"
>>> daily212@units = "kg m-2"
>>> daily212@coordinates="xlat xlon"
>>> daily212@grid_mapping = "rcm_map"
>>>
>>>
>>>
>>> system("rm save_daily212_actual_snv_access_late20_faster.nc")
>>> out=addfile("save_daily212_actual_snv_access_late20_faster.nc","c")
>>> out->daily212=daily212
>>> out->xlat=xlat
>>> out->xlon=xlon
>>> out->iy=iy
>>> out->jx=jx
>>>
>>>
>>>
>>>
>>>
>>> Michael Notaro
>>> Associate Director
>>> Nelson Institute Center for Climatic Research
>>> University of Wisconsin-Madison
>>> Phone: (608) 261-1503
>>> Email: mnotaro at wisc.edu
>>>
>>>
>>> ------------------------------
>>> *From:* Mary Haley <haley at ucar.edu>
>>> *Sent:* Tuesday, January 12, 2016 1:34 PM
>>> *To:* Alan Brammer
>>> *Cc:* Michael Notaro; ncl-talk at ucar.edu
>>>
>>> *Subject:* Re: [ncl-talk] Slow code
>>>
>>> Hi folks,
>>>
>>> These are all good suggestions.
>>>
>>> Another thing that is expensive in NCL is reordering arrays with syntax
>>> like:
>>>
>>> snow(iy|:,jx|:,time|:),
>>>
>>> NCL makes a copy of the array when you do this, and it has to swap all
>>> the dimensions every time in the loop.
>>>
>>> Is reordering the array absolutely necessary? I see that you are
>>> reordering and then calling "dim_avg_n". Since you are already using
>>> dim_avg_n, why not leave the array as is and just change the dimension you
>>> do the average on?
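>>>
>>> For example (an untested sketch): if snow comes in as (time,iy,jx), then
>>>
>>>     snow_avg = dim_avg_n(snow, 0)
>>>
>>> averages over time directly, with no reordering and no extra copy.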
>>>
>>> --Mary
>>>
>>>
>>> On Tue, Jan 12, 2016 at 8:30 AM, Alan Brammer <abrammer at albany.edu>
>>> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>>
>>>> I was going to suggest reshaping that data array, but it’s 20GB and is
>>>> going to be unnecessarily slow regardless.  Do you actually need to store
>>>> all the hourly data?  The edits below suggest that you don’t.  The version
>>>> below uses less than 1GB of memory rather than 20+GB.
>>>>
>>>>  This is obviously untested so may need editing.
>>>> (requires 6.1.1 or newer. )
>>>>
>>>>
>>>> mns=(/"01","02","03","04","05","06","07","08","09","10","11","12"/)
>>>> ndays=(/31,28,31,30,31,30,31,31,30,31,30,31/)
>>>>
>>>> daily=new((/141,217,20,12,31/),float) ; daily data (iy,jx,year,month,day)
>>>> daily@_FillValue=1e+35
>>>> daily=1e+35
>>>>
>>>> cnt=0
>>>> do iyr=0,19
>>>>   do im=0,11
>>>>     prefix=(1980+iyr)+""+mns(im)
>>>>
>>>>     b=addfile("/volumes/data1/yafang/Downscaling/ACCESS1-0/historical/output/ACCESS_SRF."+(1980+iyr)+""+mns(im)+"0100.nc","r") ; read hourly SRF data
>>>>     iy=b->iy
>>>>     jx=b->jx
>>>>     xlat=b->xlat   ; These aren’t doing anything?
>>>>     xlon=b->xlon ; These aren’t doing anything?
>>>>     snow =b->snv ; liquid equiv of snow on ground
>>>>     dims=dimsizes(snow)
>>>>     nt=dims(0)
>>>>
>>>>     snow4d := reshape( snow(iy|:,jx|:,time|:), (/dims(1), dims(2), ndays(im), 24/) ) ; I assume snow is originally (time|:,iy|:,jx|:)
>>>>     daily(:,:,iyr,im,:ndays(im)-1)=dim_avg_n(snow4d, 3)
>>>>
>>>>     delete(snow)
>>>>     delete(b)
>>>>     delete(dims)
>>>>     cnt=cnt+1
>>>>   end do
>>>> end do
>>>>
>>>> daily@_FillValue=1e+20
>>>>
>>>>
>>>> Good luck,
>>>>
>>>>
>>>> Alan Brammer.
>>>>
>>>>
>>>>
>>>>
>>>> On 12 Jan 2016, at 10:00, Michael Notaro <mnotaro at wisc.edu> wrote:
>>>>
>>>> Thanks for your email.
>>>>
>>>> Actually, this is the main part slowing me down, not the top part of
>>>> the code with the addfiles.
>>>>
>>>> cnt=0
>>>> do id=0,30
>>>>   daily(:,:,:,:,id)=dim_avg(data(:,:,:,:,cnt:cnt+23)) ; convert hourly to daily data     ***** THIS PART IS SLOW *****
>>>>   cnt=cnt+24
>>>> end do
>>>>
>>>>
>>>> Any way to perform this task quicker?
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Michael Notaro
>>>> Associate Director
>>>> Nelson Institute Center for Climatic Research
>>>> University of Wisconsin-Madison
>>>> Phone: (608) 261-1503
>>>> Email: mnotaro at wisc.edu
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Guido Cioni <guidocioni at gmail.com>
>>>> *Sent:* Tuesday, January 12, 2016 8:57 AM
>>>> *To:* Michael Notaro
>>>> *Cc:* ncl-talk at ucar.edu
>>>> *Subject:* Re: [ncl-talk] Slow code
>>>>
>>>> Everyone here will tell you that using loops in NCL is not efficient :)
>>>> But from my experience I think that the main thing slowing you down is
>>>> that you are calling addfile at every iteration.
>>>> Would creating a single file and reading it in at the beginning change
>>>> what you are trying to compute?
>>>>
>>>> Guido Cioni
>>>> http://guidocioni.altervista.org
>>>>
>>>>
>>>> On 12 Jan 2016, at 15:35, Michael Notaro <mnotaro at wisc.edu> wrote:
>>>>
>>>> Does anyone have a recommendation to speed up my code?
>>>> It's been running for a day now.  I put asterisks next to the real slow
>>>> loop.
>>>> Basically, that part is converting a large array of hourly data into
>>>> daily data.
>>>> Thanks, Michael
>>>>
>>>>
>>>> mns=(/"01","02","03","04","05","06","07","08","09","10","11","12"/)
>>>> ndays=(/31,28,31,30,31,30,31,31,30,31,30,31/)
>>>>
>>>> data=new((/141,217,20,12,744/),float) ; hourly data
>>>> data@_FillValue=1e+35
>>>> data=1e+35
>>>>
>>>> cnt=0
>>>> do iyr=0,19
>>>>   do im=0,11
>>>>     prefix=(1980+iyr)+""+mns(im)
>>>>
>>>>     b=addfile("/volumes/data1/yafang/Downscaling/ACCESS1-0/historical/output/ACCESS_SRF."+(1980+iyr)+""+mns(im)+"0100.nc","r") ; read hourly SRF data
>>>>     iy=b->iy
>>>>     jx=b->jx
>>>>     xlat=b->xlat
>>>>     xlon=b->xlon
>>>>     snow=b->snv ; liquid equiv of snow on ground
>>>>     dims=dimsizes(snow)
>>>>     nt=dims(0)
>>>>     data(:,:,iyr,im,0:nt-1)=snow(iy|:,jx|:,time|:)
>>>>     delete(snow)
>>>>     delete(b)
>>>>     delete(dims)
>>>>     cnt=cnt+1
>>>>   end do
>>>> end do
>>>> data@_FillValue=1e+20
>>>>
>>>> daily=new((/141,217,20,12,31/),float) ; daily data per month
>>>> daily@_FillValue=1e+35
>>>> daily=1e+35
>>>>
>>>> cnt=0
>>>> do id=0,30
>>>>   daily(:,:,:,:,id)=dim_avg(data(:,:,:,:,cnt:cnt+23)) ; convert hourly to daily data     ***** THIS PART IS SLOW *****
>>>>   cnt=cnt+24
>>>> end do
>>>>
>>>> delete(data)
>>>>
>>>>
>>>>
>>>>
>>>> Michael Notaro
>>>> Associate Director
>>>> Nelson Institute Center for Climatic Research
>>>> University of Wisconsin-Madison
>>>> Phone: (608) 261-1503
>>>> Email: mnotaro at wisc.edu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tst.cdo_daymean.ncl
Type: application/octet-stream
Size: 2059 bytes
Desc: not available
Url : http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20160115/1f553118/attachment.obj 


More information about the ncl-talk mailing list