[ncl-talk] Slow code

Michael Notaro mnotaro at wisc.edu
Fri Jan 15 20:31:06 MST 2016


Thank you, Dennis, for all the time you put into this
and your helpful advice. And good catch about the time steps.

Michael




Michael Notaro
Associate Director
Nelson Institute Center for Climatic Research
University of Wisconsin-Madison
Phone: (608) 261-1503
Email: mnotaro at wisc.edu


________________________________
From: Dennis Shea <shea at ucar.edu>
Sent: Friday, January 15, 2016 4:42 PM
To: ncl-talk at ucar.edu
Cc: Michael Notaro; Appo derbetini
Subject: Re: [ncl-talk] Slow code

This is a long and rather detailed post which I have worked on, on and off, over the past day. The basic message is: don't use tools (operators) without looking at your data.
-----
Appolinaire's suggestion follows the general approach I mentioned in a previous ncl-talk response on this thread:

  [1] Process each year-month file containing hourly values:
        ACCESS_SRF.1980010100.nc, ACCESS_SRF.1980020100.nc, .....
  [2] Write a netCDF file containing daily means (with the time dimension written as 'unlimited') for each year-month file
  [3] After [2], concatenate the daily mean files ('cdo mergetime' or the NCO 'ncrcat' operator)

I wrote a sample NCL driver script to do this sequence (see attached: tst.cdo_daymean.ncl). Prior to running the NCL driver script, I looked at the number of time steps in the two sample year-month files that Mike sent.

The ACCESS_SRF.1980010100.nc file has 744 time (hourly) values:
       744/24= 31
which is expected for January.

The ACCESS_SRF.1980020100.nc file has 696 time (hourly) values:
       696/24= 29
which is expected for February in a leap year.
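As a quick sanity check, those counts can be reproduced with a few lines of Python (an illustrative sketch, not part of the attached NCL script):

```python
import calendar

# Days in Jan and Feb 1980 (a leap year) and the hourly counts they imply
for month, nhours in ((1, 744), (2, 696)):
    ndays = calendar.monthrange(1980, month)[1]   # days in the month
    assert nhours == ndays * 24
    print(month, ndays)

# Expected number of daily means after merging the two month files
print(744 // 24 + 696 // 24)   # 60
```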

The total number of daily means should be 60 (=31+29). However, when I ran the tst.cdo_daymean.ncl script on my two sample files I got:

========================
%> ncl tst.cdo_daymean.ncl

 Copyright (C) 1995-2015 - All Rights Reserved
 University Corporation for Atmospheric Research
 NCAR Command Language Version 6.3.0
 The use of this software is governed by a License Agreement.
 See http://www.ncl.ucar.edu/ for more details.


Variable: fili
Type: string
Total Size: 16 bytes
            2 values
Number of Dimensions: 1
Dimensions and sizes:    [2]
Coordinates:
(0)    ACCESS_SRF.1980010100.nc
(1)    ACCESS_SRF.1980020100.nc
(0)    ---------
(0)    cdo daymean ./ACCESS_SRF.1980010100.nc ./daily/ACCESS_SRF.19800101.nc
Warning (cdfSetVar) : Inconsistent variable definition for xlat!
Warning (cdfSetVar) : Inconsistent variable definition for xlon!
cdo daymean: Processed 614693730 values from 27 variables over 744 timesteps ( 8.13s )

(0)    cdo daymean ./ACCESS_SRF.1980020100.nc ./daily/ACCESS_SRF.19800201.nc
Warning (cdfSetVar) : Inconsistent variable definition for xlat!
Warning (cdfSetVar) : Inconsistent variable definition for xlon!
cdo daymean: Processed 575040018 values from 27 variables over 696 timesteps ( 7.57s )

(0)    =========
(0)    PATHO=./daily/ACCESS_SRF.19800101-19800201.nc
(0)    cdo_mergetim: cdo mergetime ./daily/ACCESS_SRF*nc ./daily/ACCESS_SRF.19800101-19800201.nc
cdo mergetime: Processed 51341766 values from 54 variables over 62 timesteps ( 3.19s )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
======
Note: **  62 ** time steps when 60 are expected.
(FYI: NCL's 6.3.1 'calculate_daily_values' function returns 62 also!).

Are both CDO *and* NCL 'wrong'? The short answer is "no"! Pourquoi? Why?

Unfortunately, the way the source files were created presents a problem for this generic one-month-at-a-time approach.  :-(
It assumes that each file contains 24 hourly values for each day.
=================
As is frequently mentioned on ncl-talk, the golden rule of data processing is: *look at your data*.

   diri = "./"
   fili = "ACCESS_SRF.1980010100.nc"   ; hourly
   f    = addfile(diri+fili, "r")
   time = f->time
   date = cd_calendar(time, -3)        ; yyyymmddhh
   print(date)

===
Variable: date
Type: integer
Total Size: 2976 bytes
            744 values
Number of Dimensions: 1
Dimensions and sizes:   [744]
Coordinates:
Number Of Attributes: 1
  calendar :    gregorian

(0)     1980010101              <=== starts at hour 1 (hh=1)   ... not hour 00 as one might expect
(1)     1980010102
(2)     1980010103
[snip]
(21)    1980010122
(22)    1980010123            <=== day 01 ends here ... *** 23 hours *** for day 1
(23)    1980010200            <=== day 02 starts at 00
(24)    1980010201
[snip]
(45)    1980010222
(46)    1980010223            <=== day 02 ends at 23 ... *** 24 hours *** for day 2
(47)    1980010300            <=== day 03 starts at 00
(48)    1980010301
[SNIP]
     days 2-31 have 24 values each
[SNIP]
(719)   1980013100
(720)   1980013101
(721)   1980013102
[snip]
(741)   1980013122
(742)   1980013123
(Jan & Feb)
(743)   1980020100           *** This is where the 32nd day comes from!!   One hour: February 1 at 00Z
^^^^^^^^^^^^^^^^^^^

So, 32 daily means: the daily mean for Jan 1 is calculated from 23 values, and the last daily mean (Feb 1 at 00Z) from a single value. All other daily means use 24 hourly values.
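The 32-day count is easy to reproduce in plain Python (a sketch; it only assumes the hourly timestamps run from 01Z Jan 1 through 00Z Feb 1, exactly as printed above):

```python
from collections import Counter
from datetime import datetime, timedelta

# 744 hourly timestamps starting at 1980-01-01 01:00 (not 00:00)
start = datetime(1980, 1, 1, 1)
times = [start + timedelta(hours=h) for h in range(744)]

per_day = Counter(t.date() for t in times)
print(len(per_day))                            # 32 calendar days, not 31
print(per_day[datetime(1980, 1, 1).date()])    # 23 hourly values for Jan 1
print(per_day[datetime(1980, 2, 1).date()])    # 1 hourly value for Feb 1
```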

===========
The resultant merged file will have ***two*** Feb 1 means:

[snip]
(29)    1980013023
(30)    1980013123
(31)    1980020100 <========  one value at 00Z from the 1st file
(32)    1980020123 <======== derived from 23 hourly values on the 2nd file
[snip]

(741)   1980013122
(742)   1980013123
(743)   1980020100     <========  one value

Punch line: This will likely create issues with subsequent post-processing.
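Extending the same timestamp arithmetic to both files shows exactly where the 62 (=32+30) merged time steps and the duplicate Feb 1 come from (a sketch; it assumes the Feb file likewise runs from 01Z Feb 1 through 00Z Mar 1):

```python
from collections import Counter
from datetime import datetime, timedelta

def dates_covered(start, nhours):
    """Distinct calendar dates touched by nhours hourly steps from start."""
    return sorted({(start + timedelta(hours=h)).date() for h in range(nhours)})

jan = dates_covered(datetime(1980, 1, 1, 1), 744)   # 01Z Jan 1 .. 00Z Feb 1
feb = dates_covered(datetime(1980, 2, 1, 1), 696)   # 01Z Feb 1 .. 00Z Mar 1

merged = jan + feb                 # one "daily mean" per date per file
dups = [d for d, n in Counter(merged).items() if n > 1]
print(len(merged))                 # 62 time steps, not the expected 60
print(dups)                        # Feb 1 appears in both files
```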




On Tue, Jan 12, 2016 at 11:44 PM, Appo derbetini <appopson4 at gmail.com> wrote:
Hi,
To convert hourly to daily data, I'm using CDO

For january 1980 for example,

cdo daymean hourly_01.1980.nc daily_01.1980.nc

Then I merge monthly files to have yearly data with

cdo mergetime daily_*.1980.nc daily.1980.nc



Merci

Appolinaire

2016-01-12 21:47 GMT+01:00 Dennis Shea <shea at ucar.edu>:
At my request, Mike sent me two sample files. I will look when I get a chance.

'My' strategy when dealing with multiple files is

[1]
Will use of 'addfiles' be advantageous for the problem? If hourly arrays are 20GB ... the answer is likely 'no'. Memory allocation (especially on multi-user systems) can be slow.

[2]
Most commonly

  [a] I will process one year at a time, converting year, month, day, hr to (say) 'hours since ...'.
  [b] Compute the daily averages; write each year or month to netCDF with 'time' unlimited
  [c] use the 'ncks' or 'ncrcat' operators to combine the files if that is desirable.

---
I am attaching the 'contributed.ncl' functions that are used to calculate monthly or daily quantities: avg, sum, min, max



http://www.ncl.ucar.edu/Document/Functions/Contributed/calculate_daily_values.shtml
http://www.ncl.ucar.edu/Document/Functions/Contributed/calculate_monthly_values.shtml

D

On Tue, Jan 12, 2016 at 12:48 PM, Michael Notaro <mnotaro at wisc.edu> wrote:

Thanks, Mary, for your further suggestions.

After I got Alan's first email, which helped me reassess my code,
I modified my code to remove the year dimension from most variables
to make them more manageable. Now the code runs in 1 hour,
rather than 1 day+.

Michael


load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/shea_util.ncl"
begin

mns=(/"01","02","03","04","05","06","07","08","09","10","11","12"/)
ndays=(/31,28,31,30,31,30,31,31,30,31,30,31/)

daily365=new((/365,20,141,217/),float) ; 365 day calendar data
daily365@_FillValue=1e+35
daily365=1e+35

do iyr=0,19

  data=new((/141,217,12,744/),float) ; hourly data
  data@_FillValue=1e+35
  data=1e+35

  cnt=0
  do im=0,11
    prefix=(1980+iyr)+""+mns(im)
    b=addfile("/volumes/data1/yafang/Downscaling/ACCESS1-0/historical/output/ACCESS_SRF."+(1980+iyr)+""+mns(im)+"0100.nc","r") ; read hourly SRF data
    iy=b->iy
    jx=b->jx
    xlat=b->xlat
    xlon=b->xlon
    snow=b->snv ; liquid equiv of snow on ground
    dims=dimsizes(snow)
    nt=dims(0)
    data(:,:,im,0:nt-1)=snow(iy|:,jx|:,time|:)
    delete(snow)
    delete(b)
    delete(dims)
    cnt=cnt+1
  end do
  data@_FillValue=1e+20

  daily=new((/141,217,12,31/),float) ; daily data per month
  daily@_FillValue=1e+35
  daily=1e+35

  cnt=0
  do id=0,30
    daily(:,:,:,id)=dim_avg(data(:,:,:,cnt:cnt+23)) ; convert hourly to daily data
    cnt=cnt+24
  end do

  delete(data)



  cnt=0
  do im=0,11
    do id=0,ndays(im)-1
      daily365(cnt,iyr,:,:)=daily(:,:,im,id) ; convert daily data per month to 365 day calendar
      cnt=cnt+1
    end do
  end do

  delete(daily)

end do

daily212=new((/19,212,141,217/),float) ; 212 day calendar data for Sep-Mar
daily212@_FillValue=1e+35
daily212=1e+35

do iyr=0,18
  daily212(iyr,0:121,:,:)=daily365(243:364,iyr,:,:) ; retrieve Sep-Mar
  daily212(iyr,122:211,:,:)=daily365(0:89,iyr+1,:,:)
end do

delete(daily365)

year=ispan(0,18,1)
year!0="year"
year&year=year

time=ispan(0,211,1)
time!0="time"
time&time=time

daily212!0="year"
daily212!1="time"
daily212!2="iy"
daily212!3="jx"
daily212&year=year
daily212&time=time
daily212&iy=iy
daily212&jx=jx

daily212@long_name = "liquid snow water on ground"
daily212@units = "kg m-2"
daily212@coordinates="xlat xlon"
daily212@grid_mapping = "rcm_map"



system("rm save_daily212_actual_snv_access_late20_faster.nc")
out=addfile("save_daily212_actual_snv_access_late20_faster.nc","c")
out->daily212=daily212
out->xlat=xlat
out->xlon=xlon
out->iy=iy
out->jx=jx






Michael Notaro
Associate Director
Nelson Institute Center for Climatic Research
University of Wisconsin-Madison
Phone: (608) 261-1503
Email: mnotaro at wisc.edu


________________________________
From: Mary Haley <haley at ucar.edu>
Sent: Tuesday, January 12, 2016 1:34 PM
To: Alan Brammer
Cc: Michael Notaro; ncl-talk at ucar.edu

Subject: Re: [ncl-talk] Slow code

Hi folks,

These are all good suggestions.

Another thing that is expensive in NCL is reordering arrays with syntax like:

snow(iy|:,jx|:,time|:),

NCL makes a copy of the array when you do this, and it has to swap all the dimensions every time in the loop.

Is reordering the array absolutely necessary? I see that you are reordering and then calling "dim_avg_n". Since you are already using dim_avg_n, why not leave the array as is and just change the dimension you do the average on?
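Mary's point, restated in NumPy for readers outside NCL (an illustrative sketch with made-up array sizes; NCL's dimension reordering copies the array, which np.ascontiguousarray mimics here):

```python
import numpy as np

# Hypothetical hourly field dimensioned (time, iy, jx), as read from the file
snow = np.arange(48 * 3 * 4, dtype=float).reshape(48, 3, 4)

# The NCL reorder snow(iy|:,jx|:,time|:) materializes an (iy, jx, time) copy,
# after which the last dimension is averaged ...
reordered = np.ascontiguousarray(np.transpose(snow, (1, 2, 0)))
mean_after_reorder = reordered.mean(axis=2)

# ... but averaging the time axis directly gives the identical result, no copy
mean_in_place = snow.mean(axis=0)

print(np.allclose(mean_after_reorder, mean_in_place))   # True
```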

--Mary


On Tue, Jan 12, 2016 at 8:30 AM, Alan Brammer <abrammer at albany.edu> wrote:
Hi Michael,


I was going to suggest reshaping that data array, but it’s 20GB and is going to be unnecessarily slow regardless. Do you actually need to store all the hourly data? The edits below suggest that you don’t. The version below uses less than 1GB of memory rather than 20+GB.

This is obviously untested so may need editing.
(requires 6.1.1 or newer)


mns=(/"01","02","03","04","05","06","07","08","09","10","11","12"/)
ndays=(/31,28,31,30,31,30,31,31,30,31,30,31/)

daily=new((/141,217,20,12,31/),float) ; daily data
daily@_FillValue=1e+35
daily=1e+35

cnt=0
do iyr=0,19
  do im=0,11
    prefix=(1980+iyr)+""+mns(im)
    b=addfile("/volumes/data1/yafang/Downscaling/ACCESS1-0/historical/output/ACCESS_SRF."+(1980+iyr)+""+mns(im)+"0100.nc","r") ; read hourly SRF data
    iy=b->iy
    jx=b->jx
    xlat=b->xlat   ; These aren’t doing anything?
    xlon=b->xlon ; These aren’t doing anything?
    snow =b->snv ; liquid equiv of snow on ground
    dims=dimsizes(snow)
    nt=dims(0)

    snow4d := reshape( snow(iy|:,jx|:,time|:), (/dims(1), dims(2), ndays(im), 24/) ) ; I assume snow is originally (time|:,iy|:,jx|:)
    daily(:,:,iyr,im,:ndays(im)-1)=dim_avg_n(snow4d, 3)

    delete(snow)
    delete(b)
    delete(dims)
    cnt=cnt+1
  end do
end do

daily@_FillValue=1e+20
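For comparison, the reshape-then-average idea above, sketched in NumPy (an illustrative sketch that assumes a 31-day month with exactly 24 hourly values per day — an assumption that, as discussed elsewhere in this thread, the ACCESS files do not quite satisfy):

```python
import numpy as np

ny, nx, nday = 141, 217, 31
hourly = np.random.rand(nday * 24, ny, nx)   # one month of (time, iy, jx) data

# Split the time axis into (day, hour) and average over the hour axis;
# one vectorized call replaces the explicit do-loop over days
daily = hourly.reshape(nday, 24, ny, nx).mean(axis=1)
print(daily.shape)   # (31, 141, 217)
```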


Good luck,


Alan Brammer.




On 12 Jan 2016, at 10:00, Michael Notaro <mnotaro at wisc.edu> wrote:

Thanks for your email.

Actually, this is the main part slowing me down, not the top part of the code with the addfiles.


cnt=0
do id=0,30
  daily(:,:,:,:,id)=dim_avg(data(:,:,:,:,cnt:cnt+23)) ; convert hourly to daily data     ***** THIS PART IS SLOW *****
  cnt=cnt+24
end do

Any way to perform this task quicker?

Michael





Michael Notaro
Associate Director
Nelson Institute Center for Climatic Research
University of Wisconsin-Madison
Phone: (608) 261-1503
Email: mnotaro at wisc.edu


________________________________
From: Guido Cioni <guidocioni at gmail.com>
Sent: Tuesday, January 12, 2016 8:57 AM
To: Michael Notaro
Cc: ncl-talk at ucar.edu
Subject: Re: [ncl-talk] Slow code

Everyone here will tell you that using loops in NCL is not efficient :)
But from my experience I think that the main thing slowing you down is that you are using addfile at every iteration.
Would creating a whole file first and reading that in at the beginning change what you are trying to compute?

Guido Cioni
http://guidocioni.altervista.org



On 12 Jan 2016, at 15:35, Michael Notaro <mnotaro at wisc.edu> wrote:

Does anyone have a recommendation to speed up my code?
It's been running for a day now. I put asterisks next to the really slow loop.
Basically, that part is converting a large array of hourly data into daily data.
Thanks, Michael



mns=(/"01","02","03","04","05","06","07","08","09","10","11","12"/)
ndays=(/31,28,31,30,31,30,31,31,30,31,30,31/)

data=new((/141,217,20,12,744/),float) ; hourly data
data@_FillValue=1e+35
data=1e+35

cnt=0
do iyr=0,19
  do im=0,11
    prefix=(1980+iyr)+""+mns(im)
    b=addfile("/volumes/data1/yafang/Downscaling/ACCESS1-0/historical/output/ACCESS_SRF."+(1980+iyr)+""+mns(im)+"0100.nc","r") ; read hourly SRF data
    iy=b->iy
    jx=b->jx
    xlat=b->xlat
    xlon=b->xlon
    snow=b->snv ; liquid equiv of snow on ground
    dims=dimsizes(snow)
    nt=dims(0)
    data(:,:,iyr,im,0:nt-1)=snow(iy|:,jx|:,time|:)
    delete(snow)
    delete(b)
    delete(dims)
    cnt=cnt+1
  end do
end do
data@_FillValue=1e+20

daily=new((/141,217,20,12,31/),float) ; daily data per month
daily@_FillValue=1e+35
daily=1e+35

cnt=0
do id=0,30
  daily(:,:,:,:,id)=dim_avg(data(:,:,:,:,cnt:cnt+23)) ; convert hourly to daily data     ***** THIS PART IS SLOW *****
  cnt=cnt+24
end do

delete(data)




Michael Notaro
Associate Director
Nelson Institute Center for Climatic Research
University of Wisconsin-Madison
Phone: (608) 261-1503
Email: mnotaro at wisc.edu
_______________________________________________
ncl-talk mailing list
ncl-talk at ucar.edu
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk












-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20160116/96e359f1/attachment-0001.html 

