[ncl-talk] Reading and computation efficiency

Dennis Shea shea at ucar.edu
Mon Jun 6 08:57:27 MDT 2016


Under the hood, NCL calls the standard C interfaces to the netCDF library.

Your file is 48 GB. Each variable is smaller, but how big is each one?

Are you in a multi-user environment?

As far as I know, operating systems do manage memory. If a user asks for
2 GB, the system may grant only (say) 200 MB; it then round-robins to other
users, granting them memory, before coming back to you with another 200 MB,
and so on. Lots of users can cause slowness.

The downside is that the memory is not contiguous; rather, it is fragmented.
This is not an NCL issue.

The 'relhum' function NCL uses comes from a Fortran-based model
post-processor; NCL was asked to adopt it for backward compatibility.
It uses a table look-up method. Although the function itself is Fortran,
it does one computation at a time; the actual looping is done by a
C driver. Certainly, this language mix inhibits optimization. Still, it
should be pretty quick.
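
To get a feel for the cost of that table look-up in isolation, you could time
relhum on a single time step. A minimal sketch; the file and variable names are
taken from your script below, while the (time,:,:) dimension ordering and the
p1/t1/q1 names are my assumptions:

  data = addfile("./complete_remap.nc", "r")

  p1 = data->pres(0,:,:)       ; one time step only (assumed layout)
  t1 = data->temp(0,:,:)
  q1 = data->qv(0,:,:)

  cpuStrt = get_cpu_time()
  rh1     = relhum(t1, q1, p1)
  print("relhum, one time step: " + (get_cpu_time() - cpuStrt) + "s")

If one step takes roughly 1/73 of the full-array time, the cost is just the
per-element work; if it is much faster than that, memory pressure is the more
likely culprit.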


Here is your script with timing added around each read and each computation.
Note that get_cpu_time() returns accumulated CPU time, so take differences:

  data = addfile("./complete_remap.nc", "r")


; read all data

  readStrt_p = get_cpu_time()
  p    = data->pres    ; pressure     [Pa]
  print("readStrt_: " + (*get_cpu_time*() - readStrt_p))

  t    = data->temp    ; temperature  [K]
  qv   = data->qv    ; qv [ kg/kg]
  z    = data->z_mc    ; geopotential [m]

  print("read All Variables: " + (*get_cpu_time*() - readStrt_p))

    print("===")
    printVarSummary(p)

; computations ... all

  rhStrt = get_cpu_time()
  rh = relhum(t, qv, p)
  print("rh: " + (get_cpu_time() - rhStrt))

  tdStrt = get_cpu_time()
  td = dewtemp_trh(t, rh)
  print("td: " + (get_cpu_time() - tdStrt))

  print("COMPUTATION " + (get_cpu_time() - rhStrt) + "s")
  print("======================================================")

You could also use a much smaller array size, just to see what happens:

  readStrt_p = get_cpu_time()
  p    = data->pres(0:3,:,:)    ; pressure     [Pa]
  print("readStrt_: " + (*get_cpu_time*() - readStrt_p))

  t    = data->temp(0:3,:,:)    ; temperature  [K]
  qv   = data->qv(0:3,:,:)      ; qv [ kg/kg]
  z    = data->z_mc(0:3,:,:)    ; geopotential [m]

  printVarSummary(p)
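
For the full run, one way to keep memory bounded is to loop over the time
dimension and compute rh and td one step at a time. A rough sketch, reusing
the 'data' file handle from above; the leftmost-dimension-is-time assumption
and the per-step output strategy are mine, so adjust to your file:

  dSizes = getfilevardimsizes(data, "pres")   ; assumes time is the leftmost dimension
  ntim   = dSizes(0)

  do nt = 0, ntim-1
     pnt = data->pres(nt,:,:)
     tnt = data->temp(nt,:,:)
     qnt = data->qv(nt,:,:)

     rhnt = relhum(tnt, qnt, pnt)
     tdnt = dewtemp_trh(tnt, rhnt)

     ; ... write or use rhnt/tdnt here, e.g. append them to an output file ...

     delete([/pnt, tnt, qnt, rhnt, tdnt/])    ; free memory before the next step
  end do

This trades one large allocation for 73 small ones, which can help when the
machine is memory-constrained; the per-element arithmetic itself does not change.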

On Mon, Jun 6, 2016 at 2:40 AM, Guido Cioni <guidocioni at gmail.com> wrote:

> Just forgot to add that the time dimension has “only” 73 steps.
>
> Guido Cioni
> http://guidocioni.altervista.org
>
> On 06 Jun 2016, at 10:38, Guido Cioni <guidocioni at gmail.com> wrote:
>
> Hi,
> the question is very simple, and I believe I already have the answer, but it
> is still worth asking.
> When managing large files in NCL I always have to come up with new tricks in
> the debugging phase to avoid long waiting times. Today I was trying to read a
> dataset with 401x401x150 points (approx. 48 GB):
>
>   data = addfile("./complete_remap.nc", "r")
>
>   p    = data->pres    ; pressure     [Pa]
>   t    = data->temp    ; temperature  [K]
>   qv   = data->qv    ; qv [ kg/kg]
>   z    = data->z_mc    ; geopotential [m]
>
>   print("FILEs READ in "+get_cpu_time()+"s")
>
>   rh=relhum(t, qv, p)
>   td=dewtemp_trh(t, rh)
>
>   print("COMPUTATION "+get_cpu_time()+"s”)
>
> and getting the following printout.
>
> (0) FILEs READ in 47.4748s
> (0) COMPUTATION 499.424s
>
> Is there any way to speed up the process? I tried to use as few definitions
> as possible and only pre-included functions.
> Why is the computation part taking so long? Maybe it’s something that
> depends on the system RAM?
> In the meantime, the best workaround I could think of was to subset a region
> of the data and test the code only on that file.
>
> Cheers
>
> Guido Cioni
> http://guidocioni.altervista.org
>