[pyngl-talk] PyNIO 1.5.0 Beta vs 1.4.1 - NetCDF Variable Access and Numpy

Jason Greenlaw - NOAA Affiliate jason.greenlaw at noaa.gov
Mon Jan 4 09:15:28 MST 2016


Thanks for the explanation, Dave.

It was definitely still doing something: it was consuming 100% CPU, and
memory usage rose to 12GB (and was still climbing) within a few minutes
before I killed it.  I'm not sure what would cause it to gobble memory like
that on such a small file, though.
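
For what it is worth, here is a minimal sketch of the access pattern Dave
describes. CountingVariable is a hypothetical stand-in that offers only
__len__ and __getitem__; it is not the PyNIO class and does not touch a file.
It only counts how often data is requested when the object itself is handed
to NumPy, compared with indexing first. In this sketch each request returns a
whole row; a variable whose rows were themselves lazy would presumably see
one request per element. It does not reproduce or explain the memory growth.

```python
import numpy as np


class CountingVariable(object):
    """Hypothetical stand-in for a file-backed variable (NOT the PyNIO class).

    It offers only __len__ and __getitem__, and counts how many times
    NumPy (or the caller) asks it for data.
    """

    def __init__(self, data):
        self._data = np.asarray(data)
        self.shape = self._data.shape
        self.requests = 0

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, key):
        self.requests += 1          # every call models one read from the file
        return self._data[key]      # out-of-range integers raise IndexError


data = np.zeros((24, 81))           # same shape as the "mask" variable

# Passing the object itself: NumPy has to walk it via the sequence protocol.
walked = CountingVariable(data)
from_object = np.array(walked)
walk_requests = walked.requests

# Indexing first: a single request returns the whole ndarray.
indexed = CountingVariable(data)
from_index = np.array(indexed[:, :])
index_requests = indexed.requests
```

The two results are identical; only the number of requests differs. The exact
count for the first case depends on the NumPy version, so I would not rely on
a particular figure.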

I'm satisfied with using the indexing syntax and, as you said, it is
preferable in almost all use cases.
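
A minimal sketch of the pattern I am switching to, assuming the variable
returns a numpy.ndarray when indexed with [:] (as it does in the transcripts
quoted below). FakeVariable and masked_equal_from_variable are hypothetical
names used only for illustration; with PyNIO the object would come from
f.variables["mask"] instead.

```python
import numpy as np


class FakeVariable(object):
    """Hypothetical stand-in for a file variable; not part of PyNIO."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.shape = self._data.shape

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, key):
        return self._data[key]


def masked_equal_from_variable(var, value):
    """Dereference the variable once, then hand a real ndarray to NumPy."""
    values = var[:]                 # one bulk read, returns numpy.ndarray
    return np.ma.masked_equal(values, value)


mask_var = FakeVariable([[0.0, 1.0, 0.0],
                         [1.0, 0.0, 0.0]])
result = masked_equal_from_variable(mask_var, 1.0)
```

Keeping the dereference in one small helper means the numpy calls never see
the variable object itself, so they cannot fall back to walking it item by
item.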

Thanks
Jason



On Thu, Dec 31, 2015 at 1:48 PM, David Brown <dbrown at ucar.edu> wrote:

> I am not totally sure what is going on here, but I can tell you that
> for PyNIO, as for its predecessor, Konrad Hinsen's scientific NetCDF
> package, the design was that you needed to "dereference" the
> NioVariable object using indexing syntax to get the NumPy array values.
> But the NioVariable object always supported the Python Sequence
> protocol, and I believe that at some point, support for the Sequence
> protocol in numpy was enhanced in a way that allowed NumPy arrays to
> be derived from NioVariable objects. This was without any explicit
> changes to support this feature in PyNIO.
>
> However, in my experience, trying to use this in practice has
> extremely bad performance, because NumPy has no real knowledge of the
> data that the NioVariable object refers to, and consequently it asks
> for array elements one at a time. For anything but very small datasets
> this is extremely inefficient. The chances are that your example with
> 1.5.0-beta did not actually fail. It was just taking an extremely long
> time.
> But given this, I was not aware of a difference in performance between
> 1.4.1 and 1.5.0-beta. We can investigate, but as I said, this was
> never an intended feature of PyNIO, but arose from later developments
> in numpy.
>  -dave
>
> On Tue, Dec 29, 2015 at 11:54 AM, Jason Greenlaw - NOAA Affiliate
> <jason.greenlaw at noaa.gov> wrote:
> > Hi Heather,
> >
> > Yes, it is an NioVariable object.  Seems that at 1.4.1, NioVariable
> > provided some numpy functionality from the object itself rather than
> > requiring you to extract the numpy array first.
> >
> > I am not an expert with PyNIO/numpy and this code was written by someone
> > else, but I was under the impression NioVariable provided access to the
> > arrays via an iterator (i.e. lazy loading), which would be preferable in
> > some cases to loading the entire arrays into memory using numpy indexing.
> > But I could be totally off base there.
> >
> > Output is below.
> >
> > Thanks,
> > Jason
> >
> >
> > $ python
> > Python 2.7.2 (default, May  7 2012, 16:54:01)
> > [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import Nio
> > >>> import numpy
> > >>> Nio.__version__
> > '1.4.1'
> > >>> numpy.__version__
> > '1.6.1'
> > >>> f = Nio.open_file("glofs.leofs.fields.nowcast.20151229.t12z.nc", "r")
> > >>> m = f.variables["mask"]
> > >>> m
> > <Nio.NioVariable object at 0x2290050>
> > >>> print type(m)
> > <class 'Nio.NioVariable'>
> > >>> m.shape
> > (24, 81)
> > >>> m_contents = m[:,:]
> > >>> print type(m_contents)
> > <type 'numpy.ndarray'>
> > >>> numpy.ma.masked_equal(m, 1.0)
> > masked_array(data =
> >  [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  ...,
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],
> >              mask =
> >  [[False False False ..., False False False]
> >  [False False False ..., False False False]
> >  [False False False ..., False False False]
> >  ...,
> >  [False False False ..., False False False]
> >  [False False False ..., False False False]
> >  [False False False ..., False False False]],
> >        fill_value = 1.0)
> >
> > >>> numpy.ma.masked_equal(m_contents, 1.0)
> > masked_array(data =
> >  [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  ...,
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],
> >              mask =
> >  [[False False False ..., False False False]
> >  [False False False ..., False False False]
> >  [False False False ..., False False False]
> >  ...,
> >  [False False False ..., False False False]
> >  [False False False ..., False False False]
> >  [False False False ..., False False False]],
> >        fill_value = 1.0)
> >
> >
> > --
> > Jason Greenlaw
> > Software Developer, ERT, Inc.
> > NOAA/NOS/OCS/CSDL
> > http://nowcoast.noaa.gov
> > Jason.Greenlaw at noaa.gov
> >
> >
> > On Tue, Dec 29, 2015 at 1:19 PM, Cronk,Heather <Heather.Cronk at colostate.edu>
> > wrote:
> >>
> >> Hi Jason,
> >>
> >> I am a PyNIO user, not a developer, so I can’t speak to any
> >> intentionality, but I am more surprised that your code works with PyNIO
> >> version 1.4.1 than that it doesn’t work with the beta. I don’t have the
> >> old version anymore, but I am curious: what is the output of type(m) with
> >> your original code? I was under the impression that the call
> >> f.variables["mask"] had always produced a Nio object and not a numpy
> >> array. Using the beta version I see this:
> >>
> >>
> >> m_obj = f.variables["mask"]
> >>
> >> print type(m_obj)
> >>
> >> >>  <class 'Nio.NioVariable'>
> >>
> >> m_contents = f.variables["mask"][:]
> >>
> >> print type(m_contents)
> >>
> >> >>  <type 'numpy.ndarray'>
> >>
> >>
> >> What does the corresponding code produce with 1.4.1?
> >>
> >>
> >> Thanks!
> >>
> >> Heather
> >>
> >>
> >> From: <pyngl-talk-bounces at ucar.edu> on behalf of Jason Greenlaw - NOAA
> >> Affiliate <jason.greenlaw at noaa.gov>
> >> Date: Tuesday, December 29, 2015 at 10:36 AM
> >> To: "pyngl-talk at ucar.edu" <pyngl-talk at ucar.edu>
> >> Subject: [pyngl-talk] PyNIO 1.5.0 Beta vs 1.4.1 - NetCDF Variable Access
> >> and Numpy
> >>
> >> Hello,
> >>
> >> I recently installed the 1.5.0 beta versions of PyNGL and PyNIO (using
> >> 64-bit binaries for CentOS6) and attempted to run some existing code, but
> >> encountered an issue when numpy functions (e.g. numpy.ma.masked_equal())
> >> are called with NioVariable object arguments.
> >>
> >> With PyNIO v1.4.1/numpy 1.6.1 I was able to do the following:
> >>
> >> >>> import Nio
> >> >>> Nio.__version__
> >> '1.4.1'
> >> >>> import numpy
> >> >>> numpy.__version__
> >> '1.6.1'
> >> >>> f = Nio.open_file("glofs.leofs.fields.nowcast.20151229.t12z.nc", "r")
> >> >>> m = f.variables["mask"]
> >> >>> numpy.ma.masked_equal(m, 1.0)
> >> masked_array(data =
> >>  [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  ...,
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],
> >>              mask =
> >>  [[False False False ..., False False False]
> >>  [False False False ..., False False False]
> >>  [False False False ..., False False False]
> >>  ...,
> >>  [False False False ..., False False False]
> >>  [False False False ..., False False False]
> >>  [False False False ..., False False False]],
> >>        fill_value = 1.0)
> >>
> >>
> >>
> >> However, with PyNIO 1.5.0 beta/numpy 1.9.2, the numpy function call hangs,
> >> and the process keeps consuming memory at a rapid rate until the
> >> call is interrupted.
> >>
> >> >>> import Nio
> >> >>> Nio.__version__
> >> '1.5.0-beta'
> >> >>> import numpy
> >> >>> numpy.__version__
> >> '1.9.2'
> >> >>> f = Nio.open_file("glofs.leofs.fields.nowcast.20151229.t12z.nc", "r")
> >> >>> m = f.variables["mask"]
> >> >>> numpy.ma.masked_equal(m, 1.0)
> >> ^C^CTraceback (most recent call last):
> >>   File "<stdin>", line 1, in <module>
> >>   File "/opt/pyngl/python/lib/python2.7/site-packages/numpy/ma/core.py", line 1982, in masked_equal
> >>     output = masked_where(equal(x, value), x, copy=copy)
> >>   File "/opt/pyngl/python/lib/python2.7/site-packages/numpy/ma/core.py", line 928, in __call__
> >>     (da, db) = (getdata(a, subok=False), getdata(b, subok=False))
> >>   File "/opt/pyngl/python/lib/python2.7/site-packages/numpy/ma/core.py", line 667, in getdata
> >>     data = np.array(a, copy=False, subok=subok)
> >>   File "/opt/pyngl/python/lib/python2.7/site-packages/PyNIO/Nio.py", line 325, in __getitem__
> >>     ret = get_variable(self.file, self.varname, xsel)
> >>   File "/opt/pyngl/python/lib/python2.7/site-packages/PyNIO/coordsel.py", line 60, in get_variable
> >>     ret = file.file.variables[varname][xsel]
> >> KeyboardInterrupt
> >>
> >>
> >> But if I change the call to use numpy indexing, it works:
> >>
> >> >>> m.shape
> >> (24, 81)
> >> >>> numpy.ma.masked_equal(m[:,:], 1.0)
> >> masked_array(data =
> >>  [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  ...,
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]
> >>  [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],
> >>              mask =
> >>  [[False False False ..., False False False]
> >>  [False False False ..., False False False]
> >>  [False False False ..., False False False]
> >>  ...,
> >>  [False False False ..., False False False]
> >>  [False False False ..., False False False]
> >>  [False False False ..., False False False]],
> >>        fill_value = 1.0)
> >>
> >>
> >> Was this change in functionality intentional?
> >>
> >> The NetCDF files I used are available at:
> >>
> >>
> >> ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nos/prod/glofs.20151229/
> >>
> >> (directory will change based on date)
> >>
> >> Thanks,
> >> Jason
> >> --
> >> Jason Greenlaw
> >> Software Developer, ERT, Inc.
> >> NOAA/NOS/OCS/CSDL
> >> http://nowcoast.noaa.gov
> >> Jason.Greenlaw at noaa.gov
> >>
> >
> >
> > _______________________________________________
> > pyngl-talk mailing list
> > List instructions, subscriber options, unsubscribe:
> > http://mailman.ucar.edu/mailman/listinfo/pyngl-talk
> >
>