<div dir="ltr">Thanks for the explanation, Dave.<div><br></div><div>It was definitely still doing something: it was consuming 100% CPU, and memory usage rose to 12 GB (and was still climbing) within a few minutes before I killed it. I'm not sure what would cause it to gobble memory like that on such a small file, though.</div><div><br></div><div>I'm satisfied with using the indexing syntax and, as you said, it is preferable in almost all use cases.</div><div><br></div><div>Thanks</div><div>Jason</div><div><br></div><div><div class="gmail_extra"><div><div class="gmail_signature"><div dir="ltr"><div><div><div><br></div></div></div></div></div></div>
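<div>For reference, here is a minimal sketch of the indexing approach, using stand-in data rather than the actual GLOFS file. The helper name as_ndarray is hypothetical (it is not part of PyNIO or NumPy). It slices a variable-like object once so that numpy functions only ever receive a real array.</div>

```python
import numpy as np


def as_ndarray(var):
    """Return var as a NumPy array, slicing it once if it is not one already.

    Hypothetical helper for illustration; not part of PyNIO or NumPy.
    """
    if isinstance(var, np.ndarray):
        return var
    # var[:] asks the object for all of its values in a single request.
    return np.asarray(var[:])


# Stand-in data with the same shape as the "mask" variable in this thread.
mask_values = np.zeros((24, 81))
mask_values[3, 5] = 1.0

# Mask every element equal to 1.0, as in the original call.
masked = np.ma.masked_equal(as_ndarray(mask_values), 1.0)
print(masked.mask.sum())
```

<div>With PyNIO the argument would presumably be the NioVariable itself, which is equivalent to writing numpy.ma.masked_equal(m[:,:], 1.0) as in the original report.</div>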
<br><div class="gmail_quote">On Thu, Dec 31, 2015 at 1:48 PM, David Brown <span dir="ltr"><<a href="mailto:dbrown@ucar.edu" target="_blank">dbrown@ucar.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I am not totally sure what is going on here, but I can tell you that<br>
for PyNIO, as for its predecessor, Konrad Hinsen's scientific NetCDF<br>
package, the design was that you needed to "dereference" the<br>
NioVariable object using indexing syntax to get the NumPy array values.<br>
But the NioVariable object always supported the Python Sequence<br>
protocol, and I believe that at some point, support for the Sequence<br>
protocol in numpy was enhanced in a way that allowed NumPy arrays to<br>
be derived from NioVariable objects. This was without any explicit<br>
changes to support this feature in PyNIO.<br>
<br>
However, in my experience, trying to use this in practice has<br>
extremely bad performance, because NumPy has no real knowledge of the<br>
data that the NioVariable object refers to, and consequently it asks<br>
for array elements one at a time. For anything but very small datasets<br>
this is extremely inefficient. The chances are that your example with<br>
1.5.0-beta did not actually fail. It was just taking an extremely long<br>
time.<br>
But given this, I was not aware of a difference in performance between<br>
1.4.1 and 1.5.0-beta. We can investigate, but as I said, this was<br>
never an intended feature of PyNIO, but arose from later developments<br>
in numpy.<br>
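<br>
The access pattern described above can be sketched with a stand-in object.<br>
CountingVariable below is hypothetical and is not PyNIO's NioVariable: it<br>
implements only the Python Sequence protocol and counts how often it is<br>
indexed. The sketch compares the number of index calls only. It does not<br>
attempt to reproduce or explain the memory growth reported in this thread.<br>

```python
import numpy as np


class CountingVariable(object):
    """Hypothetical stand-in for a file-backed variable.

    It implements only the sequence protocol (__len__ and __getitem__) and
    counts how often it is indexed. This is NOT PyNIO's NioVariable.
    """

    def __init__(self, data, counter=None):
        self._data = np.asarray(data)
        # The counter is shared with the row objects handed out below.
        self.counter = counter if counter is not None else {"reads": 0}

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        self.counter["reads"] += 1
        item = self._data[index]  # raises IndexError past the end
        if isinstance(index, int) and item.ndim > 0:
            # An integer index on a multi-dimensional variable yields
            # another sequence-like object, one dimension smaller.
            return CountingVariable(item, self.counter)
        return item


data = np.zeros((24, 81))

# Path 1: hand the object straight to NumPy, which walks it item by item.
implicit = CountingVariable(data)
implicit_result = np.asarray(implicit)
implicit_reads = implicit.counter["reads"]

# Path 2: slice once, then give NumPy the resulting array.
explicit = CountingVariable(data)
explicit_result = np.asarray(explicit[:])
explicit_reads = explicit.counter["reads"]

print(implicit_reads, explicit_reads)
```

Both paths produce the same array, but the first needs at least one index<br>
call per element while the second needs a single call. With a real file<br>
behind the object, each of those calls would presumably be a separate read.<br>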
-dave<br>
<div><div class="h5"><br>
On Tue, Dec 29, 2015 at 11:54 AM, Jason Greenlaw - NOAA Affiliate<br>
<<a href="mailto:jason.greenlaw@noaa.gov">jason.greenlaw@noaa.gov</a>> wrote:<br>
> Hi Heather,<br>
><br>
> Yes, it is an NioVariable object. Seems that at 1.4.1, NioVariable provided<br>
> some numpy functionality from the object itself rather than requiring you to<br>
> extract the numpy array first.<br>
><br>
> I am not an expert with PyNIO/numpy and this code was written by someone<br>
> else, but I was under the impression NioVariable provided access to the<br>
> arrays via an iterator (i.e. lazy loading), which would be preferable in<br>
> some cases to loading the entire arrays into memory using numpy indexing.<br>
> But I could be totally off base there.<br>
><br>
> Output is below.<br>
><br>
> Thanks,<br>
> Jason<br>
><br>
><br>
> $ python<br>
> Python 2.7.2 (default, May 7 2012, 16:54:01)<br>
> [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2<br>
> Type "help", "copyright", "credits" or "license" for more information.<br>
> >>> import Nio<br>
> >>> import numpy<br>
> >>> Nio.__version__<br>
> '1.4.1'<br>
> >>> numpy.__version__<br>
> '1.6.1'<br>
> >>> f = Nio.open_file("glofs.leofs.fields.nowcast.20151229.t12z.nc", "r")<br>
> >>> m = f.variables["mask"]<br>
> >>> m<br>
> <Nio.NioVariable object at 0x2290050><br>
> >>> print type(m)<br>
> <class 'Nio.NioVariable'><br>
> >>> m.shape<br>
> (24, 81)<br>
> >>> m_contents = m[:,:]<br>
> >>> print type(m_contents)<br>
> <type 'numpy.ndarray'><br>
> >>> numpy.ma.masked_equal(m, 1.0)<br>
> masked_array(data =<br>
> [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> ...,<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],<br>
> mask =<br>
> [[False False False ..., False False False]<br>
> [False False False ..., False False False]<br>
> [False False False ..., False False False]<br>
> ...,<br>
> [False False False ..., False False False]<br>
> [False False False ..., False False False]<br>
> [False False False ..., False False False]],<br>
> fill_value = 1.0)<br>
><br>
> >>> numpy.ma.masked_equal(m_contents, 1.0)<br>
> masked_array(data =<br>
> [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> ...,<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],<br>
> mask =<br>
> [[False False False ..., False False False]<br>
> [False False False ..., False False False]<br>
> [False False False ..., False False False]<br>
> ...,<br>
> [False False False ..., False False False]<br>
> [False False False ..., False False False]<br>
> [False False False ..., False False False]],<br>
> fill_value = 1.0)<br>
><br>
><br>
> --<br>
> Jason Greenlaw<br>
> Software Developer, ERT, Inc.<br>
> NOAA/NOS/OCS/CSDL<br>
> <a href="http://nowcoast.noaa.gov" rel="noreferrer" target="_blank">http://nowcoast.noaa.gov</a><br>
> <a href="mailto:Jason.Greenlaw@noaa.gov">Jason.Greenlaw@noaa.gov</a><br>
><br>
><br>
> On Tue, Dec 29, 2015 at 1:19 PM, Cronk,Heather <<a href="mailto:Heather.Cronk@colostate.edu">Heather.Cronk@colostate.edu</a>><br>
> wrote:<br>
>><br>
>> Hi Jason,<br>
>><br>
>> I am a PyNIO user, not a developer, so I can’t speak to any intentionality,<br>
>> but I am more surprised that your code works with PyNIO version<br>
>> 1.4.1 than that it doesn’t work with the beta. I don’t have the old version<br>
>> anymore, but I am curious about the output of type(m) with your original code. I<br>
>> was under the impression that the call f.variables["mask"] had always<br>
>> produced a Nio object and not a numpy array. Using the beta version I see<br>
>> this:<br>
>><br>
>><br>
>> m_obj = f.variables["mask"]<br>
>><br>
>> print type(m_obj)<br>
>><br>
>> >> <class 'Nio.NioVariable'><br>
>><br>
>> m_contents = f.variables["mask"][:]<br>
>><br>
>> print type(m_contents)<br>
>><br>
>> >> <type 'numpy.ndarray'><br>
>><br>
>><br>
>> What does the corresponding code produce with version 1.4.1?<br>
>><br>
>><br>
>> Thanks!<br>
>><br>
>> Heather<br>
>><br>
>><br>
>> From: <<a href="mailto:pyngl-talk-bounces@ucar.edu">pyngl-talk-bounces@ucar.edu</a>> on behalf of Jason Greenlaw - NOAA<br>
>> Affiliate <<a href="mailto:jason.greenlaw@noaa.gov">jason.greenlaw@noaa.gov</a>><br>
>> Date: Tuesday, December 29, 2015 at 10:36 AM<br>
>> To: "<a href="mailto:pyngl-talk@ucar.edu">pyngl-talk@ucar.edu</a>" <<a href="mailto:pyngl-talk@ucar.edu">pyngl-talk@ucar.edu</a>><br>
>> Subject: [pyngl-talk] PyNIO 1.5.0 Beta vs 1.4.1 - NetCDF Variable Access<br>
>> and Numpy<br>
>><br>
>> Hello,<br>
>><br>
>> I recently installed the 1.5.0 beta versions of PyNGL and PyNIO (using<br>
>> 64-bit binaries for CentOS6) and attempted to run some existing code, but<br>
>> encountered an issue when numpy functions (e.g. numpy.ma.masked_equal()) are<br>
>> called with NioVariable object arguments.<br>
>><br>
>> With PyNIO 1.4.1/numpy 1.6.1 I was able to do the following:<br>
>><br>
>> >>> import Nio<br>
>> >>> Nio.__version__<br>
>> '1.4.1'<br>
>> >>> import numpy<br>
>> >>> numpy.__version__<br>
>> '1.6.1'<br>
>> >>> f = Nio.open_file("glofs.leofs.fields.nowcast.20151229.t12z.nc", "r")<br>
>> >>> m = f.variables["mask"]<br>
>> >>> numpy.ma.masked_equal(m, 1.0)<br>
>> masked_array(data =<br>
>> [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> ...,<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],<br>
>> mask =<br>
>> [[False False False ..., False False False]<br>
>> [False False False ..., False False False]<br>
>> [False False False ..., False False False]<br>
>> ...,<br>
>> [False False False ..., False False False]<br>
>> [False False False ..., False False False]<br>
>> [False False False ..., False False False]],<br>
>> fill_value = 1.0)<br>
>><br>
>><br>
>><br>
>> However, with PyNIO 1.5.0 beta/numpy 1.9.2, the numpy function call hangs,<br>
>> and the process keeps consuming more memory until the<br>
>> call is interrupted.<br>
>><br>
>> >>> import Nio<br>
>> >>> Nio.__version__<br>
>> '1.5.0-beta'<br>
>> >>> import numpy<br>
>> >>> numpy.__version__<br>
>> '1.9.2'<br>
>> >>> f = Nio.open_file("glofs.leofs.fields.nowcast.20151229.t12z.nc", "r")<br>
>> >>> m = f.variables["mask"]<br>
>> >>> numpy.ma.masked_equal(m, 1.0)<br>
>> ^C^CTraceback (most recent call last):<br>
>> File "<stdin>", line 1, in <module><br>
>> File "/opt/pyngl/python/lib/python2.7/site-packages/numpy/ma/core.py",<br>
>> line 1982, in masked_equal<br>
>> output = masked_where(equal(x, value), x, copy=copy)<br>
>> File "/opt/pyngl/python/lib/python2.7/site-packages/numpy/ma/core.py",<br>
>> line 928, in __call__<br>
>> (da, db) = (getdata(a, subok=False), getdata(b, subok=False))<br>
>> File "/opt/pyngl/python/lib/python2.7/site-packages/numpy/ma/core.py",<br>
>> line 667, in getdata<br>
>> data = np.array(a, copy=False, subok=subok)<br>
>> File "/opt/pyngl/python/lib/python2.7/site-packages/PyNIO/Nio.py", line<br>
>> 325, in __getitem__<br>
>> ret = get_variable(self.file, self.varname, xsel)<br>
>> File "/opt/pyngl/python/lib/python2.7/site-packages/PyNIO/coordsel.py",<br>
>> line 60, in get_variable<br>
>> ret = file.file.variables[varname][xsel]<br>
>> KeyboardInterrupt<br>
>><br>
>><br>
>> But if I change the call to use numpy indexing, it works:<br>
>><br>
>> >>> m.shape<br>
>> (24, 81)<br>
>> >>> numpy.ma.masked_equal(m[:,:], 1.0)<br>
>> masked_array(data =<br>
>> [[0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> ...,<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]<br>
>> [0.0 0.0 0.0 ..., 0.0 0.0 0.0]],<br>
>> mask =<br>
>> [[False False False ..., False False False]<br>
>> [False False False ..., False False False]<br>
>> [False False False ..., False False False]<br>
>> ...,<br>
>> [False False False ..., False False False]<br>
>> [False False False ..., False False False]<br>
>> [False False False ..., False False False]],<br>
>> fill_value = 1.0)<br>
>><br>
>><br>
>> Was this change in functionality intentional?<br>
>><br>
>> The NetCDF files I used are available at:<br>
>><br>
>> <a href="ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nos/prod/glofs.20151229/" rel="noreferrer" target="_blank">ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nos/prod/glofs.20151229/</a><br>
>><br>
>> (directory will change based on date)<br>
>><br>
>> Thanks,<br>
>> Jason<br>
>> --<br>
>> Jason Greenlaw<br>
>> Software Developer, ERT, Inc.<br>
>> NOAA/NOS/OCS/CSDL<br>
>> <a href="http://nowcoast.noaa.gov" rel="noreferrer" target="_blank">http://nowcoast.noaa.gov</a><br>
>> <a href="mailto:Jason.Greenlaw@noaa.gov">Jason.Greenlaw@noaa.gov</a><br>
>><br>
><br>
><br>
</div></div>> _______________________________________________<br>
> pyngl-talk mailing list<br>
> List instructions, subscriber options, unsubscribe:<br>
> <a href="http://mailman.ucar.edu/mailman/listinfo/pyngl-talk" rel="noreferrer" target="_blank">http://mailman.ucar.edu/mailman/listinfo/pyngl-talk</a><br>
><br>
</blockquote></div><br></div></div></div>