[pyngl-talk] netcdf4-python vs PyNIO, was: xray et al (PyNIO, pandas, R)
sam.hawkins at vattenfall.com
Mon Jun 29 04:33:17 MDT 2015
It's really good to see this discussion taking place, and apologies if I'm broadening it to include PyNGL. I tried using PyNIO and PyNGL about 7 years ago, thinking it would be a way of harnessing the specialist, domain-specific functionality of NCL from a language with a much nicer syntax and with access to all the other Python libraries and packages. After the fairly traumatic installation procedure, I found that some of the key functions in NCL were not available (at the time) in PyNGL. I was forced to beat a retreat back to NCL.
In many ways, NCL has been miles ahead of the competition for years: named dimensions, effortless dimension reordering, coordinate-based indexing, support for all kinds of complex regridding operations, etc. But the syntax is, well, an acquired taste. Doing low-level data 'munging' is horrendous, and it is a closed system, so it is difficult to make use of other tools.
Recently the development of Python modules for data analysis and plotting has advanced enormously, with Pandas, xray, Iris, cartopy, seaborn, bokeh etc. Even atmospheric scientists raised on a strict diet of FORTRAN and NCL are starting to use Python ;). Personally, I think the focus for python development should be on the key strengths from the NCL world: support for multiple file formats (GRIB etc), and support for atmospheric-science specific functions (regridding, specialist diagnostics). I don't want to offend anyone here, but I don't actually like the PyNGL/NCL system of using different resources to control plot details. I increasingly find myself using NCL for calculations, and then using python with other modules to produce plots and maps.
Is there a case for separating the current functionality into three modules, rather than the current two:
1. File format support (currently PyNIO)
2. Specialist atmospheric functions, regridding, diagnostics (mainly operating on numpy arrays)
3. Output plotting (currently PyNGL)
I realise that separating these may be more difficult than it first appears. I would be happy to hear what other people think.
From: pyngl-talk-bounces at ucar.edu [mailto:pyngl-talk-bounces at ucar.edu] On Behalf Of David Brown
Sent: Thursday, June 25, 2015 2:09 AM
To: Tom Roche
Cc: pyngl-talk at ucar.edu talk
Subject: Re: [pyngl-talk] netcdf4-python vs PyNIO, was: xray et al (PyNIO, pandas, R)
Re-engaging on this topic from earlier this month:
You are asking some good questions, and I see that you also stimulated a lot of discussion on the PyAOS mailing list. I want to respond here on pyngl-talk first, and then I intend to engage on the PyAOS list as well.
First of all we do understand that PyNIO is considered difficult to install and currently one of our highest priorities is to address this issue. We are in the process of implementing a conda install process for PyNIO and intend to have that ready within a month or so. Due to our small staff and our focus on NCL, our Python modules have been somewhat neglected in recent years. But our plans now include much more focus on our Python tools. In fact you may have noticed Mary's announcement of a new position where we are looking for someone with Python scientific programming expertise. Also we have been focused recently on updating PyNIO's capabilities with respect to more advanced features of NetCDF4/HDF5. We have good support for groups now and are close to completing support for compound data types, variable length arrays, etc. These features will be part of our new release with conda-based installation.
Concerning xray, I agree this is a very promising tool. A point that may not be totally clear from the PyAOS discussion is that this tool relies on a backend IO module to do the low-level reading of data files. The default module is netcdf4-python, but other backends are supported as well. Since PyNIO and netcdf4-python both evolved from the same Python NetCDF interface created years ago by Konrad Hinsen, their interfaces are quite similar. As an experiment, I recently created a PyNIO backend for xray that can be used in place of netcdf4-python. The immediate advantage for an xray user is that you can now access, in a uniform manner, all the PyNIO-supported formats along with NetCDF. Hopefully, once it is fully vetted, Stephan Hoyer can be persuaded to add it as another alternate backend to the xray code base.
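To illustrate the backend idea in general terms, here is a minimal sketch in plain Python. It is not xray's actual code: every name in it (DictBackend, RecordBackend, ENGINES, open_dataset, mean_of) is hypothetical, and the "files" are in-memory stand-ins. The point is only that analysis code written against one small reading interface does not care which backend supplies the data.

```python
# Hypothetical sketch of a pluggable IO backend. None of these names come
# from xray, PyNIO or netcdf4-python; the data sources are in-memory stand-ins.


class DictBackend:
    """Stand-in for a reader whose source is already keyed by variable name."""

    def __init__(self, source):
        self._data = {name: list(values) for name, values in source.items()}

    def variable_names(self):
        return sorted(self._data)

    def read(self, name):
        return list(self._data[name])


class RecordBackend:
    """Stand-in for a record-oriented reader: a flat sequence of
    (name, value) records presented through the same interface."""

    def __init__(self, source):
        self._data = {}
        for name, value in source:
            self._data.setdefault(name, []).append(value)

    def variable_names(self):
        return sorted(self._data)

    def read(self, name):
        return list(self._data[name])


# Registry mapping an engine name to the class that implements it.
ENGINES = {"dict": DictBackend, "records": RecordBackend}


def open_dataset(source, engine="dict"):
    """Return a backend object for `source`, chosen by engine name."""
    try:
        backend_cls = ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown engine {engine!r}") from None
    return backend_cls(source)


def mean_of(dataset, name):
    """Analysis code that only relies on the shared read() interface."""
    values = dataset.read(name)
    return sum(values) / len(values)
```

With this layout, adding support for another format means registering one more class in ENGINES; mean_of and any other downstream code stay unchanged.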
As for a basic comparison between netcdf4-python and PyNIO (from a usage point of view) here is my (obviously biased) take:
If you are only interested in NetCDF data, then there is not much reason to prefer PyNIO over netcdf4-python. Sasha's points in favor of netcdf4-python are valid and I could add that netcdf4-python has since its inception supported the complete NetCDF4 specification that we are only now putting into PyNIO.
To me the argument in favor of PyNIO boils down to this:
PyNIO makes multiple formats available in a consistent fashion that conforms to the NetCDF model.
For GRIB and HDFEOS data it adds value by providing coordinate variables (2D in the case of pre-projected data) that are derived from the very terse projection specifications given in the file. For GRIB-based vector data it also provides a rotation variable that makes it simple to convert the grid-based vector direction angles to "earth"-based angles.
Finally, for all the formats, it provides basically the same file interface that has been developed over many years for NCL. We do not claim to handle every possible file correctly, but I can assure you that many tricky details have been worked out over the years, and it is generally pretty robust at this point.
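As a rough illustration of what the rotation variable mentioned above is for, the following sketch rotates grid-relative wind components into earth-relative ones. It assumes the rotation angle is given in radians and that the convention is a standard 2D rotation of the form shown in the code. Both are assumptions, and the function name grid_to_earth is hypothetical; the formulas and units documented with the rotation variable in your own file should take precedence.

```python
import math


def grid_to_earth(u_grid, v_grid, rot):
    """Rotate grid-relative vector components to earth-relative ones.

    Assumptions (not taken from any particular file): `rot` is in radians
    and the sign convention is
        u_earth = sin(rot) * v_grid + cos(rot) * u_grid
        v_earth = cos(rot) * v_grid - sin(rot) * u_grid
    """
    u_earth = math.sin(rot) * v_grid + math.cos(rot) * u_grid
    v_earth = math.cos(rot) * v_grid - math.sin(rot) * u_grid
    return u_earth, v_earth
```

Whatever the sign convention, the rotation changes only the direction of the vector, so the wind speed computed from the rotated components should match the speed computed from the grid-relative ones. That makes a useful sanity check.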
For what it's worth, I did compare the performance of these tools as part of a presentation I gave last year at SciPy 2014. Mostly the differences were rather small, and unless you are batch processing multiple gigabytes of data, they would not be noticeable. Where there were noticeable differences, I think it is fair to say they were mostly in PyNIO's favor, although I cannot really explain why. I could go into more detail on this subject but not in this message.
On Mon, Jun 15, 2015 at 1:46 PM, Oleksandr Huziy <guziy.sasha at gmail.com> wrote:
> just dropping my 2 cents...
> netcdf4-python:
> 1. For me netcdf4-python is much easier to install
> 2. netcdf4-python can read multiple files as if it was only one file
> 3. Very easy to deal with dates
> 4. I've never checked the benefits, but it can exploit cython if installed..
> PyNIO:
> 1. Can read many formats in addition to netcdf
> As you can see I do not have much experience with pynio mainly because
> I use netcdf4-python more ...
> I would like to see if someone has compared their performance?
> 2015-06-15 15:29 GMT-04:00 Tom Roche <Tom_Roche at pobox.com>:
>> Louis Wicker Mon, 15 Jun 2015 13:27:27 -0500 
>> > Are u aware of [netcdf4-python] from Jeff Whitaker?
>> I thought PyNIO was "the NCAR way" to interact with netCDF from "the
>> NumPy world." Plainly I am mistaken! So my next question is, how do
>> netcdf4-python and PyNIO compare/contrast? Why use one rather than the other?
>> (And the question after that is, does answering the first question
>> require stepping into some kinda political minefield ?-)
>> TIA, Tom Roche <Tom_Roche at pobox.com>
>> : https://github.com/Unidata/netcdf4-python
>> pyngl-talk mailing list
>> List instructions, subscriber options, unsubscribe: