[pyngl-talk] delegation, division of labor, packaging, was: netcdf4-python vs PyNIO

White, George George.White at dfo-mpo.gc.ca
Tue Jun 30 12:26:34 MDT 2015

A few remarks, tempered by "the current economic and political realities" (TCEPR), which include the fact that many current users are approaching the ages of dementia and/or retirement, and that replacements may arrive long after the former occupant of the position can offer help.

Consolidation and rationalization of overlapping projects and robust packaging make life easier for someone arriving fresh on the scene, but may force them to learn unfamiliar APIs and packaging schemes.

>> 1. PyNIO should delegate basic netCDF4 support to netcdf4-python.

We have seen many efforts to provide hdf5 and/or netcdf support for various platforms.  Many of these efforts have focused on the immediate requirements of the developers and don't attempt to cover all the edge cases, or they encounter constructs that are difficult to implement properly in the chosen platform.  APIs and packaging may vary according to developer preferences, prior experiences, and interests.  New users may prefer APIs that resemble those they have used in the past (e.g., Matlab, R, Java) or that are packaged in a way with which they have prior experience.

Many APIs have multiple layers: a simplified model that makes it easy to do simple things, and a more complex model that makes it possible to do the hard stuff.  Maybe this can be extended to layers that make it easier for Matlab (numpy), R (pandas), etc. users to accomplish tasks.
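
A minimal sketch of the layering idea, in case it helps make it concrete.  Every name here (register_reader, open_file, the stub readers) is hypothetical and is not taken from PyNIO or netcdf4-python; the stubs only stand in for format-specific code, which in a real package might delegate to another library.

```python
# Hypothetical sketch: none of these names come from PyNIO or netcdf4-python.
from typing import Callable, Dict

# Lower layer: one reader per format, registered by file extension.
_READERS: Dict[str, Callable[[str], dict]] = {}


def register_reader(extension: str, reader: Callable[[str], dict]) -> None:
    """Register a format-specific reader (the layer for the hard stuff)."""
    _READERS[extension.lower()] = reader


def open_file(path: str) -> dict:
    """Simple front end: choose a reader from the extension and delegate."""
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    try:
        reader = _READERS[extension]
    except KeyError:
        raise ValueError(f"no reader registered for {path!r}") from None
    return reader(path)


# Stand-in readers; real ones would parse the file or call another package.
def read_netcdf_stub(path: str) -> dict:
    return {"format": "netcdf", "path": path}


def read_grib_stub(path: str) -> dict:
    return {"format": "grib", "path": path}


register_reader("nc", read_netcdf_stub)
register_reader("grb", read_grib_stub)
```

With something like this, a newcomer only needs open_file("example.nc"), while each format's reader can be maintained (or delegated) separately behind the same front end.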

>> 2. PyNIO should focus on supporting additional/non-netCDF formats.

  a. Merging HDF5 support into netcdf4-python:

There are reasons to keep hdf5 separate from netcdf4.  The former is powerful but complex.  NetCDF4 provides a simpler data model that works well for a large user base.  TCEPR implies very long lags in propagating changes.  Some things may be done in hdf5 first, and only after they have proven useful in lots of use cases should they move into NetCDF4.

  b.  Support for legacy formats:

My feeling is that many groups would like to adopt NetCDF4/HDF5 if not for TCEPR.  Lack of familiarity and uncertainty over long-term support for new formats inhibit change, but there is also the need to keep legacy systems going because TCEPR prohibit replacement of systems that are currently doing a satisfactory job.  Having tools that support a broad range of legacy or special-purpose formats can help build a community of users with diverse backgrounds, expertise, and experience.

>> 3. PyNIO should adopt netcdf4-python as a dependency.

If this simplifies the installer, it could be a win even if very little of the current netcdf4-python is actually used.

-----Original Message-----
From: pyngl-talk-bounces at ucar.edu [mailto:pyngl-talk-bounces at ucar.edu] On Behalf Of Tom Roche
Sent: June-30-15 12:04 PM
To: pyngl-talk at ucar.edu
Subject: Re: [pyngl-talk] delegation, division of labor, packaging, was: netcdf4-python vs PyNIO

Tom Roche Thu Jun 25 13:47:45 MDT 2015[1]
>> a more sensible policy--ceteris paribus[2]--for the allocation of scarce resources would be layering+[division of labor (DoL)]:

>> 1. PyNIO should delegate basic netCDF4 support to netcdf4-python.

>> 2. PyNIO should focus on supporting additional/non-netCDF formats.

>> 3. PyNIO should adopt netcdf4-python as a dependency.

David Brown Mon, 29 Jun 2015 19:02:20 -0600[3]
> the new [PyNIO] development is not just for NetCDF4 but also to expose the HDF5 capabilities of the NIO library.

So have you maybe discussed merging your HDF5 code into netcdf4-python[4] with the Unidata folks[5]?

> if we knew that the NetCDF library would be providing more complete coverage of HDF5 in the near future, it still might make sense to use netcdf4-python. I am not sure why certain things seem to be missing.

So have you maybe discussed *that* with the Unidata folks? I dunno if you/CISL and they/Unidata have facetime opportunities (are they "walk-down-the-hall-able"?) but certainly they have a web interface for this[6].

Just to be clear: I'm not trying to valorize or deprecate anyone's coding abilities. I *am* trying to emphasize scarcity of coding resources (notably, person-time) available for this task and important, related others (such as testing, maintenance, packaging, documentation) that are all-too-often neglected. If my empirical claim

>> Current economic and political "realities" imply that budgets for "the sort of stuff" listizens in general do, and NCAR/UCAR specifically, are unlikely to improve significantly near-term.

is plausible, then ISTM (and you seem to agree) layering+DoL is probably the most rational response to that scarcity, in the sense that it delivers more-useful code more sustainably over the longterm. Layering+DoL is almost never the most *emotionally-satisfying* response (which all-too-often is to say "I'll just suck it up and code this *myself*!" accompanied by crotch-tugging :-) since adopting dependencies is never fun and is real work. (Lemme emphasize that, having worked for a three-letter acronym that develops very-large-scale software systems, I *know* dependency management is *not* fun.)

HTH, Tom Roche <Tom_Roche at pobox.com>

[1]: http://mailman.ucar.edu/pipermail/pyngl-talk/2015-June/000052.html
[2]: https://en.wiktionary.org/wiki/ceteris_paribus
[3]: http://mailman.ucar.edu/pipermail/pyngl-talk/2015-June/000058.html
[4]: http://unidata.github.io/netcdf4-python/
[5]: https://github.com/orgs/Unidata/people
[6]: https://github.com/Unidata/netcdf4-python/issues