[ncl-talk] netcdf file size question

Kevin Hallock hallock at ucar.edu
Tue Jun 26 12:53:02 MDT 2018


Hi Hauss,

That is correct: NCL uses a single data type for all elements of an array, regardless of whether individual values actually require that level of precision. If an array contained a mix of data types, then accessing a specific index would require determining the size of every element before it. In a single-data-type array (let’s say “float”), the memory address of a particular index is simply an offset from the beginning of the array equal to “sizeof(float) * index”, where the size of a float variable is 4 bytes. Just determining the memory address of an index in a mixed-type array is difficult enough, so performing an actual computation across a multi-dimensional array of mixed types would likely be very slow compared to a homogeneous array.
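
As a rough illustration of that offset arithmetic, here is a minimal sketch in plain Python (not NCL). The byte buffer and the helper name element_at are made up for the example; it only shows the general idea and says nothing about how NCL lays out arrays internally.

```python
import struct

# Four 4-byte floats stored back to back, like a homogeneous "float" array.
values = [1.5, 2.25, -3.0, 4.125]
buf = struct.pack("<4f", *values)

item_size = struct.calcsize("<f")  # size of one float in bytes

def element_at(buffer, index):
    """Hypothetical helper: read element `index` using offset = item_size * index."""
    offset = item_size * index
    return struct.unpack_from("<f", buffer, offset)[0]

print(item_size, len(buf))
print(element_at(buf, 2))
```

Because every element has the same size, the lookup never has to inspect the elements that come before the one requested.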


If you’re certain that losing several decimal places of precision is alright, then you could try using NCL’s pack_values <http://www.ncl.ucar.edu/Document/Functions/Contributed/pack_values.shtml> function. pack_values can “pack” a float (4 bytes per value) or double (8 bytes per value) type array into either “short” (2 bytes) or “byte” (1 byte) arrays, using a multiplier and an offset value to “unpack” an approximation of the original float/double data.

Please note that “packing” data into a smaller data type is a form of “lossy” compression, meaning it may not be possible to recover the exact original data from the compressed data.

If you have a float array “a_float” that you want to compress by a factor of 2, you could pack_values() it into a short array:
a_short = pack_values(a_float, "short", False)
a_unpacked = short2flt(a_short)    ; This is essentially the same as "(a_short * a_short@scale_factor) + a_short@add_offset"

You will likely want to compare your original array with the new packed-then-unpacked array to evaluate whether the lost precision is acceptable for your use case.

It is also possible to pack values into a “byte” array (4 bytes to 1 byte compression in this case), although the loss of precision will be even more apparent:
a_byte = pack_values(a_float, "byte", False)
a_unpacked = byte2flt(a_byte)
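
To get a feel for how much precision each packed type gives up, here is a rough sketch of the scale/offset arithmetic in plain Python (not NCL). The way scale_factor and add_offset are chosen below is one common convention and is an assumption on my part; pack_values may pick them differently. The helper names and the sample data are hypothetical.

```python
def pack(values, bits):
    """Hypothetical helper: map floats onto signed integers of the given width."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 2  # e.g. 65534 steps for a 16-bit short (assumed convention)
    scale_factor = (hi - lo) / levels if hi > lo else 1.0
    add_offset = (hi + lo) / 2.0
    packed = [round((v - add_offset) / scale_factor) for v in values]
    return packed, scale_factor, add_offset

def unpack(packed, scale_factor, add_offset):
    # Same arithmetic as (a_short * scale_factor) + add_offset
    return [p * scale_factor + add_offset for p in packed]

def max_abs_error(values, bits):
    """Largest difference between the original and packed-then-unpacked values."""
    packed, scale_factor, add_offset = pack(values, bits)
    restored = unpack(packed, scale_factor, add_offset)
    return max(abs(a - b) for a, b in zip(values, restored))

temps = [273.15, 280.0, 299.99, 310.5]  # made-up sample data
print("short:", max_abs_error(temps, 16))
print("byte: ", max_abs_error(temps, 8))
```

With this scheme the worst-case error is about half of scale_factor, and scale_factor grows as the integer type shrinks, which is why the byte version loses noticeably more precision than the short version.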

Alternatively, there is a way to do this outside of NCL for a netcdf file that already exists using a software package called NetCDF Operators <http://nco.sourceforge.net/>. In particular, the ncpdq <http://nco.sourceforge.net/nco.html#ncpdq> operator can be used to pack data as follows:
ncpdq infile.nc outfile.nc

I hope this helps,
Kevin

> On Jun 25, 2018, at 7:47 PM, Hauss Reinbold <Hauss.Reinbold at dri.edu> wrote:
> 
> Hi all,
> 
> I’m creating a large netcdf dataset via NCL and I was looking to reduce the file size by reducing the number of decimal places the float values were holding, but it doesn’t look like it worked. In looking into it further, it seems like NCL allocates space in the file by data type, regardless of what value each individual index of an array might have. Is that correct?
> 
> I did some looking and couldn’t see a way to reduce file size explicitly other than by changing data type, which I don’t think I can do. Is there a way to reduce the file size of the netcdf file by limiting the number of decimal places? Or is compression or changing the data type my only alternative here?
> 
> Thanks for any help on this.
> 
> Hauss Reinbold
