[Go-essp-tech] NetCDF4 compression. write efficiency and CMIP5 policy

martin.juckes at stfc.ac.uk
Thu Nov 12 09:48:27 MST 2009


Hello Charles,

I'll be interested to see all that. On re-reading the data, I'm also interested in the performance of reading slices -- e.g. a single latitude x longitude field from a long time series. My initial tests show that this can be slowed considerably by a poor choice of the "chunking" parameters specified for the compression. This may be significant for the archive services we want to run.
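
To make the slice-read test concrete, here is a minimal sketch with the Python netCDF4 bindings (the file and variable names are made up; I'm assuming a variable dimensioned time x lat x lon):

    import time
    import netCDF4

    # Hypothetical file holding a long time series of a 2D field.
    ds = netCDF4.Dataset("series.nc")
    tas = ds.variables["tas"]          # dimensioned (time, lat, lon)

    t0 = time.time()
    field = tas[500, :, :]             # one latitude x longitude field
    print("slice read took %.3f s" % (time.time() - t0))
    ds.close()

If the variable was written with chunksizes=(1, nlat, nlon), this read touches a single chunk; with something like (120, 8, 8), the library has to read and decompress every chunk overlapping that time step.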

I haven't tried the shuffle options yet.

On the choice of data: it might be worth trying a range of data sets. I've just tried a smooth, quadratically varying field and white noise. For the latter, as might be expected, compression does not reduce the file size at all. I expect the smooth field is the more relevant case, but I haven't looked at real data yet.
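
My tests are in Fortran, but the comparison amounts to roughly the following in Python with the netCDF4 bindings (sizes and names are illustrative):

    import os
    import numpy as np
    import netCDF4

    def write(fname, data):
        """Write a 3D field with zlib compression; return file size in MB."""
        ds = netCDF4.Dataset(fname, "w", format="NETCDF4_CLASSIC")
        for name, size in zip(("time", "lat", "lon"), data.shape):
            ds.createDimension(name, size)
        v = ds.createVariable("v", "f4", ("time", "lat", "lon"),
                              zlib=True, complevel=4)
        v[:] = data
        ds.close()
        return os.path.getsize(fname) / 1e6

    t, y, x = np.indices((120, 180, 360), dtype="f4")
    # Smooth, quadratically varying field vs. incompressible white noise.
    print("smooth: %.1f MB" % write("smooth.nc", x**2 + y**2 + t**2))
    print("noise:  %.1f MB" % write("noise.nc",
          np.random.random((120, 180, 360)).astype("f4")))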

Cheers,
Martin 

> -----Original Message-----
> From: Charles سمير Doutriaux [mailto:doutriaux1 at llnl.gov]
> Sent: 12 November 2009 15:40
> To: Juckes, Martin (STFC,RAL,SSTD)
> Cc: williams13 at llnl.gov; Pascoe, Stephen (STFC,RAL,SSTD); Stephens, Ag
> (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: NetCDF4 compression. write efficiency and CMIP5 policy
> 
> Martin,
> 
> I concur. I ran some pure NetCDF4 (C code) tests yesterday on some
> actual topography data.
> 
> I noticed two things:
> 1- compression does seem to increase write time by a factor of 3 or so...
> 2- shuffle triggers memory issues...
> 
> But I was doing a few things awkwardly (writing short data from
> double), using my laptop, etc...
> 
> I'll be running some tests on re-reading the data as well, to see if
> it makes a difference. If reading is faster, then that is really what
> we want, because we only write once but then re-read many times.
> 
> Also, transferring the data from one site to another will be faster,
> since the data are smaller once compressed.
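> 
> As a rough sketch of the write-timing comparison (in Python with the
> netCDF4 bindings rather than my actual C code, and with a made-up
> random field instead of the real topography data):
> 
>     import time
>     import numpy as np
>     import netCDF4
> 
>     data = np.random.random((120, 180, 360)).astype("f4")
> 
>     for zlib in (False, True):
>         t0 = time.time()
>         ds = netCDF4.Dataset("test.nc", "w")
>         for name, size in zip(("time", "lat", "lon"), data.shape):
>             ds.createDimension(name, size)
>         v = ds.createVariable("v", "f4", ("time", "lat", "lon"),
>                               zlib=zlib, shuffle=False)
>         v[:] = data
>         ds.close()
>         print("zlib=%s: wrote in %.2f s" % (zlib, time.time() - t0))
> 
> (Note that random data is essentially incompressible, so the
> compressed write costs CPU time without shrinking the file; real
> topography data behaves differently.)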
> 
> I'll send you a printout of my results shortly,
> 
> C.
> 
> On Nov 12, 2009, at 3:46 AM, <martin.juckes at stfc.ac.uk> wrote:
> 
> > Hi Charles,
> >
> > I've been looking at write speed from fortran, using dummy data,
> > looking at a single array of size 360x180x120.
> >
> > The speed of writing compressed data depends on the choice of chunk
> > sizes, and I'm not sure I've got the optimal values. The tables
> > below list a range of choices (the first line for uncompressed
> > data), which give speeds from 6 to 24 MB/s. These choices also
> > affect the degree of compression and the speed with which different
> > subsets of a file can be read, so determining the right values is
> > not a trivial exercise. I have just started exploring the
> > possibilities, and would appreciate any guidance you have,
> >
> > Cheers,
> > Martin
> >
> > File size (MB)   Chunk shape (lon x lat x time)   Write speed (MB/s)
> > 31.1             360 x 180 x 120                  222.5  (uncompressed)
> > 15.0               8 x   8 x   8                    7.7
> > 16.8              18 x   8 x   4                    9.1
> > 20.3              90 x   4 x   1                    6.0
> > 15.8              90 x  45 x   2                   17.0
> > 16.6              90 x  45 x   1                   12.1
> > 16.2              32 x  32 x   1                   10.5
> > 15.1              32 x  32 x   2                   14.0
> >
> > And for a larger field, 360 x 180 x 2400:
> > File size (MB)   Chunk shape (lon x lat x time)   Write speed (MB/s)
> > 622.09           360 x 180 x 2400                 512.39  (uncompressed)
> > 292.16             8 x   8 x    8                  11.3
> > 309.28            18 x   8 x    4                  14.78
> > 340.76            90 x   4 x    1                   8.69
> > 279.29            90 x  45 x    2                  24.0
> > 273.24            90 x  45 x    1                  12.94
> > 269.25            32 x  32 x    1                   8.1
> > 265.75            32 x  32 x    2                  11.78
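> >
> > (My tests are in Fortran; an analogous sweep in Python with the
> > netCDF4 bindings -- assuming the array is stored as (time, lat,
> > lon) and using a smooth dummy field -- would look something like
> > this:)
> >
> >     import os
> >     import time
> >     import numpy as np
> >     import netCDF4
> >
> >     nlon, nlat, ntime = 360, 180, 120
> >     t, y, x = np.indices((ntime, nlat, nlon), dtype="f4")
> >     data = x**2 + y**2 + t**2            # smooth dummy field
> >     volume_mb = data.nbytes / 1e6
> >
> >     for cx, cy, ct in [(8, 8, 8), (18, 8, 4), (90, 4, 1),
> >                        (90, 45, 2), (32, 32, 1), (32, 32, 2)]:
> >         ds = netCDF4.Dataset("chunk.nc", "w")
> >         ds.createDimension("time", ntime)
> >         ds.createDimension("lat", nlat)
> >         ds.createDimension("lon", nlon)
> >         v = ds.createVariable("v", "f4", ("time", "lat", "lon"),
> >                               zlib=True, chunksizes=(ct, cy, cx))
> >         t0 = time.time()
> >         v[:] = data
> >         ds.close()
> >         elapsed = time.time() - t0
> >         print("%6.1f MB   %3d x %2d x %2d   %6.1f MB/s"
> >               % (os.path.getsize("chunk.nc") / 1e6,
> >                  cx, cy, ct, volume_mb / elapsed))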
> >
> >
> > -----Original Message-----
> > From: Charles سمير Doutriaux [mailto:doutriaux1 at llnl.gov]
> > Sent: Tue 10/11/2009 15:01
> > To: Dean N. Williams
> > Cc: Pascoe, Stephen (STFC,RAL,SSTD); Juckes, Martin (STFC,RAL,SSTD);
> > Stephens, Ag (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> > Subject: Re: NetCDF4 compression.  write efficiency and CMIP5 policy
> >
> > Yes, I did some testing on the speed, and it was much faster to
> > write compressed. That made sense to me, since there was so much
> > less data to write, and CPU time is nothing next to I/O time.
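> >
> > The re-read comparison I have in mind is simply this (a Python
> > netCDF4 sketch; "plain.nc" and "zlib.nc" stand for the same
> > variable written without and with compression):
> >
> >     import time
> >     import netCDF4
> >
> >     for fname in ("plain.nc", "zlib.nc"):
> >         ds = netCDF4.Dataset(fname)
> >         t0 = time.time()
> >         data = ds.variables["v"][:]   # read the whole variable back
> >         print("%s: read in %.2f s" % (fname, time.time() - t0))
> >         ds.close()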
> >
> > I'll rerun the test again today.
> >
> > C
> >
> > On Nov 10, 2009, at 4:50 AM, Dean N. Williams wrote:
> >
> >> This is not what we experienced. In fact, I recall just the
> >> opposite. Charles Doutriaux did the work, so I'll let him respond
> >> directly. Also it would be good if Ed Hartnett and Russ Rew respond
> >> as well to the slowness that you are seeing. They may be able to
> >> help you on this.
> >>
> >> Charles, if I recall correctly, were the zlib-compressed netCDF
> >> files read faster in CDAT?
> >>
> >> We are using CMOR2, which handles the DRS and writes netCDF-4
> >> classic compressed output.
> >>
> >> Best regards,
> >> 	Dean
> >>
> >> On Nov 10, 2009, at 4:20 AM, <stephen.pascoe at stfc.ac.uk> wrote:
> >>
> >>> Hi Dean
> >>>
> >>> In tests we've done at BADC we have experienced a 10-20x slowdown
> >>> in write speed with NetCDF4 compression.  Is this typical, and are
> >>> modelling centres aware that they can expect a significant I/O
> >>> bottleneck?
> >>>
> >>> This makes me think: have we said CMIP5 data *must* be compressed?
> >>> Is there a danger we will get a higher volume of data than we
> >>> expect, because it will be left uncompressed to speed up the
> >>> process at the modelling centres?  How would we enforce NetCDF
> >>> compression?  Presumably it would be discovered during replication.
> >>>
> >>> Cheers,
> >>> Stephen.
> >>>
> >>> ---
> >>> Stephen Pascoe  +44 (0)1235 445980
> >>> British Atmospheric Data Centre
> >>> Rutherford Appleton Laboratory
> >>>
> >>>
> >>
> >
> >


