[Go-essp-tech] NetCDF4 compression. write efficiency and CMIP5 policy

Charles سمير Doutriaux doutriaux1 at llnl.gov
Thu Nov 12 08:39:57 MST 2009


Martin,

I concur; I ran some pure netCDF4 (C code) tests yesterday on some  
actual topography data.

I noticed two things:
1- compression does seem to increase write time, by a factor of 3 or so...
2- shuffle triggers memory issues...

But I was doing a few things awkwardly (writing short data from  
doubles), using my laptop, etc...
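
For anyone wanting to reproduce this kind of test, here is a minimal
sketch of enabling deflate and shuffle through the netCDF4 C API (the
file, variable, and dimension names are illustrative, not my actual
test code):

    #include <netcdf.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Abort with a message on any netCDF error. */
    #define CHECK(e) do { int s_ = (e); if (s_ != NC_NOERR) { \
        fprintf(stderr, "netCDF: %s\n", nc_strerror(s_)); exit(1); } } while (0)

    int main(void) {
        int ncid, dims[2], varid;

        /* Compression requires the HDF5-based netCDF-4 format. */
        CHECK(nc_create("topo_test.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
        CHECK(nc_def_dim(ncid, "lat", 180, &dims[0]));
        CHECK(nc_def_dim(ncid, "lon", 360, &dims[1]));
        CHECK(nc_def_var(ncid, "topo", NC_SHORT, 2, dims, &varid));

        /* shuffle=1 turns on the byte-shuffle filter;
           deflate=1 enables zlib at the given level (1-9). */
        CHECK(nc_def_var_deflate(ncid, varid, 1, 1, 4));

        CHECK(nc_enddef(ncid));
        /* ... write the field with nc_put_vara_short() ... */
        CHECK(nc_close(ncid));
        return 0;
    }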

I'll be running some tests on re-reading the data as well, to see if  
it makes a difference. If reading is faster, then that is really what  
we want, because we only write once but then reread the data a lot of  
times.

Also, transferring the data from one site to another will be faster,  
since the data are smaller once compressed.

I'll send you a printout of my results shortly,

C.

On Nov 12, 2009, at 3:46 AM, <martin.juckes at stfc.ac.uk> wrote:

> Hi Charles,
>
> I've been looking at write speed from Fortran, using dummy data,  
> for a single array of size 360x180x120.
>
> The speed of writing compressed data depends on the choice of chunk  
> sizes, and I'm not sure I've got the optimal values. The tables  
> below list a range of choices (the first line in each is for  
> uncompressed data), which give speeds from 6 to 24 MB/s. These  
> choices also affect the degree of compression and the speed with  
> which different subsets of a file can be read, so determining the  
> right values is not a trivial exercise (a sketch of how the chunk  
> shapes are set follows the tables). I have just started exploring  
> the possibilities, and would appreciate any guidance you have.
>
> Cheers,
> Martin
>
> File size (MB)   Chunk sizes          Write speed (MB/s)
> 31.1             360  180  120          222.5  (uncompressed)
> 15.0               8    8    8            7.7
> 16.8              18    8    4            9.1
> 20.3              90    4    1            6.0
> 15.8              90   45    2           17.0
> 16.6              90   45    1           12.1
> 16.2              32   32    1           10.5
> 15.1              32   32    2           14.0
>
> And for a larger field, 360x180x2400:
>
> File size (MB)   Chunk sizes          Write speed (MB/s)
> 622.09           360  180  2400         512.39  (uncompressed)
> 292.16             8    8     8           11.3
> 309.28            18    8     4           14.78
> 340.76            90    4     1            8.69
> 279.29            90   45     2           24
> 273.24            90   45     1           12.94
> 269.25            32   32     1            8.1
> 265.75            32   32     2           11.78
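>
> For reference, chunk shapes like those in the tables are set through
> nc_def_var_chunking in the C API (the Fortran 90 interface exposes
> the same control through the optional chunksizes argument of
> nf90_def_var). A minimal sketch, assuming the chunk sizes are listed
> in the same dimension order as the array; error handling and the
> data writes are omitted:
>
>     #include <netcdf.h>
>
>     int main(void) {
>         int ncid, dims[3], varid;
>         size_t chunks[3] = {90, 45, 2};  /* one row from the first table */
>
>         nc_create("chunked.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
>         nc_def_dim(ncid, "lon", 360, &dims[0]);
>         nc_def_dim(ncid, "lat", 180, &dims[1]);
>         nc_def_dim(ncid, "time", 120, &dims[2]);
>         nc_def_var(ncid, "v", NC_FLOAT, 3, dims, &varid);
>         nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
>         nc_def_var_deflate(ncid, varid, 0, 1, 4);  /* no shuffle, level 4 */
>         nc_enddef(ncid);
>         /* ... write the data, then nc_close(ncid) ... */
>         return 0;
>     }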
>
>
> -----Original Message-----
> From: Charles سمير Doutriaux [mailto:doutriaux1 at llnl.gov]
> Sent: Tue 10/11/2009 15:01
> To: Dean N. Williams
> Cc: Pascoe, Stephen (STFC,RAL,SSTD); Juckes, Martin (STFC,RAL,SSTD);  
> Stephens, Ag (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: NetCDF4 compression.  write efficiency and CMIP5 policy
>
> Yes, I did some testing on the speed, and it was much faster to write
> compressed. That made sense to me, since there was so much less data
> to write, and the CPU time was nothing next to the I/O time.
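>
> A minimal sketch of the kind of timing harness this refers to (the
> names and the dummy field are illustrative, error checks are omitted
> for brevity, and real data would give more realistic compression
> numbers than this synthetic field):
>
>     #include <netcdf.h>
>     #include <stdio.h>
>     #include <time.h>
>
>     #define NX 360
>     #define NY 180
>     #define NT 120
>
>     static short data[NT][NY][NX];   /* ~15.6 MB dummy field */
>
>     int main(int argc, char **argv) {
>         int compress = argc > 1;     /* any argument => deflate on */
>         int ncid, dims[3], varid;
>         struct timespec t0, t1;
>
>         for (int t = 0; t < NT; t++)
>             for (int y = 0; y < NY; y++)
>                 for (int x = 0; x < NX; x++)
>                     data[t][y][x] = (short)(x + y + t);
>
>         clock_gettime(CLOCK_MONOTONIC, &t0);
>         nc_create("bench.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
>         nc_def_dim(ncid, "time", NT, &dims[0]);
>         nc_def_dim(ncid, "lat", NY, &dims[1]);
>         nc_def_dim(ncid, "lon", NX, &dims[2]);
>         nc_def_var(ncid, "v", NC_SHORT, 3, dims, &varid);
>         if (compress)
>             nc_def_var_deflate(ncid, varid, 1, 1, 4);
>         nc_enddef(ncid);
>         nc_put_var_short(ncid, varid, &data[0][0][0]);
>         nc_close(ncid);              /* include the flush in the timing */
>         clock_gettime(CLOCK_MONOTONIC, &t1);
>
>         double mb = sizeof(data) / 1e6;
>         double s = (t1.tv_sec - t0.tv_sec)
>                  + (t1.tv_nsec - t0.tv_nsec) / 1e9;
>         printf("%s: %.1f MB in %.3f s (%.1f MB/s)\n",
>                compress ? "deflate" : "plain", mb, s, mb / s);
>         return 0;
>     }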
>
> I'll rerun the test again today.
>
> C
>
> On Nov 10, 2009, at 4:50 AM, Dean N. Williams wrote:
>
>> This is not what we experienced. In fact, I recall just the
>> opposite. Charles Doutriaux did the work, so I'll let him respond
>> directly. Also, it would be good if Ed Hartnett and Russ Rew
>> responded as well to the slowness that you are seeing. They may be
>> able to help you with this.
>>
>> Charles, if I recall correctly, were the zlib compressed netCDF
>> files read faster in CDAT?
>>
>> We are using CMOR2, which handles the DRS and writes netCDF-4
>> classic compressed output.
>>
>> Best regards,
>> 	Dean
>>
>> On Nov 10, 2009, at 4:20 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>>
>>> Hi Dean
>>>
>>> In tests we've done at BADC, we have experienced a 10-20x slowdown
>>> in write speed with NetCDF4 compression.  Is this typical, and are
>>> modelling centres aware that they can expect a significant I/O
>>> bottleneck?
>>>
>>> This makes me wonder: have we said CMIP5 data *must* be compressed?
>>> Is there a danger that we will get a higher volume of data than we
>>> expect because it will be left uncompressed to speed up the process
>>> at the modelling centres?  How would we enforce NetCDF compression --
>>> presumably non-compliance would be discovered during replication.
>>>
>>> Cheers,
>>> Stephen.
>>>
>>> ---
>>> Stephen Pascoe  +44 (0)1235 445980
>>> British Atmospheric Data Centre
>>> Rutherford Appleton Laboratory
>>>
>>>
>>
>
>


