[Go-essp-tech] NetCDF4 compression. write efficiency and CMIP5 policy
Charles Doutriaux
doutriaux1 at llnl.gov
Thu Nov 12 08:39:57 MST 2009
Martin,
I concur. I ran some pure netCDF-4 (C code) tests yesterday on some
actual topography data.
I noticed 2 things:
1- compression does seem to increase write time by a factor of 3 or so...
2- shuffle triggers memory issues...
But I was doing a few things awkwardly (writing short data from
double), using my laptop, etc...
I'll be running some tests on re-reading the data as well to see if it
makes a difference. If reading is faster, then that really is what we
want, because we only write once but then reread the data many times.
Also, transferring the data from one site to another will be faster
since the files are smaller once compressed.
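As an aside, the shuffle filter mentioned above is just a byte transposition: all first bytes of every value are stored together, then all second bytes, and so on, which tends to create longer runs for zlib to exploit on smoothly varying numeric fields. A minimal stand-alone sketch in Python (using only the standard library; this mimics what the HDF5 layer does internally, it is not the netCDF API):

```python
# Sketch of HDF5's shuffle filter (byte transposition) in pure Python,
# to illustrate why shuffle often improves deflate compression on
# multi-byte numeric data. Illustrative only; netCDF-4 applies this
# inside the HDF5 layer, not in user code.
import struct
import zlib

def shuffle(buf: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every value, then byte 1, etc."""
    n = len(buf) // itemsize
    return bytes(buf[i * itemsize + b] for b in range(itemsize) for i in range(n))

def unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Inverse transposition: restore the original byte order."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    pos = 0
    for b in range(itemsize):
        for i in range(n):
            out[i * itemsize + b] = buf[pos]
            pos += 1
    return bytes(out)

# Smoothly increasing 32-bit ints, loosely standing in for gridded data.
data = struct.pack("<1000I", *range(1000))
plain = zlib.compress(data, 6)
shuf = zlib.compress(shuffle(data, 4), 6)
assert unshuffle(shuffle(data, 4), 4) == data  # lossless round trip
print(len(plain), len(shuf))  # shuffled typically compresses smaller
```

On data like this the shuffled stream compresses noticeably better, because the high-order bytes form long constant runs.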
I'll send you a printout of my results shortly,
C.
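One plausible factor behind Martin's chunk-size numbers below: each chunk is deflated as a separate stream, so very small chunks multiply the per-chunk overhead. A quick back-of-envelope count for the 360x180x120 field (hypothetical helper, not the netCDF API):

```python
# Count how many chunks a given chunking implies for Martin's
# 360x180x120 field. Each chunk is compressed independently, so
# thousands of tiny chunks mean far more per-chunk overhead than a
# few hundred larger ones. Hypothetical helper for illustration.
from math import ceil

def chunk_count(shape, chunks):
    n = 1
    for dim, c in zip(shape, chunks):
        n *= ceil(dim / c)
    return n

shape = (360, 180, 120)
for chunks in [(8, 8, 8), (18, 8, 4), (90, 45, 2), (32, 32, 2)]:
    print(chunks, chunk_count(shape, chunks))
# (8, 8, 8) -> 15525 chunks; (90, 45, 2) -> only 960
```

The 8x8x8 choice produces roughly sixteen times as many chunks as 90x45x2, which is at least consistent with the slower write speeds Martin reports for the smallest chunkings.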
On Nov 12, 2009, at 3:46 AM, <martin.juckes at stfc.ac.uk> wrote:
> Hi Charles,
>
> I've been looking at write speed from fortran, using dummy data,
> looking at a single array of size 360x180x120.
>
> The speed of writing compressed data depends on the choice of chunk
> sizes, and I'm not sure I've got the optimal values. The tables
> below list a range of choices (the first line for uncompressed
> data), which give speeds from 6 to 24 MB/s. These choices also
> affect the degree of compression and the speed which different
> subsets of a file can be read, so determining the right values is
> not a trivial exercise. I have just started exploring the
> possibilities, and would appreciate any guidance you have,
>
> Cheers,
> Martin
>
> File size (MB)  Chunk sizes       Write speed (MB/s)
> 31.1            360 180 120       222.5
> 15.0              8   8   8         7.7
> 16.8             18   8   4         9.1
> 20.3             90   4   1         6.0
> 15.8             90  45   2        17.0
> 16.6             90  45   1        12.1
> 16.2             32  32   1        10.5
> 15.1             32  32   2        14.0
>
> And for a larger field, 360x180x2400:
> File size (MB)  Chunk sizes        Write speed (MB/s)
> 622.09          360 180 2400       512.39
> 292.16            8   8    8        11.3
> 309.28           18   8    4        14.78
> 340.76           90   4    1         8.69
> 279.29           90  45    2        24
> 273.24           90  45    1        12.94
> 269.25           32  32    1         8.1
> 265.75           32  32    2        11.78
>
>
> -----Original Message-----
> From: Charles Doutriaux [mailto:doutriaux1 at llnl.gov]
> Sent: Tue 10/11/2009 15:01
> To: Dean N. Williams
> Cc: Pascoe, Stephen (STFC,RAL,SSTD); Juckes, Martin (STFC,RAL,SSTD);
> Stephens, Ag (STFC,RAL,SSTD); go-essp-tech at ucar.edu
> Subject: Re: NetCDF4 compression. write efficiency and CMIP5 policy
>
> Yes, I did some testing on the speed and it was much faster to write
> compressed, which made sense to me since there is so much less data
> to write, and CPU time is nothing next to I/O time.
>
> I'll rerun the test again today.
>
> C
>
> On Nov 10, 2009, at 4:50 AM, Dean N. Williams wrote:
>
>> This is not what we experienced. In fact, I recall just the
>> opposite. Charles Doutriaux did the work, so I'll let him respond
>> directly. Also, it would be good if Ed Hartnett and Russ Rew
>> responded as well to the slowness you are seeing. They may be able
>> to help you with this.
>>
>> Charles, if I recall correctly, were the zlib compressed netCDF
>> files read faster in CDAT?
>>
>> We are using CMOR2, which handles the DRS and writes netCDF-4
>> classic compressed output.
>>
>> Best regards,
>> Dean
>>
>> On Nov 10, 2009, at 4:20 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>>
>>> Hi Dean
>>>
>>> In tests we've done at BADC we have experienced 10-20x slowdown in
>>> write speed with NetCDF4 compression. Is this typical and are
>>> modelling centres aware that they can expect a significant I/O
>>> bottleneck?
>>>
>>> This makes me think, have we said CMIP5 data *must* be compressed?
>>> Is there a danger we will get a higher volume of data than we
>>> expect because it will be uncompressed to speed up the process at
>>> the modelling centres? How would we enforce NetCDF compression --
>>> presumably it would be discovered during replication.
>>>
>>> Cheers,
>>> Stephen.
>>>
>>> ---
>>> Stephen Pascoe +44 (0)1235 445980
>>> British Atmospheric Data Centre
>>> Rutherford Appleton Laboratory
>>>
>>>
>>> --
>>> Scanned by iCritical.
>>>
>>>
>>
>
>