[ncl-talk] Reading a large netcdf file

Dave Allured - NOAA Affiliate dave.allured at noaa.gov
Wed Dec 27 14:56:11 MST 2017


CORRECTION.  I suggested using nccopy to add chunking to a large netcdf
file.  Current versions of nccopy do not work correctly in this case.  If
you want to convert large files from "contiguous" to chunked, use an NCO
command like this:

    ncks --deflate 0 --chunk_policy g3d --chunk_dimension time,1 \
        --chunk_dimension lat,180 --chunk_dimension lon,180 \
        tas_input.nc tas_output.nc
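
One quick way to confirm that the rewrite actually produced chunked storage
is to look for the _ChunkSizes special attributes in the new file's header:

    ncdump -hs tas_output.nc | grep _ChunkSizes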

However, as I said in item 5 of my earlier message, I do not think the lack
of chunking is Tomoko's real problem.  Large test files with "contiguous"
have consistently performed well with the current NCL release on my Mac and
Linux systems.

--Dave


On Mon, Dec 18, 2017 at 10:14 PM, Dave Allured - NOAA Affiliate <
dave.allured at noaa.gov> wrote:

> 1)  I agree with Dennis: the use of "contiguous" for an array this large
> is suspicious and unusual.  "Contiguous" means, by definition, that chunking
> is not in use.  See the netCDF docs.  However:
>
> 2)  I made a 9 GB "contiguous" test file with the same dimensions,
> attributes, and coordinate vars as Tomoko's.  Your small script worked
> *correctly* on this test file, on two different systems:
>
>     NCL 6.4.0, Mac OS 10.12.6, 32 GB installed memory
>     NCL 6.4.0, CentOS Linux, 256 GB installed memory
>
> On the Mac, it read 9 GB contiguous from a *local solid state disk* in 7 seconds.
> On Linux, it read 9 GB contiguous from a *network file server* in 80 seconds.
>
> 3.  Speculation on possible problems:
>
> 3a.  Your system does not have enough installed memory, forcing virtual
> memory as mentioned by Guido.  VM can be remarkably slow with large arrays.
>
> 3b.  Related to 3a, if you are submitting a batch script, you might need
> to explicitly increase the memory resource request for the batch job.
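>
> For illustration only (the scheduler and the amount are assumptions about
> your system, not taken from your job), the request is usually a directive
> near the top of the batch script, e.g.:
>
>     # Slurm:
>     #SBATCH --mem=24G
>     # PBS/Torque:
>     #PBS -l mem=24gb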
>
> 3c.  You said you are using NCL 6.3.0.  There is a chance that this older
> version with its older internal libraries may not be correctly handling
> large NC4 contiguous arrays.
>
> 3d.  You might not be seeing the actual error in your batch job.  You
> reported a timeout, not an explicit error message.  Try logging in
> interactively and running this test in NCL command line mode.
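>
> A minimal interactive sketch of that test (the directory path below is just
> a placeholder) would be:
>
>     ncl
>     f   = addfile("/path/to/tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc", "r")
>     tas = f->tas
>     printVarSummary(tas)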
>
> 3e.  Possible hidden structural problem inside this particular file.
>
> 4.  Possible solutions or work-arounds.  Besides the things I suggested in
> 3 above:
>
> 4a.  Upgrade to the current NCL version 6.4.0.
>
> 4b.  Try the same test on a different computer with plenty of memory.
>
> 4c.  Add chunking with reasonable chunk sizes.  Here is one way to do it;
> there are others.  This Linux command also added compression, and it ran in
> less than 2.5 minutes on both of my systems:
>
>     nccopy -c time/1,i/180,j/180 -d1 tas_input.nc tas_output.nc
>
> * Note, this is imperfect because the time coordinate is left with chunk
> size 1.  Better would be a different tool that could independently set a
> more appropriate time coordinate chunk size, something like 1000 to 100000.
>
> 5.  I am not convinced that "contiguous" is the real problem here.  But if
> this turns out to be the problem, then the best long term solution is to
> get the data provider to avoid large "contiguous" arrays, and add chunking
> for arrays larger than a few hundred Mbytes.
>
> --Dave
>
>
> On Mon, Dec 18, 2017 at 6:53 PM, Dennis Shea <shea at ucar.edu> wrote:
>
>> FYI_1:
>>
>> re:     tas:_Storage = "*contiguous*"
>>
>> 2011 Unidata NetCDF Workshop > Chunking and Deflating Data with NetCDF-4
>> https://www.unidata.ucar.edu/software/netcdf/workshops/2011/nc4chunking/Contiguous.html
>>
>>
>>    - *Only works for fixed-sized datasets (those without any unlimited
>>    dimensions).*
>>    - *You can't use compression or other filters with contiguous data.*
>>    - *Use this for smallish variables*, like coordinate variables.
>>    - *This is the default storage for fixed-sized variables with no
>>    filters.*
>>
>> NOTE the recommendation: "*Use this for smallish variables*"
>>
>> *9 GB* is not "smallish".
>>
>> I *SPECULATE* there is no chunking being used.
>>
>> ===
>> FYI_2: the file is not CF-conforming.
>>
>> int time(time) ;
>> string time:long_name = "time" ;
>> string time:*units = "month"* ;
>> time:_Storage = "contiguous" ;
>> time:_Endianness = "little" ;
>>
>> "units" of "months" is not CF-conformant.
>>
>> ===
>> FYI_3:
>> This IPSL-CM5A-LR file was (likely) *not created by CNRS*.
>> It contains regridded data [ ...rgrd.nc ].
>>
>> ===========================================
>> I speculate that your system may be trying to allocate all 9GB of memory
>> before reading values from the file.
>>
>>
>>
>> On Mon, Dec 18, 2017 at 5:50 PM, Tomoko Koyama <
>> Tomoko.Koyama at colorado.edu> wrote:
>>
>>> Thank you very much, Dave.
>>>
>>> I hope the following shows enough information to nail down the cause.
>>>
>>> [koyama at login03 IPSL-CM5A-LR]$ ncdump -hs tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
>>> netcdf tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd {
>>> dimensions:
>>>     time = 73000 ;
>>>     i = 180 ;
>>>     j = 180 ;
>>> variables:
>>>     int time(time) ;
>>>         string time:long_name = "time" ;
>>>         string time:units = "month" ;
>>>         time:_Storage = "contiguous" ;
>>>         time:_Endianness = "little" ;
>>>     float i(i) ;
>>>         string i:units = "none" ;
>>>         string i:long_name = "i index" ;
>>>         i:_Storage = "contiguous" ;
>>>     float j(j) ;
>>>         string j:units = "none" ;
>>>         string j:long_name = "j index" ;
>>>         j:_Storage = "contiguous" ;
>>>     float tas(time, i, j) ;
>>>         string tas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_IPSL-CM5A-LR_rcp85_r0i0p0.nc areacella: areacella_fx_IPSL-CM5A-LR_rcp85_r0i0p0.nc" ;
>>>         string tas:coordinates = "height" ;
>>>         string tas:history = "2011-08-16T22:13:26Z altered by CMOR: Treated scalar dimension: \'height\'. 2011-08-16T22:13:26Z altered by CMOR: replaced missing value flag (9.96921e+36) with standard missing value (1e+20). 2011-08-16T22:13:45Z altered by CMOR: Inverted axis: lat." ;
>>>         string tas:cell_measures = "area: areacella" ;
>>>         string tas:cell_methods = "time: mean (interval: 30 minutes)" ;
>>>         string tas:original_name = "t2m" ;
>>>         string tas:units = "K" ;
>>>         string tas:long_name = "Near-Surface Air Temperature" ;
>>>         string tas:standard_name = "air_temperature" ;
>>>         string tas:remap = "remapped via ESMF_regrid_with_weights: Bilinear" ;
>>>         tas:missing_value = 1.e+20f ;
>>>         tas:_FillValue = 1.e+20f ;
>>>         tas:_Storage = "contiguous" ;
>>>     float lat(j, i) ;
>>>         string lat:long_name = "latitude" ;
>>>         string lat:units = "degrees_north" ;
>>>         lat:_FillValue = -999.f ;
>>>         lat:_Storage = "contiguous" ;
>>>     float lon(j, i) ;
>>>         string lon:long_name = "longitude" ;
>>>         string lon:units = "degrees_east" ;
>>>         lon:_FillValue = -999.f ;
>>>         lon:_Storage = "contiguous" ;
>>>
>>> // global attributes:
>>>     :_Format = "netCDF-4" ;
>>> }
>>>
>>> On Dec 18, 2017, at 4:05 PM, Dave Allured - NOAA Affiliate <
>>> dave.allured at noaa.gov> wrote:
>>>
>>> Tomoko,
>>>
>>> Please add the "s" flag to Dennis's request.  This will show chunk size
>>> parameters that may be relevant:
>>>
>>> %> ncdump -hs tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
>>>
>>> There are known problems with Netcdf-4 chunk sizes that can dramatically
>>> slow down reading.  In particular, chunk size exceeding chunk cache size,
>>> or chunk size exceeding available memory.  If we know the structure
>>> details, we may be able to suggest a solution.
>>>
>>> I recommend chunk sizes in the approximate range of 100 Kbytes to 4
>>> Mbytes for large netcdf-4 files.
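>>>
>>> For a rough sense of scale with this file's dimensions: a chunk of
>>> 1 x 180 x 180 four-byte floats is 180 * 180 * 4 = 129,600 bytes, roughly
>>> 127 Kbytes, which sits at the low end of that range.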
>>>
>>> --Dave
>>>
>>>
>>> On Mon, Dec 18, 2017 at 8:05 AM, Dennis Shea <shea at ucar.edu> wrote:
>>>
>>>> When you have a file issue, you should include some information:
>>>>
>>>> (a) What does either of the following show?
>>>>
>>>> %> ncl_filedump tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
>>>>
>>>> or
>>>>
>>>> %> ncdump -h tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
>>>>
>>>> (b) What version of NCL are you using?
>>>>
>>>> %> ncl -V
>>>>
>>>> (c) your system info
>>>>
>>>> %> uname -a
>>>>
>>>> -----------
>>>>
>>>>  fdir="/root/dir4ncl/"
>>>>  fili="tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc"
>>>>  fn=addfile(fdir+fili,"r")
>>>>  buff=fn->tas
>>>>  print("Data is stored")
>>>>
>>>> On Mon, Dec 18, 2017 at 3:35 AM, Guido Cioni <guidocioni at gmail.com>
>>>> wrote:
>>>>
>>>>> Tomoko,
>>>>> 9 GB is anything but "large", although the concept of "large" is
>>>>> highly subjective :P
>>>>>
>>>>> I've successfully read SINGLE netcdf files in NCL whose size was
>>>>> ~500 GB, so that shouldn't be the problem. My impression is that, for a
>>>>> given size (say 400 GB), a netcdf file with many timesteps is read more
>>>>> slowly than one with fewer timesteps.
>>>>>
>>>>> You are setting a lot of options which I think are not needed. Did you
>>>>> just try to read the file with this line?
>>>>>
>>>>> fn=addfile(fdir+fili,"r")
>>>>>
>>>>>
>>>>> If it still takes a lot of time, it could be system-dependent. When
>>>>> creating the variable, NCL stores it in RAM. If the system does not
>>>>> have enough RAM, virtual memory will be used on your hard drive,
>>>>> which can slow everything down. But honestly I don't think you're even
>>>>> close to saturating your system's RAM. The problem may lie somewhere else...
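>>>>>
>>>>> (If you want to check, something like "free -g" on the node, or
>>>>> "ulimit -a" inside the batch job, will show installed memory and any
>>>>> per-process limits.)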
>>>>>
>>>>> Let us know.
>>>>>
>>>>> On 18. Dec 2017, at 07:17, Tomoko Koyama <Tomoko.Koyama at Colorado.EDU>
>>>>> wrote:
>>>>>
>>>>> I'd like to extract some data from a large netcdf file whose size is
>>>>> about 9 GB.
>>>>>
>>>>> The following shows the partial script, but the submitted job was killed
>>>>> before the "Data is stored" message appeared.
>>>>> (I attempted this several times with a 3-hr max walltime.)
>>>>>
>>>>> setfileoption("nc", "FileStructure", "Advanced")
>>>>> fdir="/root/dir4ncl/"
>>>>> fili="tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc"
>>>>> fn=addfile(fdir+fili,"r")
>>>>> setfileoption("nc","Format","NetCDF4Classic")
>>>>> buff=fn->tas
>>>>> print("Data is stored")
>>>>>
>>>>> Does it simply take a long time to read?
>>>>> Is there any way to speed up reading a large netcdf file?
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Tomoko
>>>>>
>>>>>