[ncl-talk] Reading a large netcdf file
Dennis Shea
shea at ucar.edu
Mon Dec 18 18:53:34 MST 2017
FYI_1:
re: tas:_Storage = "*contiguous*"
2011 Unidata NetCDF Workshop > Chunking and Deflating Data with NetCDF-4
https://www.unidata.ucar.edu/software/netcdf/workshops/2011/nc4chunking/Contiguous.html
- *Only works for fixed-sized datasets (those without any unlimited
dimensions.)*
- *You can't use compression or other filters with contiguous data.*
- *Use this for smallish variables*, like coordinate variables.
- *This is the default storage for fixed-sized variables with no
filters.*
NOTE the recommendation: "*Use this for smallish variables*"
*9GB* is not "smallish"
I *SPECULATE* there is no chunking being used.
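
As a rough check on the "9GB" figure, the size of tas follows directly from the dimensions in the ncdump header quoted below (time = 73000, i = 180, j = 180, 4-byte float). The Python sketch here is arithmetic only: the function name is hypothetical, and the 4 MB ceiling is taken from the chunk-size range Dave recommends in the quoted message, not from any netCDF rule.

```python
# Back-of-envelope size check for tas(time, i, j); names are illustrative.
from math import prod

FLOAT_BYTES = 4  # a netCDF "float" is a 4-byte value


def variable_bytes(shape, item_bytes=FLOAT_BYTES):
    """Uncompressed size in bytes of a variable with the given shape."""
    return prod(shape) * item_bytes


tas_shape = (73000, 180, 180)  # dimensions from the ncdump -hs header
total = variable_bytes(tas_shape)
print(f"tas: {total} bytes = {total / 1e9:.2f} GB")

# One time step is a single 180 x 180 slab.
one_step = variable_bytes(tas_shape[1:])
print(f"one time step: {one_step} bytes")

# How many whole time steps fit in a chunk of at most 4 MB (4 * 2**20 bytes)?
steps_per_4mb = (4 * 2**20) // one_step
print(f"time steps per 4 MB chunk: {steps_per_4mb}")
```

Under these assumptions a single time step is about 130 KB, and a chunk shape of roughly (32, 180, 180) would sit near the upper end of the recommended range.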
===
FYI_2: the file is not CF-conforming.
int time(time) ;
string time:long_name = "time" ;
string time:*units = "month"* ;
time:_Storage = "contiguous" ;
time:_Endianness = "little" ;
"units" of "month" is not CF-conformant.
===
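On the time units in FYI_2: CF expects a time coordinate's units in the form "<unit> since <reference date>", for example "days since 2006-01-01". The Python sketch below is a deliberately loose check of that shape only. The regular expression and function name are hypothetical, it does not validate the date or the calendar, and the example reference date is only a guess based on the file name.

```python
import re

# Loose pattern: a unit word, the word "since", then text starting with a date.
_CF_TIME_UNITS = re.compile(
    r"^\s*[A-Za-z]+\s+since\s+\d{1,4}-\d{1,2}-\d{1,2}", re.IGNORECASE
)


def looks_like_cf_time_units(units):
    """Return True if units has the 'unit since YYYY-MM-DD' shape."""
    return bool(_CF_TIME_UNITS.match(units))


for units in ("month", "days since 2006-01-01", "hours since 1970-01-01 00:00:00"):
    print(f"{units!r}: {looks_like_cf_time_units(units)}")
```

The bare string "month" fails this check because it has no reference date.
===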
FYI_3:
This IPSL-CM5A-LR file was (likely) *not created by CNRS*.
It contains regridded data. [ ...rgrd.nc ]
===========================================
I speculate that your system may be trying to allocate all 9GB of memory
before reading values from the file.
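
If that is what is happening, one thing to try is reading tas in blocks along the time dimension rather than in a single request. The sketch below (Python, arithmetic only) plans such blocks for an assumed memory budget. The 1 GB budget and the function name are assumptions for illustration, not anything measured on the system in question. Each (start, stop) pair is inclusive, so it maps onto an NCL subscript of the form fn->tas(start:stop,:,:).

```python
def time_blocks(ntime, step_bytes, budget_bytes):
    """Yield inclusive (start, stop) time-index ranges that fit in budget_bytes."""
    steps = max(1, budget_bytes // step_bytes)  # always read at least one step
    for start in range(0, ntime, steps):
        yield start, min(start + steps, ntime) - 1


ntime = 73000                 # length of the time dimension
step_bytes = 180 * 180 * 4    # one float time step of tas
budget = 10**9                # assumed 1 GB working budget

blocks = list(time_blocks(ntime, step_bytes, budget))
print(len(blocks), "blocks; first:", blocks[0], "last:", blocks[-1])
```

Whether this actually helps depends on why the job is being killed; it only bounds how much is requested per read.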
On Mon, Dec 18, 2017 at 5:50 PM, Tomoko Koyama <Tomoko.Koyama at colorado.edu>
wrote:
> Thank you very much, Dave.
>
> I hope the following is showing enough information to nail down the cause.
>
> [koyama at login03 IPSL-CM5A-LR]$ ncdump -hs tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
> netcdf tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd {
> dimensions:
> time = 73000 ;
> i = 180 ;
> j = 180 ;
> variables:
> int time(time) ;
> string time:long_name = "time" ;
> string time:units = "month" ;
> time:_Storage = "contiguous" ;
> time:_Endianness = "little" ;
> float i(i) ;
> string i:units = "none" ;
> string i:long_name = "i index" ;
> i:_Storage = "contiguous" ;
> float j(j) ;
> string j:units = "none" ;
> string j:long_name = "j index" ;
> j:_Storage = "contiguous" ;
> float tas(time, i, j) ;
> string tas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/
> CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_IPSL-CM5A-LR_rcp85_r0i0p0.nc
> areacella: areacella_fx_IPSL-CM5A-LR_rcp85_r0i0p0.nc" ;
> string tas:coordinates = "height" ;
> string tas:history = "2011-08-16T22:13:26Z altered by CMOR: Treated scalar
> dimension: \'height\'. 2011-08-16T22:13:26Z altered by CMOR: replaced
> missing value flag (9.96921e+36) with standard missing value (1e+20).
> 2011-08-16T22:13:45Z altered by CMOR: Inverted axis: lat." ;
> string tas:cell_measures = "area: areacella" ;
> string tas:cell_methods = "time: mean (interval: 30 minutes)" ;
> string tas:original_name = "t2m" ;
> string tas:units = "K" ;
> string tas:long_name = "Near-Surface Air Temperature" ;
> string tas:standard_name = "air_temperature" ;
> string tas:remap = "remapped via ESMF_regrid_with_weights: Bilinear" ;
> tas:missing_value = 1.e+20f ;
> tas:_FillValue = 1.e+20f ;
> tas:_Storage = "contiguous" ;
> float lat(j, i) ;
> string lat:long_name = "latitude" ;
> string lat:units = "degrees_north" ;
> lat:_FillValue = -999.f ;
> lat:_Storage = "contiguous" ;
> float lon(j, i) ;
> string lon:long_name = "longitude" ;
> string lon:units = "degrees_east" ;
> lon:_FillValue = -999.f ;
> lon:_Storage = "contiguous" ;
>
> // global attributes:
> :_Format = "netCDF-4" ;
>
>
> }
>
> On Dec 18, 2017, at 4:05 PM, Dave Allured - NOAA Affiliate <
> dave.allured at noaa.gov> wrote:
>
> Tomoko,
>
> Please add the "s" flag to Dennis's request. This will show chunk size
> parameters that may be relevant:
>
> %> ncdump -hs tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
>
> There are known problems with Netcdf-4 chunk sizes that can dramatically
> slow down reading. In particular, chunk size exceeding chunk cache size,
> or chunk size exceeding available memory. If we know the structure
> details, we may be able to suggest a solution.
>
> I recommend chunk sizes in the approximate range of 100 Kbytes to 4 Mbytes
> for large netcdf-4 files.
>
> --Dave
>
>
> On Mon, Dec 18, 2017 at 8:05 AM, Dennis Shea <shea at ucar.edu> wrote:
>
>> When you have a file issue, you should include some information:
>>
>> (a) what does either of the following show?
>>
>> %> ncl_filedump tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
>>
>> or
>>
>> %> ncdump -h tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc
>>
>> (b) what version of NCL are you using?
>>
>> %> ncl -V
>>
>> (c) your system info
>>
>> %> uname -a
>>
>> -----------
>>
>> fdir="/root/dir4ncl/"
>> fili="tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc"
>> fn=addfile(fdir+fili,"r")
>> buff=fn->tas
>> print("Data is stored")
>>
>> On Mon, Dec 18, 2017 at 3:35 AM, Guido Cioni <guidocioni at gmail.com>
>> wrote:
>>
>>> Tomoko,
>>> 9 GB is anything but "large", although the concept of "large" is highly
>>> subjective :P
>>>
>>> I've successfully read SINGLE netcdf files in NCL whose size was ~500GB
>>> so that shouldn't be the problem. For some reason a netcdf file of some
>>> size, say 400 GB, which has many timesteps is read more slowly than a file
>>> with the same size but with fewer timesteps; that was my impression.
>>>
>>> You are setting a lot of options which I think are not needed. Did you
>>> just try to read the file with this line?
>>>
>>> fn=addfile(fdir+fili,"r")
>>>
>>>
>>> If it still takes a lot of time it could be system-dependent. When
>>> creating the variable NCL stores it into the RAM. If the system does not
>>> have enough RAM, some virtual memory will be created on your hard drive,
>>> which can slow down everything. But honestly I don't think you're even
>>> close to saturating your system's RAM. The problem may lie somewhere else...
>>>
>>> Let us know.
>>>
>>> On 18. Dec 2017, at 07:17, Tomoko Koyama <Tomoko.Koyama at Colorado.EDU>
>>> wrote:
>>>
>>> I’d like to extract some data from a large netcdf file, whose size is
>>> about 9GB.
>>>
>>> The following shows the partial script, but a submitted job was killed
>>> before “Data is stored” message appeared.
>>> (I attempted several times with 3-hr max walltime )
>>>
>>> setfileoption("nc", "FileStructure", "Advanced")
>>> fdir="/root/dir4ncl/"
>>> fili="tas_day_IPSL-CM5A-LR_rcp85_r1i1p1_20060101-22051231.rgrd.nc"
>>> fn=addfile(fdir+fili,"r")
>>> setfileoption("nc","Format","NetCDF4Classic")
>>> buff=fn->tas
>>> print("Data is stored")
>>>
>>> Does it simply take a long time to read?
>>> Is there any way to speed up reading a large netcdf file?
>>>
>>>
>>> Thank you,
>>> Tomoko
>>>
>>>
>