[ncl-talk] Handling large data files
Dennis Shea
shea at ucar.edu
Wed Aug 28 10:50:36 MDT 2019
[1] Not many servers allow, or even have available, 80+ GB of memory. You
are lucky! :-)
[2]
You are using NCL's 'simple but not necessarily efficient' method to
create netCDF:
https://www.ncl.ucar.edu/Applications/write_netcdf.shtml
This method takes advantage of the fact that NCL variable structures
[objects] are based upon the netCDF variable model. Under the hood, NCL
makes the appropriate calls to the C netCDF-library functions. For "small"
files, even with many variables, the method is 'efficient enough', and the
simplicity is VERY NICE. I would speculate that 95+% of netCDF files are
written via this method.
Still, because of the way netCDF files are structured, there is a 'lot' of
overhead under the hood. Your data has 'only' 8 variables, but they are
LARGE. I am not sure how this affects memory usage.
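If you prefer to stay within NCL, the 'efficient' approach described on
that same page should cut the memory footprint: pre-define the output file,
then loop over the input files and write one ~4.5 GB chunk at a time. A
rough, untested sketch (dimension/variable names and sizes are taken from
the script and sizes quoted below; coordinate variables are omitted for
brevity and would need to be written as well):

  all_files = systemfunc("ls /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc")
  nfil      = dimsizes(all_files)
  varNames  = (/"u","v","w","t","q","p","dbz","rho"/)
  ntPerFile = 10                                  ; [Time | 10] in each input file

  setfileoption("nc","Format","NetCDF4")
  ncdf = addfile("RI_subset_cat.nc","c")

  setfileoption(ncdf,"DefineMode",True)           ; define everything up front
  f0       = addfile(all_files(0),"r")
  dimNames = (/"time","bottom_top","south_north","west_east_stag"/)
  dimSizes = (/  -1  ,     60     ,     480     ,      481       /)
  dimUnlim = (/ True , False      , False       , False          /)
  filedimdef(ncdf, dimNames, dimSizes, dimUnlim)  ; -1 + True => time is UNLIMITED

  do nv = 0, dimsizes(varNames)-1                 ; define variables, copy attributes
     filevardef   (ncdf, varNames(nv), typeof(f0->$varNames(nv)$), dimNames)
     filevarattdef(ncdf, varNames(nv), f0->$varNames(nv)$)
  end do
  setfileoption(ncdf,"DefineMode",False)

  do nf = 0, nfil-1                               ; one input file in memory at a time
     fi  = addfile(all_files(nf),"r")
     nt0 = nf*ntPerFile
     do nv = 0, dimsizes(varNames)-1
        ncdf->$varNames(nv)$(nt0:nt0+ntPerFile-1,:,:,:) = (/ fi->$varNames(nv)$ /)
     end do
     delete(fi)
  end do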
[3]
My suggestion would be to use the netCDF Operators [NCO] or the Climate
Data Operators [CDO].
NCO:
%> ncrcat -v dbz --file_format netcdf4 \
   /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc \
   DBZ.nc
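To create one file per variable, a short shell loop would do (untested;
the lower-case output names are just illustrative):

  for VAR in u v w t q p dbz rho ; do
      ncrcat -v ${VAR} --file_format netcdf4 \
          /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc \
          ${VAR}.nc
  done

One caveat: ncrcat concatenates along the record dimension, so if Time is
not the record dimension in the input files, it may first have to be made
one (e.g., with ncks --mk_rec_dmn).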
Perhaps, after all the single-variable files have been created, merge them
with ncks in append mode (ncks takes one input file and one output file):
%> cp DBZ.nc MERGE.nc
%> ncks -A Q.nc MERGE.nc
... [repeat ncks -A for each remaining variable file]
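The CDO route can do the whole concatenation in one step. A sketch
(untested; cdo mergetime assumes each input file carries a valid time
coordinate):

  %> cdo mergetime \
     /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc \
     MERGE.nc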
Using the NCO or CDO operators is often recommended for these types of
tasks. They are tuned to perform exactly such operations, while NCL is a
general-purpose language.
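Also, regarding the corrupt files mentioned below: you do not need the
full NCL script to find them. A quick check of each file's header with
ncdump (part of the netCDF distribution) should flag unreadable files; a
rough, untested sketch:

  for f in /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc ; do
      ncdump -h "$f" > /dev/null 2>&1 || echo "possibly corrupt: $f"
  done

Note that ncdump -h reads only the header, so a file with a truncated data
section could still slip through; actually reading each variable once
(e.g., with ncks) would be a more thorough test.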
Hope this helps
On Tue, Aug 27, 2019 at 7:04 PM Prashanth Bhalachandran via ncl-talk <
ncl-talk at ucar.edu> wrote:
> Dear NCL-ers,
> I have a question regarding handling large amounts of data in NCL.
>
> Currently, I have 41 netCDF files that are each about 4.5 GB. Each file
> contains eight variables, each dimensioned
> [Time | 10] x [bottom_top | 60] x [south_north | 480] x [west_east_stag |
> 481]
>
> So far, I have managed to handle each of these files separately, but now I
> need to append them. I am using the code below, but the data size is so
> large that the program is constantly getting killed, even on servers (I
> tried requesting around 80 GB of memory). Is there a more efficient way to
> concatenate these files along the time dimension? I am attaching the code
> that I currently use below. To make matters worse, I realized that some of
> my files are corrupt and need to be regenerated. However, I am unable to
> weed out which ones are corrupt without running the program below to
> completion (which is currently difficult due to memory issues).
>
> Can anyone help me out here, please?
>
> Sai
>
>
> all_files = systemfunc("ls
> /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc")
> setfileoption("nc","Format","NetCDF4")
> fall=addfiles(all_files, "r")
> ListSetType(fall, "cat")
>
>
> u   = fall[:]->u
> v   = fall[:]->v
> dbz = fall[:]->dbz
> rho = fall[:]->rho
> w   = fall[:]->w
> t   = fall[:]->t
> q   = fall[:]->q
> p   = fall[:]->p
>
> ;=====================================================================
> ; output variables directly
>
> filo = "RI_subset_cat.nc"
> system("rm -f " + filo)   ; remove any pre-existing file; quoting "filo" literally would not
> setfileoption("nc","Format","NetCDF4")
> ncdf = addfile(filo,"c")
>
> ; make time an UNLIMITED dimension
> filedimdef(ncdf,"time",-1,True)
>
> ncdf->u = u
> ncdf->v = v
> ncdf->w = w
> ncdf->t = t
> ncdf->q = q
> ncdf->p = p
> ncdf->dbz= dbz
> ncdf->rho= rho
>
> Dr. Saiprasanth Bhalachandran (Sai)
> Department of Earth System Science,
> Stanford University
> Website: https://sites.google.com/view/saiprasanth/