[ncl-talk] Handling large data files
Dennis Shea
shea at ucar.edu
Wed Aug 28 10:50:36 MDT 2019
[1] Not many servers allow, or even have available, 80+ GB of memory. You
are lucky! :-)
[2]
You are using NCL's 'simple but not necessarily efficient' method to
create netCDF:
https://www.ncl.ucar.edu/Applications/write_netcdf.shtml
This method takes advantage of the fact that NCL variable structures
[objects] are based upon the netCDF variable model. Under the hood, NCL
makes the appropriate calls to the C netCDF-library functions. For "small"
files, even with many variables, the method is 'efficient enough', and the
simplicity is VERY NICE. I would speculate that 95+% of netCDF files are
written via this method.
Still, because of the way netCDF files are structured, there is a 'lot' of
overhead under the hood. Your data has 'only' 8 variables, but they are
LARGE. I am not sure how this affects memory usage.
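If you prefer to stay within NCL, the 'efficient' approach described on
that same page should cut the memory footprint: pre-define the output file,
then loop over the input files and write one ~4.5 GB chunk at a time. A
rough, untested sketch (dimension/variable names and sizes are taken from
the script and sizes quoted below; coordinate variables are omitted for
brevity and would need to be written as well):

  all_files = systemfunc("ls /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc")
  nfil      = dimsizes(all_files)
  varNames  = (/"u","v","w","t","q","p","dbz","rho"/)
  ntPerFile = 10                                  ; [Time | 10] in each input file

  setfileoption("nc","Format","NetCDF4")
  ncdf = addfile("RI_subset_cat.nc","c")

  setfileoption(ncdf,"DefineMode",True)           ; define everything up front
  f0       = addfile(all_files(0),"r")
  dimNames = (/"time","bottom_top","south_north","west_east_stag"/)
  dimSizes = (/  -1  ,     60     ,     480     ,      481       /)
  dimUnlim = (/ True , False      , False       , False          /)
  filedimdef(ncdf, dimNames, dimSizes, dimUnlim)  ; -1 + True => time is UNLIMITED

  do nv = 0, dimsizes(varNames)-1                 ; define variables, copy attributes
     filevardef   (ncdf, varNames(nv), typeof(f0->$varNames(nv)$), dimNames)
     filevarattdef(ncdf, varNames(nv), f0->$varNames(nv)$)
  end do
  setfileoption(ncdf,"DefineMode",False)

  do nf = 0, nfil-1                               ; one input file in memory at a time
     fi  = addfile(all_files(nf),"r")
     nt0 = nf*ntPerFile
     do nv = 0, dimsizes(varNames)-1
        ncdf->$varNames(nv)$(nt0:nt0+ntPerFile-1,:,:,:) = (/ fi->$varNames(nv)$ /)
     end do
     delete(fi)
  end do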
[3]
My suggestion would be to use the netCDF Operators [NCO] or the Climate
Data Operators [CDO].
NCO:
%> ncrcat -v dbz --file_format netcdf4 \
   /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc \
   DBZ.nc
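To create one file per variable, a short shell loop would do (untested;
the lower-case output names are just illustrative):

  for VAR in u v w t q p dbz rho ; do
      ncrcat -v ${VAR} --file_format netcdf4 \
          /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc \
          ${VAR}.nc
  done

One caveat: ncrcat concatenates along the record dimension, so if Time is
not the record dimension in the input files, it may first have to be made
one (e.g., with ncks --mk_rec_dmn).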
Perhaps, after all the single-variable files have been created, merge them
with ncks in append mode (ncks takes one input file and one output file):
%> cp DBZ.nc MERGE.nc
%> ncks -A Q.nc MERGE.nc
... [repeat ncks -A for each remaining variable file]
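The CDO route can do the whole concatenation in one step. A sketch
(untested; cdo mergetime assumes each input file carries a valid time
coordinate):

  %> cdo mergetime \
     /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc \
     MERGE.nc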
Using the NCO or CDO operators is often recommended for these types of
tasks. They are tuned to perform exactly such operations, while NCL is a
general-purpose language.
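Also, regarding the corrupt files mentioned below: you do not need the
full NCL script to find them. A quick check of each file's header with
ncdump (part of the netCDF distribution) should flag unreadable files; a
rough, untested sketch:

  for f in /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc ; do
      ncdump -h "$f" > /dev/null 2>&1 || echo "possibly corrupt: $f"
  done

Note that ncdump -h reads only the header, so a file with a truncated data
section could still slip through; actually reading each variable once
(e.g., with ncks) would be a more thorough test.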
Hope this helps
On Tue, Aug 27, 2019 at 7:04 PM Prashanth Bhalachandran via ncl-talk <
ncl-talk at ucar.edu> wrote:
> Dear NCL-ers,
> I have a question regarding handling large amounts of data in NCL.
>
> Currently, I have 41 netCDF files that are each about 4.5 GB. Each file
> contains eight variables, each dimensioned
> [Time | 10] x [bottom_top | 60] x [south_north | 480] x [west_east_stag |
> 481]
>
> So far, I have managed to handle each of these files separately, but now I
> need to append them. I am using the code below, but the data size is so
> large that the program is constantly getting killed, even on servers (I
> tried requesting around 80 GB of memory). Is there a more efficient way to
> concatenate these files along the time dimension? I am attaching the code
> that I currently use below. To make matters worse, I realized that some of
> my files are corrupt and need to be regenerated. However, I am unable to
> weed out which ones are corrupt without running the program below to
> completion (which is currently difficult due to memory issues).
>
> Can anyone help me out here, please?
>
> Sai
>
>
> all_files = systemfunc("ls
> /scratch/groups/oneillm/Nature_Runs/HNR1/domain4/RI_subsets/subset_*.nc")
> setfileoption("nc","Format","NetCDF4")
> fall=addfiles(all_files, "r")
> ListSetType(fall, "cat")
>
>
> u   = fall[:]->u
> v   = fall[:]->v
> dbz = fall[:]->dbz
> rho = fall[:]->rho
> w   = fall[:]->w
> t   = fall[:]->t
> q   = fall[:]->q
> p   = fall[:]->p
>
> ;=====================================================================
> ; output variables directly
>
> filo = "RI_subset_cat.nc"
> system("rm -f " + filo)   ; remove any pre-existing file; quoting "filo" literally would not
> setfileoption("nc","Format","NetCDF4")
> ncdf = addfile(filo,"c")
>
> ; make time an UNLIMITED dimension
> filedimdef(ncdf,"time",-1,True)
>
> ncdf->u = u
> ncdf->v = v
> ncdf->w = w
> ncdf->t = t
> ncdf->q = q
> ncdf->p = p
> ncdf->dbz= dbz
> ncdf->rho= rho
>
> Dr. Saiprasanth Bhalachandran (Sai)
> Department of Earth System Science,
> Stanford University
> Website: https://sites.google.com/view/saiprasanth/