[Wrf-users] Parallel NETCDF4?

Wei-keng Liao wkliao at eecs.northwestern.edu
Sat Mar 5 21:09:05 MST 2016


Hi, Chris,

I learned that WRF 3.7 can use the PIO library, and PIO has an option to
do parallel I/O through netCDF-4: https://github.com/NCAR/ParallelIO
(However, I have never tried it myself.) I believe someone on this list
can provide you with the configure instructions.
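
For what it is worth, below is a rough sketch of what PIO's netCDF-4
parallel iotype looks like when called directly from C. This is only an
illustration of the library call sequence, not WRF's I/O layer; the file
name, I/O task layout, and rearranger choice are assumptions on my part.

    /* Sketch: create a netCDF-4 (HDF5) file in parallel through PIO.
     * Requires ParallelIO (pio.h) and MPI; names and layout are
     * illustrative only. */
    #include <mpi.h>
    #include <pio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs, niotasks, iosysid, ncid;
        int iotype = PIO_IOTYPE_NETCDF4P;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Dedicate roughly every 4th rank to I/O (stride 4, base 0). */
        niotasks = (nprocs >= 4) ? nprocs / 4 : 1;
        PIOc_Init_Intracomm(MPI_COMM_WORLD, niotasks, 4, 0,
                            PIO_REARR_SUBSET, &iosysid);

        /* Create an HDF5-based (netCDF-4) file with parallel access. */
        PIOc_createfile(iosysid, &ncid, &iotype, "wrfout_test.nc",
                        PIO_CLOBBER);

        /* ... define dimensions/variables and write decomposed data ... */

        PIOc_closefile(ncid);
        PIOc_finalize(iosysid);
        MPI_Finalize();
        return 0;
    }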

Could you check the compression ratios in your netCDF-4 files?
The command to show them is "h5dump -p -H filename | grep COMPRESSION".
I am interested in knowing what ratios you are getting.

Since you are using Lustre, PnetCDF enables an internal feature that
aligns the starting file offsets of all fixed-size variables to the
file system striping boundaries, which can add gaps between any
two consecutive variables in the file. If your file contains many
variables, these alignment gaps may be the cause of the large file
sizes in your case. One way to disable the feature is to set the
following environment variable before the run:
    export PNETCDF_HINTS="nc_var_align_size=1"
I am a PnetCDF developer, and your case could be important for tuning
future PnetCDF designs. Thanks.
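
If it is more convenient, the same hint can also be passed
programmatically through an MPI_Info object when the file is created.
Here is a minimal sketch of the PnetCDF call sequence (not WRF code;
the file name and create mode are illustrative assumptions):

    /* Sketch: disable PnetCDF's variable alignment via an MPI_Info hint.
     * Equivalent to setting PNETCDF_HINTS="nc_var_align_size=1". */
    #include <mpi.h>
    #include <pnetcdf.h>

    int main(int argc, char **argv)
    {
        int ncid;
        MPI_Info info;

        MPI_Init(&argc, &argv);

        MPI_Info_create(&info);
        /* Align fixed-size variables to 1 byte, i.e. no padding gaps. */
        MPI_Info_set(info, "nc_var_align_size", "1");

        ncmpi_create(MPI_COMM_WORLD, "testfile.nc",
                     NC_CLOBBER | NC_64BIT_DATA, info, &ncid);

        /* ... define dimensions/variables, ncmpi_enddef(), write data ... */

        ncmpi_close(ncid);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }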

Wei-keng

On Mar 5, 2016, at 8:04 PM, Christopher Thomas wrote:

> Thanks for your replies, Dom and Wei-keng. 
> 
> Yes, I gather that the PnetCDF library cannot produce NETCDF4 files, but I have also read that the netCDF-4 library can do parallel input and output: http://www.unidata.ucar.edu/software/netcdf/docs/parallel_io.html. So what I was really asking is: does anyone know how to get WRF to write in parallel to a NETCDF4 file? I am guessing the answer is no, or someone would have mentioned it, but if it were possible, I think implementing it would be a very worthwhile project. 
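> 
> For reference, the parallel API that page describes looks roughly like the sketch below when called directly from C; how (or whether) this could be wired into WRF's I/O layer is exactly my question. The file and variable names and sizes here are just placeholders.
> 
>     /* Sketch of direct netCDF-4 parallel I/O (HDF5 underneath), as
>      * described on the Unidata parallel_io page; names and sizes are
>      * placeholders only. */
>     #include <mpi.h>
>     #include <netcdf.h>
>     #include <netcdf_par.h>
> 
>     int main(int argc, char **argv)
>     {
>         int ncid, dimid, varid;
> 
>         MPI_Init(&argc, &argv);
> 
>         /* All MPI ranks open the same netCDF-4 (HDF5) file. */
>         nc_create_par("out_nc4.nc", NC_NETCDF4 | NC_MPIIO,
>                       MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
> 
>         nc_def_dim(ncid, "x", 1024, &dimid);
>         nc_def_var(ncid, "field", NC_FLOAT, 1, &dimid, &varid);
>         nc_enddef(ncid);
> 
>         /* Collective access for the parallel writes. */
>         nc_var_par_access(ncid, varid, NC_COLLECTIVE);
> 
>         /* ... each rank writes its own hyperslab with nc_put_vara_float ... */
> 
>         nc_close(ncid);
>         MPI_Finalize();
>         return 0;
>     }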
> 
> Yes, I can post-process the data to do the compression later, but the data volumes are very high (~1.2 TB per hour of simulation time), so this sort of post-processing is a significant overhead. 
> 
> Thanks for the suggestion of I/O quilting. I'll give it a try. 
> 
> To answer your questions, Wei-keng: I am using a Lustre file system and two versions of WRF: 3.7 compiled with PnetCDF, and 3.7.1 compiled with serial I/O. 
> 
> Regards
> 
> Chris 
> 
> On Sat, Mar 5, 2016 at 11:16 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> Hi, Chris
> 
> The file size difference may be due to data compression when WRF uses the HDF5
> library internally to write netCDF-4 files. PnetCDF does not do compression.
> You can use "nccopy" to convert a netCDF classic file directly to a netCDF-4
> (HDF5-based) file, e.g. "nccopy -k nc4 -d 2 in.nc out.nc" to also apply deflate
> compression; "ncgen" can do the same from a CDL dump produced by "ncdump".
> See the netCDF utilities guide for the command-line options of nccopy and ncgen.
> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf_utilities_guide.html#guide_ncgen
> 
> As for the files produced by PnetCDF, would you mind answering my
> two questions? Were you writing the outputs to a Lustre file system
> (and if not, which parallel file system were you using)?
> And which version of WRF are you using?
> 
> Wei-keng
> 
> On Mar 4, 2016, at 3:54 AM, Christopher Thomas wrote:
> 
> > Hi there,
> >
> > Does anyone know how to compile WRF to use parallel NETCDF but still output NETCDF4 files?
> >
> > I am using 512 cores and outputting data at high temporal and spatial resolutions; parallel NETCDF reduces my wall times by about one third but more than doubles output file sizes compared with serial NETCDF4 output.
> >
> > Chris
> > _______________________________________________
> > Wrf-users mailing list
> > Wrf-users at ucar.edu
> > http://mailman.ucar.edu/mailman/listinfo/wrf-users
> 
> 


