[ncl-talk] Reducing file size

Guido Cioni guidocioni at gmail.com
Thu Jan 18 08:09:52 MST 2018


Michael,
first of all, I would strongly suggest doing this outside of NCL.
If you try to read 10 TB of data inside NCL and define some variables, you will likely end up with a segmentation fault or extremely slow execution.
You will probably want to use fast tools built on compiled C/C++ code, such as CDO or NCO.

Regarding the task itself, however, it is hard to give you a suggestion. If the original file was created like that, I would guess that all variables already have the correct type, units and attributes. So I don't know what you would achieve by changing the type, other than a big headache from reading and rewriting the file :)
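For reference, "converting to short" in netCDF usually means packing: each float is stored as a 16-bit integer plus scale_factor and add_offset attributes, with unpacked = packed * scale_factor + add_offset. The following is only a minimal sketch of that arithmetic in plain Python. The temperature values and the helper names (pack_params, pack, unpack) are made up for illustration and are not taken from the file or from any library.

```python
import struct

def pack_params(vmin, vmax, nbits=16):
    """Return (scale_factor, add_offset) mapping [vmin, vmax] onto signed integers.

    One integer value is left unused so it could serve as a fill value.
    """
    scale = (vmax - vmin) / (2**nbits - 2)
    offset = (vmax + vmin) / 2
    return scale, offset

def pack(values, scale, offset):
    # Nearest integer step for each value
    return [round((v - offset) / scale) for v in values]

def unpack(packed, scale, offset):
    # netCDF packing convention: unpacked = packed * scale_factor + add_offset
    return [p * scale + offset for p in packed]

# Hypothetical surface temperatures in K (not read from the real file)
temps = [243.15, 271.3, 288.72, 301.05, 312.4]

scale, offset = pack_params(min(temps), max(temps))
packed = pack(temps, scale, offset)
restored = unpack(packed, scale, offset)

# Storage needed as 4-byte floats versus 2-byte shorts
float_bytes = len(struct.pack(f"{len(temps)}f", *temps))
short_bytes = len(struct.pack(f"{len(packed)}h", *packed))

# Largest difference between original and restored values
max_err = max(abs(a - b) for a, b in zip(temps, restored))

print("bytes as float:", float_bytes, "bytes as short:", short_bytes)
print("scale_factor:", scale, "max abs error:", max_err)
```

The size is halved, and the price is a rounding error of at most half a scale_factor step. Whether that is acceptable depends on the variable: a field spanning many orders of magnitude (precipitation flux, for example) loses much more relative precision than temperature does.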

Any chance you have the original GRIB file? If not, you could try converting the data to GRIB, which would save you at most 50% of the space, but it is tricky...

Anyway, it would be nice to know what you want to achieve in the end. Do you need to process the data? Make a plot? You can still extract a single level, variable or even time step beforehand (with CDO or NCO), which will surely reduce both the file size and the time needed to read it inside NCL.
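To get a feeling for how much such an extraction saves, the size of each variable can be estimated from the dimensions in the ncdump header quoted below. This is a rough sketch in plain Python. It assumes uncompressed 4-byte floats and ignores the small coordinate and static fields, and the variable counts per shape are tallied by hand from the header.

```python
# Dimension sizes copied from the ncdump header in the quoted message
dims = {"time": 744, "iy": 141, "jx": 217, "m10": 1, "m2": 1, "soil_layer": 2}
BYTES_PER_FLOAT = 4  # all data variables in the header are declared as float

# Number of time-dependent variables per shape, counted by hand from the header
shapes = {
    ("time", "iy", "jx"): 19,                # ps, drag, ts, ..., snfl
    ("time", "m10", "iy", "jx"): 2,          # uas, vas
    ("time", "m2", "iy", "jx"): 2,           # tas, qas
    ("time", "soil_layer", "iy", "jx"): 2,   # mrso, mrro
}

def nbytes(shape):
    """Uncompressed size in bytes of one float variable with this shape."""
    n = 1
    for name in shape:
        n *= dims[name]
    return n * BYTES_PER_FLOAT

total_bytes = sum(count * nbytes(shape) for shape, count in shapes.items())
tas_bytes = nbytes(("time", "m2", "iy", "jx"))
one_step_bytes = tas_bytes // dims["time"]

print(f"all variables : {total_bytes / 2**30:.2f} GiB")
print(f"tas only      : {tas_bytes / 2**20:.1f} MiB")
print(f"tas, one step : {one_step_bytes / 2**10:.1f} KiB")
```

The estimate for all variables comes out at about 2.3 GiB, consistent with the file size reported by ls, while a single field such as tas is about 87 MiB for the whole month. So keeping only the variables that are really needed already shrinks each file by a large factor before any packing or compression.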

Cheers 

> On 18. Jan 2018, at 16:02, Michael Notaro via ncl-talk <ncl-talk at ucar.edu> wrote:
> 
> I have about 10 TB of regional climate model output SRF files
> that I need to reduce in size probably to 1 TB.  For example,
> one monthly file, as dumped below, IPSL_SRF.1998010100.nc, 
> is 2.3 GB.  Any recommendations in NCL on how
> to effectively accomplish this task? (e.g. command
> to convert the contents to short, or way to 
> compress the content, etc) 
> Thanks, Michael
> 
> 
> [notaro at petenwell ~/processing]# ls -l -h *nc
> -rw-r--r-- 1 notaro notaro 2.3G Jan 18 08:35 IPSL_SRF.1998010100.nc
> [notaro at petenwell ~/processing]# ncdump -h IPSL_SRF.1998010100.nc
> netcdf IPSL_SRF.1998010100 {
> dimensions:
> jx = 217 ;
> iy = 141 ;
> kz = 28 ;
> time = UNLIMITED ; // (744 currently)
> m10 = 1 ;
> m2 = 1 ;
> soil_layer = 2 ;
> time_bounds = 2 ;
> variables:
> float jx(jx) ;
> jx:long_name = "x-coordinate in Cartesian system" ;
> jx:standard_name = "projection_x_coordinate" ;
> jx:units = "m" ;
> jx:axis = "X" ;
> jx:_CoordinateAxisType = "GeoX" ;
> float iy(iy) ;
> iy:long_name = "y-coordinate in Cartesian system" ;
> iy:standard_name = "projection_y_coordinate" ;
> iy:units = "m" ;
> iy:axis = "Y" ;
> iy:_CoordinateAxisType = "GeoY" ;
> float sigma(kz) ;
> sigma:long_name = "Sigma at half model layers" ;
> sigma:standard_name = "atmosphere_sigma_coordinate" ;
> sigma:units = "1" ;
> sigma:axis = "Z" ;
> sigma:positive = "down" ;
> sigma:formula_terms = "sigma: sigma ps: ps ptop: ptop" ;
> sigma:_CoordinateAxisType = "GeoZ" ;
> float ptop ;
> ptop:long_name = "Pressure at model top" ;
> ptop:standard_name = "air_pressure" ;
> ptop:units = "hPa" ;
> float xlon(iy, jx) ;
> xlon:long_name = "Longitude on Cross Points" ;
> xlon:standard_name = "longitude" ;
> xlon:units = "degrees_east" ;
> xlon:grid_mapping = "rcm_map" ;
> float xlat(iy, jx) ;
> xlat:long_name = "Latitude on Cross Points" ;
> xlat:standard_name = "latitude" ;
> xlat:units = "degrees_north" ;
> xlat:grid_mapping = "rcm_map" ;
> float mask(iy, jx) ;
> mask:long_name = "Land Mask" ;
> mask:standard_name = "land_binary_mask" ;
> mask:units = "1" ;
> mask:coordinates = "xlat xlon" ;
> mask:grid_mapping = "rcm_map" ;
> float topo(iy, jx) ;
> topo:long_name = "Surface Model Elevation" ;
> topo:standard_name = "surface_altitude" ;
> topo:units = "m" ;
> topo:coordinates = "xlat xlon" ;
> topo:grid_mapping = "rcm_map" ;
> float ps(time, iy, jx) ;
> ps:long_name = "Surface Pressure" ;
> ps:standard_name = "surface_air_pressure" ;
> ps:units = "hPa" ;
> ps:coordinates = "xlat xlon" ;
> ps:grid_mapping = "rcm_map" ;
> ps:cell_methods = "time: point" ;
> float drag(time, iy, jx) ;
> drag:long_name = "Surface drag stress coefficient in air" ;
> drag:standard_name = "surface_drag_coefficient_in_air" ;
> drag:units = "1" ;
> drag:coordinates = "xlat xlon" ;
> drag:grid_mapping = "rcm_map" ;
> drag:cell_methods = "time: point" ;
> float ts(time, iy, jx) ;
> ts:long_name = "Ground surface temperature" ;
> ts:standard_name = "surface_temperature" ;
> ts:units = "K" ;
> ts:coordinates = "xlat xlon" ;
> ts:grid_mapping = "rcm_map" ;
> ts:cell_methods = "time: point" ;
> float tf(time, iy, jx) ;
> tf:long_name = "Foliage canopy temperature" ;
> tf:standard_name = "canopy_temperature" ;
> tf:units = "K" ;
> tf:coordinates = "xlat xlon" ;
> tf:grid_mapping = "rcm_map" ;
> tf:cell_methods = "time: point" ;
> tf:_FillValue = 1.e+20f ;
> float pr(time, iy, jx) ;
> pr:long_name = "Total precipitation flux" ;
> pr:standard_name = "precipitation_flux" ;
> pr:units = "kg m-2 s-1" ;
> pr:coordinates = "xlat xlon" ;
> pr:grid_mapping = "rcm_map" ;
> pr:cell_methods = "time: mean" ;
> float evspsbl(time, iy, jx) ;
> evspsbl:long_name = "Total evapotranspiration flux" ;
> evspsbl:standard_name = "water_evaporation_flux" ;
> evspsbl:units = "kg m-2 s-1" ;
> evspsbl:coordinates = "xlat xlon" ;
> evspsbl:grid_mapping = "rcm_map" ;
> evspsbl:cell_methods = "time: mean" ;
> float snv(time, iy, jx) ;
> snv:long_name = "Liquid water equivalent of snow thickness" ;
> snv:standard_name = "lwe_thickness_of_surface_snow_amount" ;
> snv:units = "kg m-2" ;
> snv:coordinates = "xlat xlon" ;
> snv:grid_mapping = "rcm_map" ;
> snv:cell_methods = "time: mean" ;
> snv:_FillValue = 1.e+20f ;
> float hfss(time, iy, jx) ;
> hfss:long_name = "Sensible heat flux" ;
> hfss:standard_name = "surface_upward_sensible_heat_flux" ;
> hfss:units = "W m-2" ;
> hfss:coordinates = "xlat xlon" ;
> hfss:grid_mapping = "rcm_map" ;
> hfss:cell_methods = "time: mean" ;
> float rsnl(time, iy, jx) ;
> rsnl:long_name = "Net upward longwave energy flux" ;
> rsnl:standard_name = "net_upward_longwave_flux_in_air" ;
> rsnl:units = "W m-2" ;
> rsnl:coordinates = "xlat xlon" ;
> rsnl:grid_mapping = "rcm_map" ;
> rsnl:cell_methods = "time: mean" ;
> float rsns(time, iy, jx) ;
> rsns:long_name = "Net downward shortwave energy flux" ;
> rsns:standard_name = "net_downward_shortwave_flux_in_air" ;
> rsns:units = "W m-2" ;
> rsns:coordinates = "xlat xlon" ;
> rsns:grid_mapping = "rcm_map" ;
> rsns:cell_methods = "time: mean" ;
> float rsdl(time, iy, jx) ;
> rsdl:long_name = "Surface downward longwave flux in air" ;
> rsdl:standard_name = "surface_downwelling_longwave_flux_in_air" ;
> rsdl:units = "W m-2" ;
> rsdl:coordinates = "xlat xlon" ;
> rsdl:grid_mapping = "rcm_map" ;
> rsdl:cell_methods = "time: mean" ;
> float rsds(time, iy, jx) ;
> rsds:long_name = "Surface downward shortwave flux in air" ;
> rsds:standard_name = "surface_downwelling_shortwave_flux_in_air" ;
> rsds:units = "W m-2" ;
> rsds:coordinates = "xlat xlon" ;
> rsds:grid_mapping = "rcm_map" ;
> rsds:cell_methods = "time: mean" ;
> float prc(time, iy, jx) ;
> prc:long_name = "Convective precipitation flux" ;
> prc:standard_name = "convective_rainfall_flux" ;
> prc:units = "kg m-2 s-1" ;
> prc:coordinates = "xlat xlon" ;
> prc:grid_mapping = "rcm_map" ;
> prc:cell_methods = "time: mean" ;
> float zmla(time, iy, jx) ;
> zmla:long_name = "Atmospheric Boundary Layer thickness" ;
> zmla:standard_name = "atmosphere_boundary_layer_thickness" ;
> zmla:units = "m" ;
> zmla:coordinates = "xlat xlon" ;
> zmla:grid_mapping = "rcm_map" ;
> zmla:cell_methods = "time: point" ;
> float aldirs(time, iy, jx) ;
> aldirs:long_name = "Surface albedo to direct shortwave radiation" ;
> aldirs:standard_name = "surface_albedo_short_wave_direct" ;
> aldirs:units = "1" ;
> aldirs:coordinates = "xlat xlon" ;
> aldirs:grid_mapping = "rcm_map" ;
> aldirs:cell_methods = "time: point" ;
> float aldifs(time, iy, jx) ;
> aldifs:long_name = "Surface albedo to diffuse shortwave radiation" ;
> aldifs:standard_name = "surface_albedo_short_wave_diffuse" ;
> aldifs:units = "1" ;
> aldifs:coordinates = "xlat xlon" ;
> aldifs:grid_mapping = "rcm_map" ;
> aldifs:cell_methods = "time: point" ;
> float sund(time, iy, jx) ;
> sund:long_name = "Duration of sunshine" ;
> sund:standard_name = "duration_of_sunshine" ;
> sund:units = "s" ;
> sund:coordinates = "xlat xlon" ;
> sund:grid_mapping = "rcm_map" ;
> sund:cell_methods = "time: sum" ;
> float sndp(time, iy, jx) ;
> sndp:long_name = "Actual snow depth" ;
> sndp:standard_name = "snow_depth" ;
> sndp:units = "mm" ;
> sndp:coordinates = "xlat xlon" ;
> sndp:grid_mapping = "rcm_map" ;
> sndp:cell_methods = "time: mean" ;
> sndp:_FillValue = 1.e+20f ;
> float snfl(time, iy, jx) ;
> snfl:long_name = "Snowfall" ;
> snfl:standard_name = "snow_fall" ;
> snfl:units = "kg m-2 s-1" ;
> snfl:coordinates = "xlat xlon" ;
> snfl:grid_mapping = "rcm_map" ;
> snfl:cell_methods = "time: mean" ;
> float uas(time, m10, iy, jx) ;
> uas:long_name = "Anemometric zonal (westerly) wind component" ;
> uas:standard_name = "eastward_wind" ;
> uas:units = "m s-1" ;
> uas:coordinates = "xlat xlon" ;
> uas:grid_mapping = "rcm_map" ;
> uas:cell_methods = "time: point" ;
> float vas(time, m10, iy, jx) ;
> vas:long_name = "Anenometric meridional (southerly) wind component" ;
> vas:standard_name = "northward_wind" ;
> vas:units = "m s-1" ;
> vas:coordinates = "xlat xlon" ;
> vas:grid_mapping = "rcm_map" ;
> vas:cell_methods = "time: point" ;
> float tas(time, m2, iy, jx) ;
> tas:long_name = "Near surface air temperature" ;
> tas:standard_name = "air_temperature" ;
> tas:units = "K" ;
> tas:coordinates = "xlat xlon" ;
> tas:grid_mapping = "rcm_map" ;
> tas:cell_methods = "time: point" ;
> float qas(time, m2, iy, jx) ;
> qas:long_name = "Near surface air specific humidity" ;
> qas:standard_name = "specific_humidity" ;
> qas:units = "1" ;
> qas:coordinates = "xlat xlon" ;
> qas:grid_mapping = "rcm_map" ;
> qas:cell_methods = "time: point" ;
> float mrso(time, soil_layer, iy, jx) ;
> mrso:long_name = "Moisture content of the soil layers" ;
> mrso:standard_name = "soil_moisture_content_in_layers" ;
> mrso:units = "kg m-2" ;
> mrso:coordinates = "xlat xlon" ;
> mrso:grid_mapping = "rcm_map" ;
> mrso:cell_methods = "time: point" ;
> mrso:_FillValue = 1.e+20f ;
> float mrro(time, soil_layer, iy, jx) ;
> mrro:long_name = "Runoff flux" ;
> mrro:standard_name = "runoff_flux" ;
> mrro:units = "kg m-2 s-1" ;
> mrro:coordinates = "xlat xlon" ;
> mrro:grid_mapping = "rcm_map" ;
> mrro:cell_methods = "time: mean" ;
> mrro:_FillValue = 1.e+20f ;
> float time(time) ;
> time:long_name = "time" ;
> time:standard_name = "time" ;
> time:units = "hours since 1949-12-01 00:00:00 UTC" ;
> time:calendar = "noleap" ;
> time:bounds = "time_bnds" ;
> float time_bnds(time, time_bounds) ;
> time_bnds:units = "hours since 1949-12-01 00:00:00 UTC" ;
> time_bnds:calendar = "noleap" ;
> char rcm_map ;
> rcm_map:grid_mapping_name = "lambert_conformal_conic" ;
> rcm_map:standard_parallel = 36., 52. ;
> rcm_map:longitude_of_central_meridian = -97. ;
> rcm_map:latitude_of_projection_origin = 45. ;
> rcm_map:_CoordinateTransformType = "Projection" ;
> rcm_map:_CoordinateAxisTypes = "GeoX GeoY" ;
> 
> // global attributes:
> :title = "ICTP Regional Climatic model V4" ;
> :institution = "ICTP" ;
> :source = "RegCM Model output file" ;
> :Conventions = "CF-1.4" ;
> :references = "http://gforge.ictp.it/gf/project/regcm" ;
> :model_revision = "tag 4.3.5.6" ;
> :history = "2015-01-25 04:40:03 : Created by RegCM RegCM Model program" ;
> :experiment = "IPSL" ;
> :projection = "LAMCON" ;
> :grid_size_in_meters = 25000. ;
> :latitude_of_projection_origin = 45. ;
> :longitude_of_projection_origin = -97. ;
> :standard_parallel = 36., 52. ;
> :grid_factor = 0.696943758331507 ;
> :boundary_nspgx = 15 ;
> :boundary_nspgd = 15 ;
> :boundary_high_nudge = 3. ;
> :boundary_medium_nudge = 2. ;
> :boundary_low_nudge = 1. ;
> :model_is_restarted = "Yes" ;
> :model_simulation_initial_start = "1977-06-01 00:00:00 UTC" ;
> :model_simulation_start = "1998-01-01 00:00:00 UTC" ;
> :model_simulation_end = "1999-01-01 00:00:00 UTC" ;
> :atmosphere_time_step_in_seconds = 120. ;
> :surface_interaction_time_step_in_seconds = 120. ;
> :radiation_scheme_time_step_in_minuts = 30. ;
> :absorption_emission_time_step_in_hours = 18. ;
> :lateral_boundary_condition_scheme = 1 ;
> :boundary_layer_scheme = 1 ;
> :cumulus_convection_scheme = 2 ;
> :grell_scheme_closure = 2 ;
> :moisture_scheme = 1 ;
> :ocean_flux_scheme = 2 ;
> :zeng_ocean_roughness_formula = 1 ;
> :pressure_gradient_scheme = 0 ;
> :surface_emissivity_factor_computed = 0 ;
> :lake_model_activated = 1 ;
> :chemical_aerosol_scheme_activated = 0 ;
> :ipcc_scenario_code = "RF" ;
> :diurnal_cycle_sst_scheme = 0 ;
> :simple_sea_ice_scheme = 0 ;
> :seasonal_desert_albedo = 1 ;
> :convective_lwp_as_large_scale = 1 ;
> :rrtm_radiation_scheme_activated = 0 ;
> :climatic_ozone_input_dataset = 0 ;
> :static_solar_constant_used = 1 ;
> :subex_bottom_level_with_no_clouds = 1 ;
> :subex_maximum_cloud_fraction_cover = 0.8 ;
> :subex_auto_conversion_rate_for_land = 0.00025 ;
> :subex_auto_conversion_rate_for_ocean = 0.00025 ;
> :subex_gultepe_factor_when_rain_for_land = 0.4 ;
> :subex_gultepe_factor_when_rain_for_ocean = 0.4 ;
> :subex_rh_with_fcc_one = 1.01 ;
> :subex_rh_threshold_for_land = 0.8 ;
> :subex_rh_threshold_for_ocean = 0.9 ;
> :subex_limit_temperature = 238. ;
> :subex_raindrop_evaporation_rate = 0.0008 ;
> :subex_raindrop_accretion_rate = 3. ;
> :subex_cloud_fraction_maximum = 0.75 ;
> :subex_cloud_fraction_max_for_convection = 0.25 ;
> :subex_cloud_liqwat_max_for_convection = 5.e-05 ;
> :grell_min_shear_on_precip = 0.25 ;
> :grell_max_shear_on_precip = 0.5 ;
> :grell_min_precip_efficiency = 0.25 ;
> :grell_max_precip_efficiency = 0.5 ;
> :grell_min_precip_efficiency_o = 0.25 ;
> :grell_max_precip_efficiency_o = 0.5 ;
> :grell_min_precip_efficiency_x = 0.25 ;
> :grell_max_precip_efficiency_x = 0.5 ;
> :grell_min_shear_on_precip_on_ocean = 0.25 ;
> :grell_max_shear_on_precip_on_ocean = 0.5 ;
> :grell_min_precip_efficiency_on_ocean = 0.25 ;
> :grell_max_precip_efficiency_on_ocean = 0.5 ;
> :grell_min_precip_efficiency_o_on_ocean = 0.25 ;
> :grell_max_precip_efficiency_o_on_ocean = 0.5 ;
> :grell_min_precip_efficiency_x_on_ocean = 0.25 ;
> :grell_max_precip_efficiency_x_on_ocean = 0.5 ;
> :grell_max_depth_of_stable_layer = 150. ;
> :grell_min_depth_of_cloud = 150. ;
> :grell_min_convective_heating = -250. ;
> :grell_max_convective_heating = 500. ;
> :grell_max_cloud_base_height = 0.4 ;
> :grell_FC_ABE_removal_timescale = 30. ;
> :holtslag_critical_ocean_richardson = 0.25 ;
> :holtslag_critical_land_richardson = 0.25 ;
> }
> 
> 
> Michael Notaro
> Associate Director
> Nelson Institute Center for Climatic Research
> University of Wisconsin-Madison
> Phone: (608) 261-1503
> Email: mnotaro at wisc.edu

Guido Cioni
http://guidocioni.altervista.org
