[ncl-talk] I/O problem with regridding many nc files?

Mary Haley haley at ucar.edu
Tue Jan 19 10:02:37 MST 2016


Archana,

I've been working on this for a while now, and have something that runs, but
is still not terribly fast. The simple fact is that you are trying to
regrid to a rather large variable, and across 24 time steps.

There are parts of your script that I'm puzzled by.  One is the fact that
your data is ordered lat x lon x time rather than time x lat x lon.  This
adds an unnecessary additional step of having to reorder your data, because
the regridding routine is expecting time x lat x lon. Second, I'm not sure
why you are defining "time" as an unlimited dimension on your file, but
then you have named what appears to be the time dimension as "ZDim" instead.

Also, it bears repeating that do loops are inefficient in scripting
languages like NCL, Python, and IDL, and you want to keep things out of a
do loop if possible. In your script, you are redefining the "T2_array" and
"SWDOWN_array" variables every time in the loop, which is expensive. You
are also regridding one time step at a time inside the loop, which means
you are opening and closing the rather large NetCDF weights file multiple
times.  I understand that you may be doing this to save memory, but I think
you can achieve that more effectively by managing your memory more carefully.
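
To illustrate the general point about loops (a toy sketch in plain Python,
not your NCL script), compare a loop that repeats invariant work with one
that does it once. Everything here is a hypothetical stand-in: load_weights()
plays the role of reading a large weights file, regrid_step() the role of
regridding one time step, and the counter only exists to show how often the
"file" is read.

```python
# Hypothetical stand-ins, not real NCL or ESMF calls.
counter = {"loads": 0}

def load_weights():
    """Pretend to read an expensive weights file; count each read."""
    counter["loads"] += 1
    return [0.5, 0.5]                      # toy weights

def regrid_step(values, weights):
    """Toy 'regrid': weighted sum of the source values."""
    return sum(w * v for w, v in zip(weights, values))

def slow(steps):
    """Re-reads the weights and rebuilds the output on every pass."""
    out = []
    for values in steps:
        weights = load_weights()           # loop-invariant work, repeated
        out = out + [regrid_step(values, weights)]   # new list each time
    return out

def fast(steps):
    """Reads the weights once and fills a preallocated output."""
    weights = load_weights()               # hoisted out of the loop
    out = [0.0] * len(steps)               # allocated once
    for hh, values in enumerate(steps):
        out[hh] = regrid_step(values, weights)
    return out

# 24 hourly "time steps", each with two source values
steps = [(float(hh), float(hh + 2)) for hh in range(24)]
```

Both functions return the same values; the difference is that slow() reads
the weights 24 times and fast() reads them once.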

Having said all this, I realize that ESMF_regrid_with_weights is part of
the problem, because it's the routine that is reading the necessary
variables off the NetCDF weights file in order to do the regridding.  Since
you had this inside the do loop, this means you are reading the same
variables off the file again and again.

To prevent having to open the weights file multiple times, I created a
function that reads the necessary data off the weights file and stores it
in a "list" variable. I then wrote a second function that uses this list to
apply the variables in a sparse matrix calculation for the regridding.
This sped things up and I am now able to get a regridded variable written
to a new NetCDF file.
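
The idea can be sketched as follows. This is a plain-Python toy, not the
attached NCL script. It assumes the weights are stored as sparse-matrix
triplets named "row", "col" and "S" with 1-based indices, which is the usual
layout of ESMF weight files; the dict standing in for the file, the "n_dst"
key, and the function names read_weights() and apply_weights() are all
hypothetical.

```python
def read_weights(wgt):
    """Collect what the regridding needs, once, into a list.

    `wgt` is a dict standing in for an opened weights file (assumption).
    """
    rows = [r - 1 for r in wgt["row"]]     # destination cell, now 0-based
    cols = [c - 1 for c in wgt["col"]]     # source cell, now 0-based
    return [rows, cols, list(wgt["S"]), wgt["n_dst"]]

def apply_weights(weights, src):
    """Sparse matrix times vector: dst[row] += S * src[col].

    `src` is one time step of the source field, flattened to 1-D.
    """
    rows, cols, S, n_dst = weights
    dst = [0.0] * n_dst
    for r, c, s in zip(rows, cols, S):
        dst[r] += s * src[c]
    return dst

# Toy weights: 3 source cells mapped onto 2 destination cells.
wgt_file = {
    "row": [1, 1, 2, 2],
    "col": [1, 2, 2, 3],
    "S":   [0.5, 0.5, 0.25, 0.75],
    "n_dst": 2,                            # hypothetical key
}

weights = read_weights(wgt_file)           # read once, outside any loop

# 24 hourly fields, each already flattened to 1-D
hourly = [[float(hh), hh + 1.0, hh + 2.0] for hh in range(24)]
regridded = [apply_weights(weights, field) for field in hourly]
```

The weights are read a single time and reused for every time step, so the
cost per step is only the sparse multiply itself.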

Please try the attached script. You may need to modify the directory paths,
but otherwise it should be runnable as-is.

--Mary


On Tue, Jan 12, 2016 at 1:18 PM, Archana Dayalu <adayalu at seas.harvard.edu>
wrote:

> Sorry, a couple more things:
> 1) To answer your question, Mary: I am using the setfileoption command to
> change file type.
> 2) I just re-ran the whole script a different way (i.e., I broke it up by
> common days of a month rather than entire months ... so instead of 12
> independent loops of 28-31 iterations, I did 31 independent loops of 4-12
> iterations). So far it seems to be completing successfully which indicates
> that it was a problem with too many calls to the ESMF regrid with weights
> function? But I don't understand why that would be so (if it were the
> issue).
> Regards
> Archana
>
> On Tue, Jan 12, 2016 at 3:04 PM, Archana Dayalu <adayalu at seas.harvard.edu>
> wrote:
>
>> Hi Mary,
>> I just uploaded my code + data to the ftp site. The files I have uploaded
>> are:
>> 1) 31 files of type "aux2vprm_d01_2006-01-dd_hrlyavg.nc" (where dd=two
>> digit day in january), 2.5MB each
>> 2) 1 weights file "T2SWDOWN_to_MODIS_bilinear.faster.nc" (this is kind
>> of big ... ~6GB)
>> 3) 1 file with the script "regrid_using_weights_T2SWDOWN_MOD_noplots.ncl"
>> (edit the header of file to specify directories; all else should not
>> require editing).
>> Other requested information:
>> ncl -V = 6.2.1
>> uname -a = Linux rclogin11.rc.fas.harvard.edu 2.6.32-431.17.1.el6.x86_64
>> #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> Thanks very much! Please let me know if you have any trouble with the
>> code/files.
>> Regards
>> Archana
>>
>> On Tue, Jan 12, 2016 at 11:29 AM, Mary Haley <haley at ucar.edu> wrote:
>>
>>> Are you by any chance using setfileoption to change the write type to
>>> "NetCDF4"?
>>>
>>> I'm wondering if there's an issue with going into NetCDF4 mode, that may
>>> cause a problem if you stay in that mode when reading your weights file.
>>>
>>> Can you provide us with the full code and all the files needed to run
>>> it? You can use our ftp if the files are not too large:
>>>
>>> http://www.ncl.ucar.edu/report_bug.shtml#HowToFTP
>>>
>>>
>>> --Mary
>>>
>>>
>>> On Mon, Jan 11, 2016 at 2:09 PM, Archana Dayalu <
>>> adayalu at seas.harvard.edu> wrote:
>>>
>>>> Hi there,
>>>> I am using the ESMF tools to regrid a bunch of wrf-grid files to
>>>> rectangular grid. I am running each month independently (running the script
>>>> in 12 independent batches, where each batch corresponds to a month) but the
>>>> days in a given month are done serially in that loop. My steps, as a quick
>>>> summary, are:
>>>> (1) generate the weights once, valid for all variables for all files
>>>> for the year. Then, for each of 365 daily temperature and radiation files:
>>>> (2) extract the daily T2 (Kelvin), SWDOWN variables which I previously
>>>> defined as a 3D array (XDim,YDim,23)
>>>> (3) apply the weights to hourly T2, SWDOWN output for a year. This step
>>>> involves a loop ... and the relevant part includes:
>>>> do hh=0,23
>>>>   T2_regrid     = ESMF_regrid_with_weights(T2,wgt_file_name,Opt)
>>>>   SWDOWN_regrid = ESMF_regrid_with_weights(SWDOWN,wgt_file_name,Opt)
>>>>   T2_array(:,:,hh)     = T2_regrid
>>>>   SWDOWN_array(:,:,hh) = SWDOWN_regrid
>>>>   ; I also have it print out min max for each variable for each hour regridded
>>>> end do
>>>>
>>>> (4) Write out the two regridded arrays to a daily netcdf file using
>>>> method 1, Netcdf4, with file compression level set to 4
>>>>
>>>> The problem: The script fails toward the end (e.g., around the 27th file
>>>> of a month) with the following error message and begins to input nonsense
>>>> Temperature and radiation values into the array and states the weights file
>>>> cannot be opened (it does exist). When I re-run the failed time-period as a
>>>> one-time regridding, it regrids just fine. Is ncl unhappy because of the
>>>> high I/O? It doesn't fail at the netcdf generation step ... it seems to
>>>> fail at the regridding step itself. Thanks for any help! The error output
>>>> follows ...
>>>>
>>>> Variable: hh
>>>> Type: integer
>>>> Total Size: 4 bytes
>>>>             1 values
>>>> Number of Dimensions: 1
>>>> Dimensions and sizes:   [1]
>>>> Coordinates:
>>>> (0)     15
>>>> fatal:["NclNewHDF5.c":3787]:NclNewHDF5.c: Cannot open file:
>>>> </n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc>
>>>> fatal:["NclAdvancedFile.c":2667]:_NclFileFillHLFS: Could not open
>>>> (/n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc)
>>>> (0)   ESMF_regrid_with_weights: cannot open weight file
>>>> '/n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc'.
>>>> (0)
>>>> (0)     TEMP at 2 M: min=239.284   max=299.333
>>>> (0)
>>>> (0)     min=9.96921e+36   max=9.96921e+36
>>>> warning:VarVarWrite: rhs has no dimension name or coordinate variable,
>>>> deleting name of lhs dimension number (0) and destroying coordinate var,
>>>>  use "(/../)" if this is not desired outcome
>>>> warning:VarVarWrite: rhs has no dimension name or coordinate variable,
>>>> deleting name of lhs dimension number (1) and destroying coordinate var,
>>>>  use "(/../)" if this is not desired outcome
>>>> warning:["Execute.c":8578]:Execute: Error occurred at or near line 64
>>>> in file regrid_using_weights_T2SWDOWN_MOD_noplots.ncl
>>>>
>>>> HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 0:
>>>>   #000: H5D.c line 461 in H5Dget_space(): not a dataset
>>>>     major: Invalid arguments to routine
>>>>     minor: Inappropriate type
>>>> (...then a series of similar error messages are output .... and then it
>>>> continues .... and nonsense is output):
>>>>
>>>> Variable: hh
>>>> Type: integer
>>>> Total Size: 4 bytes
>>>>             1 values
>>>> Number of Dimensions: 1
>>>> Dimensions and sizes:   [1]
>>>> Coordinates:
>>>> (0)     17
>>>> (0)
>>>> (0)     TEMP at 2 M: min=0.0114805   max=299.333
>>>> (0)
>>>> (0)     DOWNWARD SHORT WAVE FLUX AT GROUND SURFACE: min=0   max=2.90338e-31
>>>>
>>>>
>>>
>>
>>
>> --
>> ____________________________________________________
>> *Archana Dayalu*
>> Graduate Student
>> Dept. of Earth and Planetary Sciences
>> Harvard University
>> 24 Oxford Street #402
>> Cambridge, MA 02138
>> (617) 384-8206
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: regrid_using_weights_T2SWDOWN_MOD_noplots_mod.ncl
Type: application/octet-stream
Size: 7363 bytes
Desc: not available
Url : http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20160119/0a78ec89/attachment.obj 