[ncl-talk] I/O problem with regridding many nc files?

Archana Dayalu adayalu at seas.harvard.edu
Thu Feb 25 16:42:36 MST 2016


Hi Mary,
Referring back to the regridding of temperature and shortwave radiation
from a coarse grid to a very fine grid: the issue I'm running into now with
this faster and more efficient process is that ESMF doesn't appear to
automatically regrid across all time steps ... it just repeats the regridded
values of the very first time step and enters them into
all the subsequent time steps of the 24 hour x lat x lon array.

For example, line 123 of the code:
SWDOWN_array = (/regrid_var(SWDOWN,rlist,(/nz,nlat,nlon/))/)
where
dimsizes(SWDOWN) = (0) 24 (1) 119 (2) 109
SWDOWN_array(0,:,:) = 110.3962
SWDOWN_array(1,:,:) = 110.3962 (and this continues all the way through
dim(0) = 23)

If I instead run the following two separate regrid calls, one time step
each, as a test:
SWDOWN_0 = (/regrid_var(SWDOWN(0,:,:),rlist,(/nlat,nlon/))/)
SWDOWN_1 = (/regrid_var(SWDOWN(1,:,:),rlist,(/nlat,nlon/))/)

then I get different values (the correct ones):
SWDOWN_0(4001,4001) = 110.3692
SWDOWN_1(4001,4001) = 291.0137
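
One way to check for this kind of repetition, and to work around it, is to regrid each time step separately and then test whether any output step is a copy of step 0. The Python sketch below only illustrates that check. It is not NCL and it is not the regrid_var function from the script: `regrid_2d` is a hypothetical stand-in (nearest-neighbour upsampling) for whatever 2D regridding routine is in use, and the small arrays are made-up data.

```python
# Illustrative sketch only: `regrid_2d` is a hypothetical stand-in for a real
# 2D regridding routine, and the input values below are made up.

def regrid_2d(field, factor=2):
    """Nearest-neighbour upsampling of a 2D list-of-lists by `factor`."""
    out = []
    for row in field:
        fine_row = [value for value in row for _ in range(factor)]
        out.extend([list(fine_row) for _ in range(factor)])
    return out


def regrid_by_time_step(stack, factor=2):
    """Regrid a (time, lat, lon) stack one time step at a time."""
    return [regrid_2d(field, factor) for field in stack]


def repeated_steps(stack):
    """Return indices of time steps that are identical to time step 0."""
    return [t for t in range(1, len(stack)) if stack[t] == stack[0]]


# Two time steps with clearly different values (made-up numbers).
source = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]

regridded = regrid_by_time_step(source)

# The input steps differ, so no output step should be a copy of step 0.
suspicious = repeated_steps(regridded)
print("time steps identical to step 0:", suspicious)
```

Running the same `repeated_steps` check on the output of a single multi-step regrid call would flag the behaviour described above, where every step comes back equal to step 0.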

I'm not sure why ESMF is just repeating the regridded values of the first
time step into all the remaining time steps for each lat/lon grid cell.
I uploaded the following to ftp.cgd.ucar.edu:
1)  aux2vprm_d01_2005-01-02_hrlyavg.nc (contains hourly averages of T2,
SWDOWN over a day for the specified date)
2)  regrid_using_weights_T2SWDOWN_MOD.ncl (your modified script to make the
regridding more efficient)
3)  T2SWDOWN_to_MODIS_bilinear_WGTS.nc (the weights file, rather large,
~7G).

Thanks for any help! Please let me know if you need further clarification.
Regards,
Archana

On Tue, Jan 19, 2016 at 12:02 PM, Mary Haley <haley at ucar.edu> wrote:

> Archana,
>
> I've been working on this for a while now, and have something that runs,
> but is still not terribly fast. The simple fact is that you are trying to
> regrid to a rather large variable, and across 24 time steps.
>
> There are parts of your script that I'm puzzled by.  One is the fact that
> your data is ordered lat x lon x time rather than time x lat x lon.  This
> adds an unnecessary additional step of having to reorder your data, because
> the regridding routine is expecting time x lat x lon. Second, I'm not sure
> why you are defining "time" as an unlimited dimension on your file, but
> then you have named what appears to be the time dimension as "ZDim" instead.
>
> Also, it bears repeating that do loops are inefficient in scripting
> languages like NCL, Python, and IDL, and you want to keep things out of a
> do loop if possible. In your script, you are redefining the "T2_array" and
> "SWDOWN_array" variables every time in the loop, which is expensive. You
> are also regridding one time step at a time inside the loop, which means
> you are opening and closing the rather large NetCDF weights file
> multiple times.  I understand that you may be doing this to
> save memory, but I think you could handle this better by managing
> memory more carefully.
>
> Having said all this, I realize that ESMF_regrid_with_weights is part of
> the problem, because it's the routine that is reading the necessary
> variables off the NetCDF weights file in order to do the regridding.  Since
> you had this inside the do loop, this means you are reading the same
> variables off the file again and again.
>
> To prevent having to open the weights file multiple times, I created a
> function that reads the necessary data off the weights file and stores it
> in a "list" variable. I then wrote a second function that uses this list to
> apply the variables in a sparse matrix calculation for the regridding.
> This sped things up and I am now able to get a regridded variable written
> to a new NetCDF file.
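
The approach described above (read the weights once, then reuse them for every field) can be sketched roughly as follows. This is a minimal Python illustration, not the attached NCL script: the function names are hypothetical, and the in-memory triples are made-up stand-ins for the `row`, `col`, and `S` variables that an ESMF weights file stores, assumed here to use 1-based indices.

```python
# Illustrative sketch only: names are hypothetical, and the triples below are
# made-up stand-ins for the row/col/S variables of a weights file.

def load_weights():
    """Pretend to read the sparse weights once; returns (row, col, S, n_dst)."""
    row = [1, 1, 2, 3, 3]              # destination cell index, 1-based
    col = [1, 2, 2, 2, 3]              # source cell index, 1-based
    s = [0.5, 0.5, 1.0, 0.25, 0.75]    # interpolation weight
    return row, col, s, 3


def apply_weights(weights, src):
    """Sparse matrix-vector product: dst[row] += S * src[col]."""
    row, col, s, n_dst = weights
    dst = [0.0] * n_dst
    for r, c, w in zip(row, col, s):
        dst[r - 1] += w * src[c - 1]   # convert 1-based to 0-based
    return dst


weights = load_weights()               # done once, outside any loop

# Two flattened source fields (e.g. two time steps), made-up values.
hourly_fields = [[10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
regridded = [apply_weights(weights, field) for field in hourly_fields]
print(regridded)
```

The point of the design is that the expensive step (reading the weights) happens once, while the cheap step (the sparse product) is repeated per field, so each time step still gets its own result.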
>
> Please try the attached script. You may need to modify the directory
> paths, but otherwise it should be runnable as-is.
>
> --Mary
>
>
> On Tue, Jan 12, 2016 at 1:18 PM, Archana Dayalu <adayalu at seas.harvard.edu>
> wrote:
>
>> Sorry, a couple more things:
>> 1) To answer your question, Mary: I am using the setfileoption command to
>> change file type.
>> 2) I just re-ran the whole script a different way (i.e., I broke it up by
>> common days of a month rather than entire months ... so instead of 12
>> independent loops of 28-31 iterations, I did 31 independent loops of 4-12
>> iterations). So far it seems to be completing successfully which indicates
>> that it was a problem with too many calls to the ESMF regrid with weights
>> function? But I don't understand why that would be so (if it were the
>> issue).
>> Regards
>> Archana
>>
>> On Tue, Jan 12, 2016 at 3:04 PM, Archana Dayalu <adayalu at seas.harvard.edu>
>> wrote:
>>
>>> Hi Mary,
>>> I just uploaded my code + data to the ftp site. The files I have
>>> uploaded are:
>>> 1) 31 files of type "aux2vprm_d01_2006-01-dd_hrlyavg.nc" (where dd=two
>>> digit day in January), 2.5MB each
>>> 2) 1 weights file "T2SWDOWN_to_MODIS_bilinear.faster.nc" (this is kind
>>> of big ... ~6GB)
>>> 3) 1 file with the script
>>> "regrid_using_weights_T2SWDOWN_MOD_noplots.ncl" (edit the header of the
>>> file to specify directories; nothing else should require editing).
>>> Other requested information:
>>> ncl -V = 6.2.1
>>> uname -a = Linux rclogin11.rc.fas.harvard.edu
>>> 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64
>>> x86_64 GNU/Linux
>>> Thanks very much! Please let me know if you have any trouble with the
>>> code/files.
>>> Regards
>>> Archana
>>>
>>> On Tue, Jan 12, 2016 at 11:29 AM, Mary Haley <haley at ucar.edu> wrote:
>>>
>>>> Are you by any chance using setfileoption to change the write type to
>>>> "NetCDF4"?
>>>>
>>>> I'm wondering if there's an issue with going into NetCDF4 mode, that
>>>> may cause a problem if you stay in that mode when reading your weights file.
>>>>
>>>> Can you provide us with the full code and all the files needed to run
>>>> it? You can use our ftp if the files are not too large:
>>>>
>>>> http://www.ncl.ucar.edu/report_bug.shtml#HowToFTP
>>>>
>>>>
>>>> --Mary
>>>>
>>>>
>>>> On Mon, Jan 11, 2016 at 2:09 PM, Archana Dayalu <
>>>> adayalu at seas.harvard.edu> wrote:
>>>>
>>>>> Hi there,
>>>>> I am using the ESMF tools to regrid a bunch of WRF-grid files to a
>>>>> rectangular grid. I am running each month independently (running the script
>>>>> in 12 independent batches, where each batch corresponds to a month) but the
>>>>> days in a given month are done serially in that loop. My steps, as a quick
>>>>> summary, are:
>>>>> (1) generate the weights once, valid for all variables for all files
>>>>> for the year. Then, for each of 365 daily temperature and radiation files:
>>>>> (2) extract the daily T2 (Kelvin), SWDOWN variables which I previously
>>>>> defined as a 3D array (XDim,YDim,24)
>>>>> (3) apply the weights to hourly T2, SWDOWN output for a year. This
>>>>> step involves a loop ... and the relevant part includes:
>>>>> do hh=0,23
>>>>>   T2_regrid     = ESMF_regrid_with_weights(T2,wgt_file_name,Opt)
>>>>>   SWDOWN_regrid = ESMF_regrid_with_weights(SWDOWN,wgt_file_name,Opt)
>>>>>   T2_array(:,:,hh)     = T2_regrid
>>>>>   SWDOWN_array(:,:,hh) = SWDOWN_regrid
>>>>>   ; I also have it print out min max for each variable for each hour regridded
>>>>> end do
>>>>>
>>>>> (4) Write out the two regridded arrays to a daily netcdf file using
>>>>> method 1, Netcdf4, with file compression level set to 4
>>>>>
>>>>> The problem: The script fails toward the end (e.g., around the 27th
>>>>> file of a month) with the error message below. It states that the
>>>>> weights file cannot be opened (it does exist) and begins to write
>>>>> nonsense temperature and radiation values into the array. When I re-run
>>>>> the failed time period as a one-time regridding, it regrids just fine.
>>>>> Is NCL unhappy because of the high I/O? It doesn't fail at the NetCDF
>>>>> generation step ... it seems to fail at the regridding step itself.
>>>>> Thanks for any help! The error output follows ...
>>>>>
>>>>> Variable: hh
>>>>> Type: integer
>>>>> Total Size: 4 bytes
>>>>>             1 values
>>>>> Number of Dimensions: 1
>>>>> Dimensions and sizes:   [1]
>>>>> Coordinates:
>>>>> (0)     15
>>>>> fatal:["NclNewHDF5.c":3787]:NclNewHDF5.c: Cannot open file:
>>>>> </n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc>
>>>>> fatal:["NclAdvancedFile.c":2667]:_NclFileFillHLFS: Could not open
>>>>> (/n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc)
>>>>> (0)   ESMF_regrid_with_weights: cannot open weight file
>>>>> '/n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc'.
>>>>> (0)
>>>>> (0)     TEMP at 2 M: min=239.284   max=299.333
>>>>> (0)
>>>>> (0)     min=9.96921e+36   max=9.96921e+36
>>>>> warning:VarVarWrite: rhs has no dimension name or coordinate variable,
>>>>> deleting name of lhs dimension number (0) and destroying coordinate var,
>>>>>  use "(/../)" if this is not desired outcome
>>>>> warning:VarVarWrite: rhs has no dimension name or coordinate variable,
>>>>> deleting name of lhs dimension number (1) and destroying coordinate var,
>>>>>  use "(/../)" if this is not desired outcome
>>>>> warning:["Execute.c":8578]:Execute: Error occurred at or near line 64
>>>>> in file regrid_using_weights_T2SWDOWN_MOD_noplots.ncl
>>>>>
>>>>> HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 0:
>>>>>   #000: H5D.c line 461 in H5Dget_space(): not a dataset
>>>>>     major: Invalid arguments to routine
>>>>>     minor: Inappropriate type
>>>>> (...then a series of similar error messages are output .... and then
>>>>> it continues .... and nonsense is output):
>>>>>
>>>>> Variable: hh
>>>>> Type: integer
>>>>> Total Size: 4 bytes
>>>>>             1 values
>>>>> Number of Dimensions: 1
>>>>> Dimensions and sizes:   [1]
>>>>> Coordinates:
>>>>> (0)     17
>>>>> (0)
>>>>> (0)     TEMP at 2 M: min=0.0114805   max=299.333
>>>>> (0)
>>>>> (0)     DOWNWARD SHORT WAVE FLUX AT GROUND SURFACE: min=0   max=2.90338e-31
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ncl-talk mailing list
>>>>> ncl-talk at ucar.edu
>>>>> List instructions, subscriber options, unsubscribe:
>>>>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> ____________________________________________________
>>> *Archana Dayalu*
>>> Graduate Student
>>> Dept. of Earth and Planetary Sciences
>>> Harvard University
>>> 24 Oxford Street #402
>>> Cambridge, MA 02138
>>> (617) 384-8206
>>>
>>
>>
>>
>>
>
>

