[ncl-talk] I/O problem with regridding many nc files?

Archana Dayalu adayalu at seas.harvard.edu
Tue Jan 19 10:26:21 MST 2016


This is great! Thank you so much for your help.
Regards
Archana

On Tue, Jan 19, 2016 at 12:02 PM, Mary Haley <haley at ucar.edu> wrote:

> Archana,
>
> I've been working on this for a while now, and have something that runs,
> but is still not terribly fast. The simple fact is that you are trying to
> regrid to a rather large variable, and across 24 time steps.
>
> There are parts of your script that I'm puzzled by.  One is the fact that
> your data is ordered lat x lon x time rather than time x lat x lon.  This
> adds an unnecessary additional step of having to reorder your data, because
> the regridding routine is expecting time x lat x lon. Second, I'm not sure
> why you are defining "time" as an unlimited dimension on your file, but
> then you have named what appears to be the time dimension as "ZDim" instead.
>
> Also, it bears repeating that do loops are inefficient in scripting
> languages like NCL, Python, and IDL, and you want to keep things out of a
> do loop if possible. In your script, you are redefining the "T2_array" and
> "SWDOWN_array" variables every time in the loop, which is expensive. You
> are also regridding one time step at a time inside the loop, which means
> you are opening and closing a rather large NetCDF file that contains the
> weights file multiple times.  I understand that you may be doing this to
> save memory, but I think you might be able to accomplish this by managing
> your memory more carefully.
>
> Having said all this, I realize that ESMF_regrid_with_weights is part of
> the problem, because it's the routine that is reading the necessary
> variables off the NetCDF weights file in order to do the regridding.  Since
> you had this inside the do loop, this means you are reading the same
> variables off the file again and again.
>
> To prevent having to open the weights file multiple times, I created a
> function that reads the necessary data off the weights file and stores it
> in a "list" variable. I then wrote a second function that uses this list to
> apply the variables in a sparse matrix calculation for the regridding.
> This sped things up and I am now able to get a regridded variable written
> to a new NetCDF file.
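
A minimal sketch of the idea Mary describes, written in Python rather than NCL and not taken from the attached script. It assumes the weights are stored as 1-based (row, col, S) triplets, the layout ESMF weight files generally use; the names load_weights and apply_weights and the toy numbers are hypothetical. The weights are read once, then reused for every time step:

```python
from typing import List, NamedTuple, Sequence


class Weights(NamedTuple):
    """Sparse remap weights in triplet form, read once and reused."""
    row: List[int]    # 0-based destination cell index of each weight
    col: List[int]    # 0-based source cell index of each weight
    s: List[float]    # weight values
    n_dst: int        # number of destination cells


def load_weights(row, col, s, n_dst) -> Weights:
    """Hypothetical stand-in for the one-time read from the weights file.

    Assumes the indices arrive 1-based and converts them to 0-based.
    """
    return Weights([r - 1 for r in row], [c - 1 for c in col], list(s), n_dst)


def apply_weights(w: Weights, src: Sequence[float]) -> List[float]:
    """Regrid one flattened field: dst[row[k]] += s[k] * src[col[k]]."""
    dst = [0.0] * w.n_dst
    for r, c, v in zip(w.row, w.col, w.s):
        dst[r] += v * src[c]
    return dst


# Toy example: 3 source cells -> 2 destination cells, 2 time steps.
weights = load_weights(row=[1, 1, 2, 2], col=[1, 2, 2, 3],
                       s=[0.5, 0.5, 0.25, 0.75], n_dst=2)
hourly = [[10.0, 20.0, 30.0], [0.0, 4.0, 8.0]]

# The weights are loaded once above; only the multiply is repeated per step.
regridded = [apply_weights(weights, field) for field in hourly]
print(regridded)
```

The point of the split is that the expensive file read happens once, outside any loop over hours or days, while the per-step work is only the sparse multiply.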
>
> Please try the attached script. You may need to modify the directory
> paths, but otherwise it should be runnable as-is.
>
> --Mary
>
>
> On Tue, Jan 12, 2016 at 1:18 PM, Archana Dayalu <adayalu at seas.harvard.edu>
> wrote:
>
>> Sorry, a couple more things:
>> 1) To answer your question, Mary: I am using the setfileoption command to
>> change file type.
>> 2) I just re-ran the whole script a different way (i.e., I broke it up by
>> common days of a month rather than entire months ... so instead of 12
>> independent loops of 28-31 iterations, I did 31 independent loops of 4-12
>> iterations). So far it seems to be completing successfully, which suggests
>> that the problem was too many calls to the ESMF_regrid_with_weights
>> function? But I don't understand why that would be so (if it were the
>> issue).
>> Regards
>> Archana
>>
>> On Tue, Jan 12, 2016 at 3:04 PM, Archana Dayalu <adayalu at seas.harvard.edu>
>> wrote:
>>
>>> Hi Mary,
>>> I just uploaded my code + data to the ftp site. The files I have
>>> uploaded are:
>>> 1) 31 files of type "aux2vprm_d01_2006-01-dd_hrlyavg.nc" (where dd = two-digit
>>> day in January), 2.5 MB each
>>> 2) 1 weights file "T2SWDOWN_to_MODIS_bilinear.faster.nc" (this is kind
>>> of big ... ~6GB)
>>> 3) 1 file with the script
>>> "regrid_using_weights_T2SWDOWN_MOD_noplots.ncl" (edit the header of file to
>>> specify directories; all else should not require editing).
>>> Other requested information:
>>> ncl -V = 6.2.1
>>> uname -a = Linux rclogin11.rc.fas.harvard.edu
>>> 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64
>>> x86_64 GNU/Linux
>>> Thanks very much! Please let me know if you have any trouble with the
>>> code/files.
>>> Regards
>>> Archana
>>>
>>> On Tue, Jan 12, 2016 at 11:29 AM, Mary Haley <haley at ucar.edu> wrote:
>>>
>>>> Are you by any chance using setfileoption to change the write type to
>>>> "NetCDF4"?
>>>>
>>>> I'm wondering if there's an issue with going into NetCDF4 mode, that
>>>> may cause a problem if you stay in that mode when reading your weights file.
>>>>
>>>> Can you provide us with the full code and all the files needed to run
>>>> it? You can use our ftp if the files are not too large:
>>>>
>>>> http://www.ncl.ucar.edu/report_bug.shtml#HowToFTP
>>>>
>>>>
>>>> --Mary
>>>>
>>>>
>>>> On Mon, Jan 11, 2016 at 2:09 PM, Archana Dayalu <
>>>> adayalu at seas.harvard.edu> wrote:
>>>>
>>>>> Hi there,
>>>>> I am using the ESMF tools to regrid a bunch of WRF-grid files to a
>>>>> rectangular grid. I am running each month independently (running the script
>>>>> in 12 independent batches, where each batch corresponds to a month), but the
>>>>> days in a given month are done serially in that loop. My steps, as a quick
>>>>> summary, are:
>>>>> (1) generate the weights once, valid for all variables for all files
>>>>> for the year. Then, for each of 365 daily temperature and radiation files:
>>>>> (2) extract the daily T2 (Kelvin) and SWDOWN variables, which I previously
>>>>> defined as 3D arrays (XDim,YDim,24)
>>>>> (3) apply the weights to hourly T2, SWDOWN output for a year. This
>>>>> step involves a loop ... and the relevant part includes:
>>>>> do hh=0,23
>>>>>   T2_regrid     = ESMF_regrid_with_weights(T2,wgt_file_name,Opt)
>>>>>   SWDOWN_regrid = ESMF_regrid_with_weights(SWDOWN,wgt_file_name,Opt)
>>>>>   T2_array(:,:,hh)     = T2_regrid
>>>>>   SWDOWN_array(:,:,hh) = SWDOWN_regrid
>>>>>   ; I also have it print out min max for each variable for each hour regridded
>>>>> end do
>>>>>
>>>>> (4) Write out the two regridded arrays to a daily NetCDF file using
>>>>> method 1, NetCDF4, with file compression level set to 4
>>>>>
>>>>> The problem: The script fails toward the end of a month (e.g., around the
>>>>> 27th file). It prints the error messages below, reports that the weights
>>>>> file cannot be opened (it does exist), and begins writing nonsense
>>>>> temperature and radiation values into the arrays. When I re-run the failed
>>>>> time period as a one-time regridding, it regrids just fine. Is NCL unhappy
>>>>> because of the high I/O? It doesn't fail at the NetCDF generation step;
>>>>> it seems to fail at the regridding step itself. Thanks for any help! The
>>>>> error output follows ...
>>>>>
>>>>> Variable: hh
>>>>> Type: integer
>>>>> Total Size: 4 bytes
>>>>>             1 values
>>>>> Number of Dimensions: 1
>>>>> Dimensions and sizes:   [1]
>>>>> Coordinates:
>>>>> (0)     15
>>>>> fatal:["NclNewHDF5.c":3787]:NclNewHDF5.c: Cannot open file: </n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc>
>>>>> fatal:["NclAdvancedFile.c":2667]:_NclFileFillHLFS: Could not open (/n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc)
>>>>> (0)   ESMF_regrid_with_weights: cannot open weight file '/n/regal/wofsy_lab/adayalu/T2SWDOWN_to_MODIS_bilinear.faster.nc'.
>>>>> (0)
>>>>> (0)     TEMP at 2 M: min=239.284   max=299.333
>>>>> (0)
>>>>> (0)     min=9.96921e+36   max=9.96921e+36
>>>>> warning:VarVarWrite: rhs has no dimension name or coordinate variable, deleting name of lhs dimension number (0) and destroying coordinate var, use "(/../)" if this is not desired outcome
>>>>> warning:VarVarWrite: rhs has no dimension name or coordinate variable, deleting name of lhs dimension number (1) and destroying coordinate var, use "(/../)" if this is not desired outcome
>>>>> warning:["Execute.c":8578]:Execute: Error occurred at or near line 64 in file regrid_using_weights_T2SWDOWN_MOD_noplots.ncl
>>>>>
>>>>> HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 0:
>>>>>   #000: H5D.c line 461 in H5Dget_space(): not a dataset
>>>>>     major: Invalid arguments to routine
>>>>>     minor: Inappropriate type
>>>>> (...then a series of similar error messages are output .... and then
>>>>> it continues .... and nonsense is output):
>>>>>
>>>>> Variable: hh
>>>>> Type: integer
>>>>> Total Size: 4 bytes
>>>>>             1 values
>>>>> Number of Dimensions: 1
>>>>> Dimensions and sizes:   [1]
>>>>> Coordinates:
>>>>> (0)     17
>>>>> (0)
>>>>> (0)     TEMP at 2 M: min=0.0114805   max=299.333
>>>>> (0)
>>>>> (0)     DOWNWARD SHORT WAVE FLUX AT GROUND SURFACE: min=0   max=2.90338e-31
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> ____________________________________________________
>>> *Archana Dayalu*
>>> Graduate Student
>>> Dept. of Earth and Planetary Sciences
>>> Harvard University
>>> 24 Oxford Street #402
>>> Cambridge, MA 02138
>>> (617) 384-8206
>>>
>>
>>
>>
>>
>
>

