[ncl-talk] Inconsistency in dimension averaging using dim_avg_n

Tabish Ansari tabishumaransari at gmail.com
Wed Aug 21 09:58:48 MDT 2024


Hi Dave,

Thanks a lot for the insightful explanation. Yes, there are many missing
values within the dataset.

As a test, I performed the averaging manually using three nested loops (one
per dimension), summing the non-missing values while also counting the
non-missing instances. The final result matched the single-step dim_avg_n
result (I included coordinate arrays this time). This also confirmed that
there is no built-in area weighting in the dim_avg_n function.
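
For anyone who wants to reproduce the effect, here is a minimal sketch of
that loop test. It is written in plain Python rather than NCL so it can be
run standalone, and it uses made-up values (one time step on a 2 x 3 grid),
not the original data. The function names are hypothetical; only the fill
value is taken from the variable summary.

```python
# Illustrative values only: [time][lat][lon], with two missing cells.
FILL = 9.96921e+36  # _FillValue from the variable summary

data = [
    [[1.0, 2.0, 3.0],
     [4.0, FILL, FILL]],
]

def mean_skip_missing(values):
    """Average the non-missing values; return FILL if none remain."""
    valid = [v for v in values if v != FILL]
    return sum(valid) / len(valid) if valid else FILL

def single_step(data3d):
    """Three nested loops: sum and count non-missing cells per time step."""
    means = []
    for field in data3d:          # time
        total, count = 0.0, 0
        for row in field:         # lat
            for v in row:         # lon
                if v != FILL:
                    total += v
                    count += 1
        means.append(total / count if count else FILL)
    return means

def two_step(data3d):
    """Average over lon first, then average those row means over lat."""
    means = []
    for field in data3d:
        row_means = [mean_skip_missing(row) for row in field]
        means.append(mean_skip_missing(row_means))
    return means

one = single_step(data)
two = two_step(data)
print(one)  # [2.5]  -> (1 + 2 + 3 + 4) / 4
print(two)  # [3.0]  -> (2.0 + 4.0) / 2
```

In the two-step version the single valid cell in the second row counts as
much as the three valid cells in the first row, which is why the two results
differ whenever the missing values are unevenly distributed.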

So, I then multiplied the data by a corresponding 2D array of weighting
fractions that I had generated separately (these were just normalized areas
per gridcell) before performing the single-step dim_avg_n operation.
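
A minimal sketch of one way to form an area-weighted mean when some cells
are missing is given here, again in plain Python with made-up values. It is
not the exact NCL code I ran: the function name and the example areas are
hypothetical. The sketch divides by the sum of the weights over the
non-missing cells, so the result does not depend on how the weights were
normalized.

```python
FILL = 9.96921e+36  # _FillValue from the variable summary

def weighted_mean(field, weights):
    """Return sum(w*x) / sum(w) over the non-missing cells of a 2D field."""
    num, den = 0.0, 0.0
    for row, wrow in zip(field, weights):
        for v, w in zip(row, wrow):
            if v != FILL:          # skip missing cells and their weights
                num += w * v
                den += w
    return num / den if den > 0.0 else FILL

# Illustrative 2 x 3 grid: cells in the first row are twice as large.
field = [[1.0, 2.0, 3.0],
         [4.0, FILL, FILL]]
area = [[2.0, 2.0, 2.0],
        [1.0, 1.0, 1.0]]
ones = [[1.0, 1.0, 1.0],
        [1.0, 1.0, 1.0]]

wm = weighted_mean(field, area)        # (2 + 4 + 6 + 4) / (2 + 2 + 2 + 1)
unweighted = weighted_mean(field, ones)
print(wm)
print(unweighted)  # 2.5, same as the plain single-step average
```

With equal weights this reduces to the ordinary single-step average, which
is a quick sanity check on the weighting array.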

Thanks again, and I hope that this discussion is useful for other users in
the future.

best regards,

Tabish
--------------------------------------------------------------------------------------
Dr Tabish Ansari
Research Associate
Air Quality Modelling Group
Research Institute for Sustainability (RIFS) - Helmholtz Centre Potsdam
Potsdam, Germany


On Wed, 21 Aug 2024 at 17:13, Dave Allured - NOAA Affiliate <
dave.allured at noaa.gov> wrote:

> Tabish, the differing results may be caused by missing values in your
> array, resulting in unequal weighting of the remaining values in 2-step
> averaging.  This is basic math, not a particular NCL behavior.  Offhand I
> would say that the single step method is the only correct method here.
>
> No, dim_avg_n never performs a weighted average, whether or not coordinate
> arrays are included.  As the documentation says, missing values are
> properly excluded from each individual averaging calculation.
>
> Perhaps someone else can comment on the best way to perform a spatial
> weighted average.
>
>
> On Wed, Aug 21, 2024 at 7:31 AM Tabish Ansari via ncl-talk <
> ncl-talk at mailman.ucar.edu> wrote:
>
>> Hello,
>>
>> I've got a 3D variable called "monthlypm25NCP". It has 12 timesteps and
>> 41 lat x 41 lon values.
>>
>> Here's the variable summary:
>> Variable: monthlypm25NCP
>> Type: float
>> Total Size: 80688 bytes
>>             20172 values
>> Number of Dimensions: 3
>> Dimensions and sizes:   [12] x [41] x [41]
>> Coordinates:
>> Number Of Attributes: 1
>>   _FillValue :  9.96921e+36
>>
>> Since this is a derived variable from another 3-hourly variable, the
>> coordinate arrays are not retained. (I could have copied them over but I
>> didn't in this instance.)
>>
>> Now, I want to average this 3D variable over the lat-lon grid to reduce
>> it to a 1D variable containing only 12 values (one for each month).
>>
>> I tried using the dim_avg_n in two different ways to achieve this:
>>
>> *1. In two steps:*
>> monthlypm25NCPsum1= dim_avg_n(monthlypm25NCP, 2)
>> monthlypm25NCPsum = dim_avg_n(monthlypm25NCPsum1, 1)
>> print(monthlypm25NCPsum+"")
>>
>> Result:
>> (0)     167.83
>> (1)     150.403
>> (2)     124.87
>> (3)     102.86
>> (4)     90.6969
>> (5)     80.4786
>> (6)     75.9811
>> (7)     71.1969
>> (8)     93.9213
>> (9)     117.72
>> (10)    136.6
>> (11)    139.528
>>
>> *2. In a single step:*
>> monthlypm25NCPsum = dim_avg_n(monthlypm25NCP, (/1,2/))
>> print(monthlypm25NCPsum+"")
>>
>> Result:
>> (0)     155.3
>> (1)     140.645
>> (2)     116.423
>> (3)     96.4202
>> (4)     84.4638
>> (5)     76.3392
>> (6)     72.5972
>> (7)     67.5716
>> (8)     88.2773
>> (9)     110.789
>> (10)    129.426
>> (11)    131.247
>>
>> I was expecting the results to be identical but strangely they're not, as
>> you can see above.
>>
>> Can you please explain what's causing the difference here?
>>
>> Is it possible that in the second case, the dim_avg_n function is
>> recognizing the lat-lon grid and using a weighted averaging based on actual
>> grid area? But how can it recognize that when I have not included the
>> coordinate arrays?
>>
>> Ultimately, I do want to perform a weighted averaging over the lat-lon
>> grid and have obtained a separate matrix that contains gridcell area (I
>> used the cdo tool to obtain it). Should I do a sparse matrix multiplication
>> with the gridcell area before performing the grid averaging in NCL or does
>> the dim_avg_n function take care of the grid area itself?
>>
>> Thanks
>> Tabish
>>
>> --------------------------------------------------------------------------------------
>> Dr Tabish Ansari
>> Research Associate
>> Air Quality Modelling Group
>> Research Institute for Sustainability (RIFS) - Helmholtz Centre Potsdam
>> Potsdam, Germany
>>
>