# [ncl-talk] correlation and the significance

Dennis Shea shea at ucar.edu
Wed Oct 8 17:22:36 MDT 2014

```[1] Minor comment: you do not have to set

ts1 at _FillValue = 1e20
ts2 at _FillValue = 1e20

ts1 and ts2 can have different _FillValue. NCL does not care.

[2] Did you look at the range for tmp2? Something is strange.
tmp2: min=0 max=304.18
tmp1: min=231.963 max=304.787

Likely, you should set:  tmp2 = where (tmp2.eq.0, tmp2 at _FillValue,tmp2)

[3] You did not include the value for Nx. This is one key to you problem.

Nx    = num(.not.ismissing(ts1))  ; all gridpoints at all
times  (ntim*nlat*mlon)

This  is a very large number. If there were no _FillValue:  Nx=
360*90*180=5832000.
It will be less than that if there are _FillValue present.  Still, it
will be a large number.

'escorc' is calculating the linear cross correlation coefficient. As
noted in the documentation,
===
"The linear correlation coefficient (*r*) for *n* pairs of independent
observations can be tested
against the null hypothesis (ie.: no correlation) using the
statistic

tval = *r**sqrt[ (*n*-2)/(1-*r*^2) ]

This statistic has a Student *t* distribution with *n*-2 degrees of
freedom.
===
Here 'n' is the number of ******independent****** values.

Look at  the Student-t distribution. See table at:
http://en.wikipedia.org/wiki/Student%27s_t-distribution
Look at the degrees of freedom...  Your dof would be 5832000-1 (infinity).
Then look at the t-statistic Even with infinite dof  the largest values are
~3.3
Punch line ... any value will be significant if you use dof=5832000-1

**********

[a] You should use the appropriate number of **time values**.
***If***every monthly time series value
were independent you would have ntim-2 degrees of freedom (360-2)  NOT
5832000-2
degrees of freedom which is what you used.  Assuming independence in
time values,
the way to  determine the Nx at each grid point would be.

Nx = dim_num_n(ts1,2)  ; look at documentation. know what it does.

This is not appropriate for your application .... Why? see [b]

[b] More problematical ... If you have not removed the annual cycle, there
is a *HIGH*
degree of auto-correlation within  each series. Autocorrelation
reduces the number
of independent values.

I would suggest using sst anomalies. This will reduce but not
eliminate autocorrelation
One way to estimate dof in the presence of autocorrelation is NCL's

https://www.ncl.ucar.edu/Document/Functions/Built-in/equiv_sample_size.shtml

You will get two dof estimates. One for each data set. Pick the
lesser and use it in rtest.

==========
To avoid constant email exchanges, I am including what I think will work.

PLEASE .. look and understand what is being done. The printVarSummary(...)
and printMinMax(...) are for *you* to look at ...  not ncl-talk.

ncl-talk is not a statistical advising forum. Please talk with a colleague
these statistical issues.

a personal salutation (Hi Dennis). Likely, there are many people who know
more
statistics than me .

++++++++++++++++++
The following code could be simplified a bit if you had NCL v6.2.1 by using
escorc_n
https://www.ncl.ucar.edu/Document/Functions/Built-in/escorc_n.shtml
====

; Compute climatologies and anomalies. This removes the annual cycle.

tmp1Clm = clmMonTLL(tmp1)
tmp1Anom= calcMonAnomTLL(tmp1, tmp1Clm)
printVarSummary(tmp1Anom)

tmp2Clm = clmMonTLL(tmp2)
tmp2Anom= calcMonAnomTLL(tmp2, tmp2Clm)
printVarSummary(tmp2Anom)

; For convenience, reorder and store the data

TMP1 = tmp1Anom(lat|:,lon|:,time|:)
TMP2 = tmp2Anom(lat|:,lon|:,time|:)

delete( [/tmp1Anom, tmp2Anom /] )   ; no longer needed

; Compute cross correlation coefficients at 0-lag

r   = escorc( TMP1, TMP2 )   ; (nlat,mlon)
copy_VarCoords(TMP1(:,:,0),r)
r at long_name = "linear cross correlation"
printVarSummary(r)

; Calculate degrees of freedom for each time series pair
; This estimates the dof in autocorrelated series.

siglvl = 0.05
dof1   = equiv_sample_size (TMP1, siglvl,0)
dof2   = equiv_sample_size (TMP2, siglvl,0)
printMinMax(dof1, 0)
printMinMax(dof2, 0)

; For each grid point time series pair, choose the lesser abs value

dof  = where(abs(dof1).le.abs(dof2), dof1, dof2)
copy_VarCoords(r,dof)
dof at long_name = "degrees of freedom"
printVarSummary(dof)
printMinMax(dof,0)

rsig = rtest( r, round(dof,3), 0)
rsig at long_name = "significance of r"
copy_VarCoords(r,rsig)
printMinMax(rsig, 0)

On Mon, Oct 6, 2014 at 2:07 PM, Vanúcia Schumacher <
vanucia-schumacher at hotmail.com> wrote:

> Hi Dennis,
>
> The information follows:
>
> Variable: tmp1
> Type: double
> Total Size: 46656000 bytes
>             5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [TIME | 360] x [lat | 90] x [lon | 180]
> Coordinates:
>             TIME: [527040..16260480]
>             lat: [-89.69999694824219..89.69999694824219]
>             lon: [   0..359.7000122070312]
> Number Of Attributes: 5
>   remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
>   missing_value : -9.999999999999999e+33
>   _FillValue : -9.999999999999999e+33
>   long_name : SST[GX=X2DEG,GY=Y2DEG]
>   history : From sst
>
> Variable: tmp2
> Type: float
> Total Size: 23328000 bytes
>             5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [time | 360] x [lat | 90] x [lon | 180]
> Coordinates:
>             time: [15.5..10941.5]
>             lat: [-89.69999694824219..89.69999694824219]
>             lon: [   0..359.7000122070312]
> Number Of Attributes: 12
>   remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
>   associated_files : baseURL:
> http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile:
>   _FillValue : 1e+20
>   missing_value : 1e+20
>   history : 2011-08-15T19:52:20Z altered by CMOR: replaced missing value
> flag (-9.99e+33) with standard missing value (1e+20).
>   cell_measures : area: areacello
>   cell_methods : time: mean (interval: 20 minutes)
>   original_name : tos
>   units : K
>   comment : "this may differ from ""surface temperature"" in regions of
> sea ice."
>   long_name : Sea Surface Temperature
>   standard_name : sea_surface_temperature
> (0) tmp2: min=0 max=304.18
> (0) tmp1: min=231.963 max=304.787
>
> Variable: ts1
> Type: double
> Total Size: 46656000 bytes
>             5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [lat | 90] x [lon | 180] x [TIME | 360]
> Coordinates:
>             lat: [-89.69999694824219..89.69999694824219]
>             lon: [   0..359.7000122070312]
>             TIME: [527040..16260480]
> Number Of Attributes: 5
>   history : From sst
>   long_name : SST[GX=X2DEG,GY=Y2DEG]
>   _FillValue : -9.999999999999999e+33
>   missing_value : -9.999999999999999e+33
>   remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
>
> Variable: ts2
> Type: float
> Total Size: 23328000 bytes
>             5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [lat | 90] x [lon | 180] x [time | 360]
> Coordinates:
>             lat: [-89.69999694824219..89.69999694824219]
>             lon: [   0..359.7000122070312]
>             time: [15.5..10941.5]
> Number Of Attributes: 12
>   standard_name : sea_surface_temperature
>   long_name : Sea Surface Temperature
>   comment : "this may differ from ""surface temperature"" in regions of
> sea ice."
>   units : K
>   original_name : tos
>   cell_methods : time: mean (interval: 20 minutes)
>   cell_measures : area: areacello
>   history : 2011-08-15T19:52:20Z altered by CMOR: replaced missing value
> flag (-9.99e+33) with standard missing value (1e+20).
>   missing_value : 1e+20
>   _FillValue : 1e+20
>   associated_files : baseURL:
> http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile:
>   remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
> warning:escorc: Non-fatal conditions encountered in series or xstd equals
> zero.
> Possibly, all values of a series are constant.
> warning:escorc: Most likely, one or more series consisted of all constant
> values
>
> Variable: ccr
> Type: double
> Total Size: 129600 bytes
>             16200 values
> Number of Dimensions: 2
> Dimensions and sizes: [lat | 90] x [lon | 180]
> Coordinates:
>             lat: [-89.69999694824219..89.69999694824219]
>             lon: [   0..359.7000122070312]
> Number Of Attributes: 1
>   _FillValue : 1.000000020040877e+20
> (0) ccr: min=-0.386039 max=0.986473
>
> Variable: Nx
> Type: integer
> Total Size: 4 bytes
>             1 values
> Number of Dimensions: 1
> Dimensions and sizes: [1]
> Coordinates:
>
>
> Variable: prob
> Type: double
> Total Size: 129600 bytes
>             16200 values
> Number of Dimensions: 2
> Dimensions and sizes: [lat | 90] x [lon | 180]
> Coordinates:
>             lat: [-89.69999694824219..89.69999694824219]
>             lon: [   0..359.7000122070312]
>
> (0) min=-0.386039   max=0.986473
>
> Script:
>
> ...
>
> begin
>
> ;************************************************
> ; open file and read in variable
> ;***********************************************
>
>
>   tmp1 = in1->tos
>   tmp2 = in2->tos
>
> printVarSummary(tmp1)
> printVarSummary(tmp2)
> print("tmp2: min="+min(tmp2)+" max="+max(tmp2))
> print("tmp1: min="+min(tmp1)+" max="+max(tmp1))
>
> ;************************************************
> ; reorder to get time as right most dimension
> ;***********************************************
>
>   ts1 = tmp1(lat|:,lon|:,TIME|:)
>   ts2 = tmp2(lat|:,lon|:,time|:)
>
> printVarSummary(ts1)
> printVarSummary(ts2)
>
> ;************************************************
> ; calculate cross correlations
> ;************************************************
>
> ts1 at _FillValue = 1e20
> ts2 at _FillValue = 1e20
>
>
>   ccr    =   escorc(ts2,ts1)
>
>   copy_VarCoords(ts1,ccr)
>
> ;*************************************************
> ;  statistical significance of ccr
> ;*************************************************
>
>   siglvl=0.05
>
>   Nx    = num(.not.ismissing(ts1))
>
>   prob  = rtest(ccr,Nx, 0)
>
>   copy_VarCoords(ccr,prob)
>
> ; ========================= PLOT 1 ==============================
>   wks   = gsn_open_wks ("png", "conOncon.mri" )   ; open workstation
>
>   gsn_define_colormap(wks,"cmp_b2r")
>
>   res                      = True                ; make plot mods
>
>   res at cnFillOn             = True                ; turn on color
>
>
>   res at gsnSpreadColors      = True                ; use full colormap
>
>
>   res at lbLabelAutoStride    = True                ; automatic lb label
> stride
>
>   res at cnLinesOn            = False               ; turn off contour lines
>
>   res at cnLevelSelectionMode = "ManualLevels"  ; set manual contour levels
>   res at cnMinLevelValF       = -1.             ; set min contour level
>   res at cnMaxLevelValF       =  1.             ; set max contour level
>   res at cnLevelSpacingF      =  0.1           ; set contour spacing
>
>   res at gsnDraw              = False           ; Do not draw plot
>   res at gsnFrame             = False           ; Do not advance frome
>
>
>   plot = gsn_csm_contour_map_ce(wks,ccr(:,:), res)
>
>   plot = ZeroNegDashLineContour (plot)
>
> ; ========================= PLOT 2 ==============================
>   res2 = True                            ; res2 probability plots
>
>   res2 at gsnDraw             = False       ; Do not draw plot
>   res2 at gsnFrame            = False       ; Do not advance frome
>
>   res2 at cnLevelSelectionMode = "ManualLevels" ; set manual contour levels
>  res2 at cnMinLevelValF      = 0.0      ; set min contour level
>  res2 at cnMaxLevelValF      = 1.05       ; set max contour level
>   res2 at cnLevelSpacingF     = 0.05        ; set contour spacing
>
>   res2 at cnInfoLabelOn       = False       ; turn off info label
>
>   res2 at cnLinesOn           = False       ; do not draw contour lines
>   res2 at cnLineLabelsOn      = False       ; do not draw contour labels
>
>   plot2   = ShadeLtContour(plot2, 0.06, 17)  ; shade all areas less than
> the
>                                              ; 0.05 contour level
>   overlay (plot, plot2)
>
>   draw (plot)
>   frame(wks)
>
> end
>
> ---
> Vanúcia Schumacher
> Mestranda em Meteorologia - UFV
> Meteorologista -UFPel
> Departamento de Meteorologia Agrícola - DEA
> Cel: (31) 9978 2522
> DEA: (31) 3899 1890
>
>
> ------------------------------
> Date: Mon, 6 Oct 2014 13:55:27 -0600
> Subject: Re: [ncl-talk] correlation and the significance
> From: shea at ucar.edu
> To: vanucia-schumacher at hotmail.com
> CC: ncl-talk at ucar.edu
>
>
>
> The question is difficult to answer.
> You have not included information which ncl-talk constantly request of
> users.
>
> [0] Always include the version of NCL you are using.
>
> Always look at your data and the results of assorted calculations via
> printVarSummary and print
>
> [1] printVarSummary(ts1)                   ; ?? (time) => [ntim] ??  or
> (time,lat,lon)  ==>   [ntim,nlat,mlon]  ??
>      printVarSummary(ts2)
>
> [2] ccr    =   escorc(ts2,ts1)
>      printVarSummary(ccr)
>
> [3] Nx    = num(.not.ismissing(ts1))
>      print(Nx)                                          ; if ts1 is
> one-dimensional
>      printVarSummary(Nx)                     ; if ts1 & ts2 are
> three-dimensional
>
>     This give you the **total** number of non-missing points.
>      If the ts1 and ts2 arrays are three-dimensional, you would likely see
> a VERY LARGE number
>      Very large numbers result in everything being significant.
>
> [4] If ts1 and ts2 are (time,lat,lon)  and you have NCL versions 6.1.1 or
>      earlier, then you should reorder the arrays so time is the
>      rightmost dimensions . See escorc Example 5.
>
>     If you have NCL v6.2.1 and 3D arrays, you can use  ccr    =
> escorc_n(ts2,ts1,0)
>
> ====
> Respond *only* to ncl-talk. Please, no personal salutation.
>
>
> On Mon, Oct 6, 2014 at 11:09 AM, Vanúcia Schumacher <
> vanucia-schumacher at hotmail.com> wrote:
>
> I need to plot correlation and the significance level of 95%, which, I did
> r test and applied the example  conOncon_4.ncl‏ of NCL, but I think
> there's something wrong with my script, because for all the data I use,
> plot area 100% the graph as significant, and I think there's something
> wrong, someone could help me on this?
> ....
>
> ; calculate cross correlations
> ;************************************************
>   ccr    =   escorc(ts2,ts1)
>
>   copy_VarCoords(ts1,ccr)
>
> ;*************************************************
> ;  statistical significance of ccr
> ;*************************************************
>
>   siglvl=0.05
>
>   Nx    = num(.not.ismissing(ts1))
>
>   prob  = rtest(ccr,Nx, 0)
>
>   copy_VarCoords(ccr,prob)
>
> ....
>
>  res2 at cnLevelSelectionMode = "ManualLevels"
>  res2 at cnMinLevelValF      = 0.00        ; set min contour level
>  res2 at cnMaxLevelValF      = 1.05        ; set max contour level
>   res2 at cnLevelSpacingF     = 0.05        ; set contour spacing
>
>
>   plot2   = ShadeLtContour(plot2, 0.06, 17)  ; shade all areas less than
> the  0.05 contour level
>   overlay (plot, plot2)
>
> ---
> Vanúcia Schumacher
> Mestranda em Meteorologia - UFV
> Meteorologista -UFPel
> Departamento de Meteorologia Agrícola - DEA
> Cel: (31) 9978 2522
> DEA: (31) 3899 1890
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20141008/a6595bff/attachment.html
```