[ncl-talk] correlation and the significance
Dennis Shea
shea at ucar.edu
Wed Oct 8 17:22:36 MDT 2014
[1] Minor comment: you do not have to set
ts1 at _FillValue = 1e20
ts2 at _FillValue = 1e20
ts1 and ts2 can have different _FillValue. NCL does not care.
[2] Did you look at the range for tmp2? Something is strange.
tmp2: min=0 max=304.18
tmp1: min=231.963 max=304.787
Likely, you should set: tmp2 = where (tmp2.eq.0, tmp2 at _FillValue,tmp2)
[3] You did not include the value for Nx. This is one key to you problem.
Nx = num(.not.ismissing(ts1)) ; all gridpoints at all
times (ntim*nlat*mlon)
This is a very large number. If there were no _FillValue: Nx=
360*90*180=5832000.
It will be less than that if there are _FillValue present. Still, it
will be a large number.
'escorc' is calculating the linear cross correlation coefficient. As
noted in the documentation,
===
"The linear correlation coefficient (*r*) for *n* pairs of independent
observations can be tested
against the null hypothesis (ie.: no correlation) using the
statistic
tval = *r**sqrt[ (*n*-2)/(1-*r*^2) ]
This statistic has a Student *t* distribution with *n*-2 degrees of
freedom.
===
Here 'n' is the number of ******independent****** values.
Look at the Student-t distribution. See table at:
http://en.wikipedia.org/wiki/Student%27s_t-distribution
Look at the degrees of freedom... Your dof would be 5832000-1 (infinity).
Then look at the t-statistic Even with infinite dof the largest values are
~3.3
Punch line ... any value will be significant if you use dof=5832000-1
**********
[a] You should use the appropriate number of **time values**.
***If***every monthly time series value
were independent you would have ntim-2 degrees of freedom (360-2) NOT
5832000-2
degrees of freedom which is what you used. Assuming independence in
time values,
the way to determine the Nx at each grid point would be.
Nx = dim_num_n(ts1,2) ; look at documentation. know what it does.
This is not appropriate for your application .... Why? see [b]
[b] More problematical ... If you have not removed the annual cycle, there
is a *HIGH*
degree of auto-correlation within each series. Autocorrelation
reduces the number
of independent values.
I would suggest using sst anomalies. This will reduce but not
eliminate autocorrelation
One way to estimate dof in the presence of autocorrelation is NCL's
https://www.ncl.ucar.edu/Document/Functions/Built-in/equiv_sample_size.shtml
You will get two dof estimates. One for each data set. Pick the
lesser and use it in rtest.
==========
To avoid constant email exchanges, I am including what I think will work.
PLEASE .. look and understand what is being done. The printVarSummary(...)
and printMinMax(...) are for *you* to look at ... not ncl-talk.
ncl-talk is not a statistical advising forum. Please talk with a colleague
about
these statistical issues.
Finally, please send responses to ncl-talk only. PLEASE do not include
a personal salutation (Hi Dennis). Likely, there are many people who know
more
statistics than me .
++++++++++++++++++
The following code could be simplified a bit if you had NCL v6.2.1 by using
escorc_n
https://www.ncl.ucar.edu/Document/Functions/Built-in/escorc_n.shtml
====
; Compute climatologies and anomalies. This removes the annual cycle.
tmp1Clm = clmMonTLL(tmp1)
tmp1Anom= calcMonAnomTLL(tmp1, tmp1Clm)
printVarSummary(tmp1Anom)
tmp2Clm = clmMonTLL(tmp2)
tmp2Anom= calcMonAnomTLL(tmp2, tmp2Clm)
printVarSummary(tmp2Anom)
; For convenience, reorder and store the data
TMP1 = tmp1Anom(lat|:,lon|:,time|:)
TMP2 = tmp2Anom(lat|:,lon|:,time|:)
delete( [/tmp1Anom, tmp2Anom /] ) ; no longer needed
; Compute cross correlation coefficients at 0-lag
r = escorc( TMP1, TMP2 ) ; (nlat,mlon)
copy_VarCoords(TMP1(:,:,0),r)
r at long_name = "linear cross correlation"
printVarSummary(r)
; Calculate degrees of freedom for each time series pair
; This estimates the dof in autocorrelated series.
siglvl = 0.05
dof1 = equiv_sample_size (TMP1, siglvl,0)
dof2 = equiv_sample_size (TMP2, siglvl,0)
printMinMax(dof1, 0)
printMinMax(dof2, 0)
; For each grid point time series pair, choose the lesser abs value
dof = where(abs(dof1).le.abs(dof2), dof1, dof2)
copy_VarCoords(r,dof)
dof at long_name = "degrees of freedom"
printVarSummary(dof)
printMinMax(dof,0)
rsig = rtest( r, round(dof,3), 0)
rsig at long_name = "significance of r"
copy_VarCoords(r,rsig)
printMinMax(rsig, 0)
On Mon, Oct 6, 2014 at 2:07 PM, Vanúcia Schumacher <
vanucia-schumacher at hotmail.com> wrote:
> Hi Dennis,
>
> The information follows:
>
> Variable: tmp1
> Type: double
> Total Size: 46656000 bytes
> 5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [TIME | 360] x [lat | 90] x [lon | 180]
> Coordinates:
> TIME: [527040..16260480]
> lat: [-89.69999694824219..89.69999694824219]
> lon: [ 0..359.7000122070312]
> Number Of Attributes: 5
> remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
> missing_value : -9.999999999999999e+33
> _FillValue : -9.999999999999999e+33
> long_name : SST[GX=X2DEG,GY=Y2DEG]
> history : From sst
>
> Variable: tmp2
> Type: float
> Total Size: 23328000 bytes
> 5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [time | 360] x [lat | 90] x [lon | 180]
> Coordinates:
> time: [15.5..10941.5]
> lat: [-89.69999694824219..89.69999694824219]
> lon: [ 0..359.7000122070312]
> Number Of Attributes: 12
> remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
> associated_files : baseURL:
> http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile:
> gridspec_ocean_fx_MRI-CGCM3_decadal1980_r0i0p0.nc areacello:
> areacello_fx_MRI-CGCM3_decadal1980_r0i0p0.nc
> _FillValue : 1e+20
> missing_value : 1e+20
> history : 2011-08-15T19:52:20Z altered by CMOR: replaced missing value
> flag (-9.99e+33) with standard missing value (1e+20).
> cell_measures : area: areacello
> cell_methods : time: mean (interval: 20 minutes)
> original_name : tos
> units : K
> comment : "this may differ from ""surface temperature"" in regions of
> sea ice."
> long_name : Sea Surface Temperature
> standard_name : sea_surface_temperature
> (0) tmp2: min=0 max=304.18
> (0) tmp1: min=231.963 max=304.787
>
> Variable: ts1
> Type: double
> Total Size: 46656000 bytes
> 5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [lat | 90] x [lon | 180] x [TIME | 360]
> Coordinates:
> lat: [-89.69999694824219..89.69999694824219]
> lon: [ 0..359.7000122070312]
> TIME: [527040..16260480]
> Number Of Attributes: 5
> history : From sst
> long_name : SST[GX=X2DEG,GY=Y2DEG]
> _FillValue : -9.999999999999999e+33
> missing_value : -9.999999999999999e+33
> remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
>
> Variable: ts2
> Type: float
> Total Size: 23328000 bytes
> 5832000 values
> Number of Dimensions: 3
> Dimensions and sizes: [lat | 90] x [lon | 180] x [time | 360]
> Coordinates:
> lat: [-89.69999694824219..89.69999694824219]
> lon: [ 0..359.7000122070312]
> time: [15.5..10941.5]
> Number Of Attributes: 12
> standard_name : sea_surface_temperature
> long_name : Sea Surface Temperature
> comment : "this may differ from ""surface temperature"" in regions of
> sea ice."
> units : K
> original_name : tos
> cell_methods : time: mean (interval: 20 minutes)
> cell_measures : area: areacello
> history : 2011-08-15T19:52:20Z altered by CMOR: replaced missing value
> flag (-9.99e+33) with standard missing value (1e+20).
> missing_value : 1e+20
> _FillValue : 1e+20
> associated_files : baseURL:
> http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile:
> gridspec_ocean_fx_MRI-CGCM3_decadal1980_r0i0p0.nc areacello:
> areacello_fx_MRI-CGCM3_decadal1980_r0i0p0.nc
> remap : remapped via ESMF_regrid_with_weights: Bilinear remapping
> warning:escorc: Non-fatal conditions encountered in series or xstd equals
> zero.
> Possibly, all values of a series are constant.
> warning:escorc: Most likely, one or more series consisted of all constant
> values
>
> Variable: ccr
> Type: double
> Total Size: 129600 bytes
> 16200 values
> Number of Dimensions: 2
> Dimensions and sizes: [lat | 90] x [lon | 180]
> Coordinates:
> lat: [-89.69999694824219..89.69999694824219]
> lon: [ 0..359.7000122070312]
> Number Of Attributes: 1
> _FillValue : 1.000000020040877e+20
> (0) ccr: min=-0.386039 max=0.986473
>
> Variable: Nx
> Type: integer
> Total Size: 4 bytes
> 1 values
> Number of Dimensions: 1
> Dimensions and sizes: [1]
> Coordinates:
>
>
> Variable: prob
> Type: double
> Total Size: 129600 bytes
> 16200 values
> Number of Dimensions: 2
> Dimensions and sizes: [lat | 90] x [lon | 180]
> Coordinates:
> lat: [-89.69999694824219..89.69999694824219]
> lon: [ 0..359.7000122070312]
>
> (0) min=-0.386039 max=0.986473
>
> Script:
>
> ...
>
> begin
>
> ;************************************************
> ; open file and read in variable
> ;***********************************************
>
> in1 = addfile("tos_90x180_cfsr.nc","r")
> in2 = addfile("tos_90x180_MRI.nc","r")
>
> tmp1 = in1->tos
> tmp2 = in2->tos
>
> printVarSummary(tmp1)
> printVarSummary(tmp2)
> print("tmp2: min="+min(tmp2)+" max="+max(tmp2))
> print("tmp1: min="+min(tmp1)+" max="+max(tmp1))
>
> ;************************************************
> ; reorder to get time as right most dimension
> ;***********************************************
>
> ts1 = tmp1(lat|:,lon|:,TIME|:)
> ts2 = tmp2(lat|:,lon|:,time|:)
>
> printVarSummary(ts1)
> printVarSummary(ts2)
>
> ;************************************************
> ; calculate cross correlations
> ;************************************************
>
> ts1 at _FillValue = 1e20
> ts2 at _FillValue = 1e20
>
>
> ccr = escorc(ts2,ts1)
>
> copy_VarCoords(ts1,ccr)
>
> ;*************************************************
> ; statistical significance of ccr
> ;*************************************************
>
> siglvl=0.05
>
> Nx = num(.not.ismissing(ts1))
>
> prob = rtest(ccr,Nx, 0)
>
> copy_VarCoords(ccr,prob)
>
> ; ========================= PLOT 1 ==============================
> wks = gsn_open_wks ("png", "conOncon.mri" ) ; open workstation
>
> gsn_define_colormap(wks,"cmp_b2r")
>
> res = True ; make plot mods
>
> res at cnFillOn = True ; turn on color
>
>
> res at gsnSpreadColors = True ; use full colormap
>
>
> res at lbLabelAutoStride = True ; automatic lb label
> stride
>
> res at cnLinesOn = False ; turn off contour lines
>
> res at cnLevelSelectionMode = "ManualLevels" ; set manual contour levels
> res at cnMinLevelValF = -1. ; set min contour level
> res at cnMaxLevelValF = 1. ; set max contour level
> res at cnLevelSpacingF = 0.1 ; set contour spacing
>
> res at gsnDraw = False ; Do not draw plot
> res at gsnFrame = False ; Do not advance frome
>
>
> plot = gsn_csm_contour_map_ce(wks,ccr(:,:), res)
>
> plot = ZeroNegDashLineContour (plot)
>
> ; ========================= PLOT 2 ==============================
> res2 = True ; res2 probability plots
>
> res2 at gsnDraw = False ; Do not draw plot
> res2 at gsnFrame = False ; Do not advance frome
>
> res2 at cnLevelSelectionMode = "ManualLevels" ; set manual contour levels
> res2 at cnMinLevelValF = 0.0 ; set min contour level
> res2 at cnMaxLevelValF = 1.05 ; set max contour level
> res2 at cnLevelSpacingF = 0.05 ; set contour spacing
>
> res2 at cnInfoLabelOn = False ; turn off info label
>
> res2 at cnLinesOn = False ; do not draw contour lines
> res2 at cnLineLabelsOn = False ; do not draw contour labels
>
> plot2 = gsn_csm_contour(wks,gsn_add_cyclic_point(prob(:,:)), res2)
> plot2 = ShadeLtContour(plot2, 0.06, 17) ; shade all areas less than
> the
> ; 0.05 contour level
> overlay (plot, plot2)
>
> draw (plot)
> frame(wks)
>
> end
>
> ---
> Vanúcia Schumacher
> Mestranda em Meteorologia - UFV
> Meteorologista -UFPel
> Departamento de Meteorologia Agrícola - DEA
> Cel: (31) 9978 2522
> DEA: (31) 3899 1890
>
>
> ------------------------------
> Date: Mon, 6 Oct 2014 13:55:27 -0600
> Subject: Re: [ncl-talk] correlation and the significance
> From: shea at ucar.edu
> To: vanucia-schumacher at hotmail.com
> CC: ncl-talk at ucar.edu
>
>
>
> The question is difficult to answer.
> You have not included information which ncl-talk constantly request of
> users.
>
> [0] Always include the version of NCL you are using.
>
> Always look at your data and the results of assorted calculations via
> printVarSummary and print
>
> [1] printVarSummary(ts1) ; ?? (time) => [ntim] ?? or
> (time,lat,lon) ==> [ntim,nlat,mlon] ??
> printVarSummary(ts2)
>
> [2] ccr = escorc(ts2,ts1)
> printVarSummary(ccr)
>
> [3] Nx = num(.not.ismissing(ts1))
> print(Nx) ; if ts1 is
> one-dimensional
> printVarSummary(Nx) ; if ts1 & ts2 are
> three-dimensional
>
> This give you the **total** number of non-missing points.
> If the ts1 and ts2 arrays are three-dimensional, you would likely see
> a VERY LARGE number
> Very large numbers result in everything being significant.
>
> [4] If ts1 and ts2 are (time,lat,lon) and you have NCL versions 6.1.1 or
> earlier, then you should reorder the arrays so time is the
> rightmost dimensions . See escorc Example 5.
>
> If you have NCL v6.2.1 and 3D arrays, you can use ccr =
> escorc_n(ts2,ts1,0)
>
> ====
> Respond *only* to ncl-talk. Please, no personal salutation.
>
>
> On Mon, Oct 6, 2014 at 11:09 AM, Vanúcia Schumacher <
> vanucia-schumacher at hotmail.com> wrote:
>
> I need to plot correlation and the significance level of 95%, which, I did
> r test and applied the example conOncon_4.ncl of NCL, but I think
> there's something wrong with my script, because for all the data I use,
> plot area 100% the graph as significant, and I think there's something
> wrong, someone could help me on this?
> ....
>
> ; calculate cross correlations
> ;************************************************
> ccr = escorc(ts2,ts1)
>
> copy_VarCoords(ts1,ccr)
>
> ;*************************************************
> ; statistical significance of ccr
> ;*************************************************
>
> siglvl=0.05
>
> Nx = num(.not.ismissing(ts1))
>
> prob = rtest(ccr,Nx, 0)
>
> copy_VarCoords(ccr,prob)
>
> ....
>
> res2 at cnLevelSelectionMode = "ManualLevels"
> res2 at cnMinLevelValF = 0.00 ; set min contour level
> res2 at cnMaxLevelValF = 1.05 ; set max contour level
> res2 at cnLevelSpacingF = 0.05 ; set contour spacing
>
> plot2 = gsn_csm_contour(wks,gsn_add_cyclic_point(prob(:,:)), res2)
>
> plot2 = ShadeLtContour(plot2, 0.06, 17) ; shade all areas less than
> the 0.05 contour level
> overlay (plot, plot2)
>
> ---
> Vanúcia Schumacher
> Mestranda em Meteorologia - UFV
> Meteorologista -UFPel
> Departamento de Meteorologia Agrícola - DEA
> Cel: (31) 9978 2522
> DEA: (31) 3899 1890
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20141008/a6595bff/attachment.html
More information about the ncl-talk
mailing list