[ncl-talk] trend_manken data size limit?

Dennis Shea shea at ucar.edu
Mon Apr 6 16:42:11 MDT 2015


You sent your NCL script and test data offline.

The time series that it fails on has 29737 values... ~84% (24878) are 0.0
The is a really unusual distribution.

NCL's  Mann-Kendall code is a fortran subroutine which does two independent
tasks:
[1] Compute the Mann-Kendall (MK) trend statistic; [2] Compute the
Theil-Sen (TS) robust estimate of  linear trend.

[1] I am sure the MK statistic and probability estimate were computed.
[2] The TS estimate requires that the 'slopes' calulated within the do
loop  be sorted into ascending order. The code fails in the fcode used for
sorting. Specifically, at the following location of DPSORTU

[SNIP]

             65 I = I + 1
                IF (I.EQ.J) GO TO 55
 1341 ----> T = A(I+1)
                IF (A(I).LE.T) GO TO 65
                K = I
[SNIP]

The subroutine used is a 'famous' one translated from ALGOL (original
author: ?Wilkinson?) to fortran (Singleton). It is part of the ACM codes
and is available via netlib.

C Sorts A into ascending order, from A(1) to A(N).
C Reference: Richard C. Singleton, Algorithm 347, SORT. <=======
C Comm. ACM 3, 321 (March 1969).
C Algorithm is Copyright 1969 Association of Computing Machinery,


I speculate that the preponderance of 0.0 values has 'somehow' causes this
code to fail. Why it required > 25,000 values to fail .. I do not know.
Again, I speculate, if there had been more 0.0 in the 1st 25,000 it may
have failed with fewer points.

There is no chance that I am going to change this sorting code. It requires
more knowledge about sorting than I have. NOTE: the code used by NCL's
'dim_pqsort' is based on this code so I doubt this would be an acceptable
alternative. That code is also downloadable from netlib.

----
OK .... MK is 'distribution free' BUT I think some knowledge of how MK
works should be used. Basically, it checks to see if successive values go
up (+1), down (-1) or are tied (0). The *amount *of the change is not used
for MK. With series like 0,0,0,#,0,0,0,0,0,#,#,0,0,0,0,0,0,#,0,0,#,0,0,...
etc. It will often go from 0.0 to some positive number (+1) and from some
positive number to 0.0 (-1).  I would speculate that the +1 and -1 counts
almost nullify each other. All cases where the series has 0,0 would be a
tie. This will be *by far* the largest number. The algorithm uses a
correction for ties but I am not sure if was developed for cases where the
number of ties is >> greater that the sum of the +1 and -1.

Also, some knowledge of hoe Theil-Sen estimates are made. In the process of
calculating  the sums and ties  'slopea' are calculated. The median of the
slopes is the TS estimate. Since the majority of slopes are 0, the TS
median value will be 0.0.

Just using regline ...

     rc    = regline(year_fraction,precip_ave)          ; slope
     print(rc)


Variable: rc     <==== ALL
Type: float
Total Size: 4 bytes
            1 values
Number of Dimensions: 1
Dimensions and sizes:    [1]
Coordinates:
Number Of Attributes: 8
  units :    mmhr^-1/3hrs
  _FillValue :    -999.99
  yintercept :    2.898926
  yave :    0.112996
  xave :    2006.162
  nptxy :    29737
  rstd :    0.000660348
  tval :    -2.102962
(0)    -0.001388687



  ii = ind(precip_ave.gt.0.0)  ; observations of when it rains
  RC = regline(year_fraction(ii),precip_ave(ii))
  print(RC)

Variable: RC  <=== Conditioned on rain only times.
Type: float
Total Size: 4 bytes
            1 values
Number of Dimensions: 1
Dimensions and sizes:    [1]
Coordinates:
Number Of Attributes: 7
  _FillValue :    -999.99
  yintercept :    -6.783739
  yave :    0.6915334
  xave :    2005.787
  nptxy :    4859
  rstd :    0.003920668
  tval :    0.9505657
(0)    0.003726852






On Sun, Apr 5, 2015 at 9:04 AM, Mary Haley <haley at ucar.edu> wrote:

> Dave,
>
> Could you write the precip_ave variable to a NetCDF file and send us the
> file, so we can try trend_manken on it here?
>
> You can do something like:
>
> fout = addfile("test_trend_manken.nc","c")
> fout->pcp = precip_ave
>
> and then if the file is too large (say over 25 megabytes), then you can
> use our ftp to upload it. See
>
> http://www.ncl.ucar.edu/report_bug.shtml#HowToFTP
>
> Thanks,
>
> --Mary
>
>
> On Sat, Apr 4, 2015 at 12:56 PM, David Adams <dave.k.adams at gmail.com>
> wrote:
>
>> Hi all,
>> I am using the trend_manken
>> function to look at the significance of  precipitation intensity trends
>> (using TRMM data) for 15 years (every 3 hours data).  When I use 10,000
>> data points it takes about 3 seconds to run, 20,000 data points about 10
>> seconds, 25,000 about 15 seconds.  When I use the entire time series 29,737
>> data points, it never finishes.  I have waited more than 20 minutes and it
>> keeps spinning.  I switched from double to floating point, but that makes
>> no difference.
>>
>> ;---------------------------------
>> ; Mann Kendall test
>> ;---------------------------------
>> pa = trend_manken(precip_ave, False, 0)
>> print(pa)
>>
>>
>>
>> Any ideas?
>>
>> saludos,
>> Dave
>>
>> _______________________________________________
>> ncl-talk mailing list
>> List instructions, subscriber options, unsubscribe:
>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>>
>>
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20150406/e6bf5b42/attachment.html 


More information about the ncl-talk mailing list