[ncl-talk] A useful tidbit about kmean clustering

Dennis Shea shea at ucar.edu
Wed Jan 22 09:49:31 MST 2020


re: "If anyone knows better how to specify the number of clusters, I would
be glad to hear."

As noted in the kmeans_as136 documentation
<http://www.ncl.ucar.edu/Document/Functions/Built-in/kmeans_as136.shtml>,
this algorithm requires that the user specify the number of clusters
a priori. NCL's examples are provided to illustrate that the implementation
works correctly. However, NCL's focus is not statistics; hence the very
limited [one-function] clustering 'suite.'

How do you specify the appropriate number of clusters?
Apparently there is no definitive approach. To me, it seems to be 'trial and
error'  :-(

https://towardsdatascience.com/clustering-analysis-in-r-using-k-means-73eca4fb7967
https://www.statmethods.net/advstats/cluster.html
https://www.r-bloggers.com/k-means-clustering-in-r/
https://www.datacamp.com/community/tutorials/k-means-clustering-r
https://www.guru99.com/r-k-means-clustering.html

===
Also, keep in mind: Being in cluster 1 is not more significant than being
in cluster 'n'. The cluster numbering is arbitrary.
===
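
One practical consequence: two runs can produce the same grouping with
different cluster numbers. A minimal sketch (in Python rather than NCL, and
with a hypothetical helper name and made-up labels) that renumbers clusters
by size so that two runs can be compared:

```python
from collections import Counter

def relabel_by_size(labels):
    """Renumber cluster ids 1..k so that cluster 1 is the largest.

    Hypothetical helper for illustration; not part of NCL.
    """
    counts = Counter(labels)
    # Sort by descending size; break ties by the original id for determinism.
    order = sorted(counts, key=lambda c: (-counts[c], c))
    new_id = {old: new for new, old in enumerate(order, start=1)}
    return [new_id[lab] for lab in labels]

# Two made-up runs: same grouping of six points, different arbitrary ids.
run_a = [1, 1, 2, 3, 3, 3]
run_b = [3, 3, 1, 2, 2, 2]

canon_a = relabel_by_size(run_a)
canon_b = relabel_by_size(run_b)
print(canon_a == canon_b)
```

If two clusters have the same size, the tie is broken by the original id, so
equal-sized clusters can still come out numbered differently between runs.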
re: "1) Set the number of clusters to 1 less than the number of points.
Otherwise, the program produces zeros or undefined numbers, or just strange
answers."

If 'k' [the number of clusters] is the same as the number of points, then
each point is its own cluster and the cluster variance is 0.0.

From the 1st link above:

Choosing a good K

The bigger is the K you choose, the lower will be the
variance within the groups in the clustering. If K is equal to the number
of observations, then each point will be a group and the variance will be
0. It’s interesting to find a balance between the number of groups and
their variance. A variance of a group means how different the members of
the group are. The bigger is the variance, the bigger is the dissimilarity
in a group.
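
To make that trade-off concrete, here is a minimal sketch in Python (not
NCL). It uses a plain Lloyd-style k-means with a deterministic
farthest-point start. That is an assumption made for illustration; it is not
the Hartigan-Wong AS 136 algorithm behind kmeans_as136. The six points and
all function names are made up.

```python
# Illustrative only: plain Lloyd k-means, not the AS 136 code used by NCL.

def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def init_centers(points, k):
    # Deterministic start: the first point, then repeatedly the point
    # farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Return (labels, centers) after alternating assign/update steps."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = mean(members)
    return labels, centers

def total_wss(points, labels, centers):
    """Total within-cluster sum of squares."""
    return sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))

# Made-up data: three well-separated pairs of points.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0),
       (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]

# Sweep K from 1 up to the number of points.
wss_by_k = [total_wss(pts, *kmeans(pts, k)) for k in range(1, len(pts) + 1)]
for k, w in enumerate(wss_by_k, start=1):
    print(k, w)
```

With distinct points, the last entry (K equal to the number of points) is
exactly 0, which is why that setting tells you nothing. A common rule of
thumb is to pick the K at which the curve stops dropping sharply (the
'elbow'), but that is still a judgment call.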

---
I don't think I can offer much more help.

Good luck
D


On Wed, Jan 22, 2020 at 8:24 AM Barry Lynn <barry.h.lynn at gmail.com> wrote:

> Just to be clear: I am providing this information/code for interested
> users.
>
> If anyone knows better how to specify the number of clusters, I would be
> glad to hear.
>
> Or, if anyone knows how to track their movement, as well.
>
> Thank you.
>
> On Wed, Jan 22, 2020 at 5:10 PM Barry Lynn <barry.h.lynn at gmail.com> wrote:
>
>> Hi:
>>
>> I had a couple of problems running the code.
>>
>> 1) It is not obvious how to set the k_clst (how many clusters), especially
>> when running from a script.
>>
>> 2) I really wanted to see the members of each cluster.
>>
>> Here, I do the following (in the Excerpt):
>>
>> 1) Set the number of clusters to 1 less than the number of points.
>> Otherwise, the program produces zeros or undefined numbers, or just strange
>> answers.
>> 2) I then reduce the number of clusters until I have just a cluster with
>> 1 grid point, inclusive (or the minimum number of points).
>> 3) Because I didn't know how to modify the call to the clustering
>> program, I created my own from the Fortran file, which outputs the members
>> of each cluster.
>> 4) I use WRAPIT to call this program.
>>
>> I have attached a self-contained fortran version of the code.
>>
>> Barry
>>
>>
>>
>> On Sun, Jan 19, 2020 at 5:30 PM Barry Lynn <barry.h.lynn at gmail.com>
>> wrote:
>>
>>> Hi:
>>>
>>> Thank you for asking.
>>>
>>> I found the FORTRAN code and may pass on my modifications if warranted.
>>>
>>> Thanks
>>>
>>> On Sun, 19 Jan 2020 at 17:26 Dennis Shea <shea at ucar.edu> wrote:
>>>
>>>> I am not sure of the question.
>>>> ---
>>>> You would have to access the changed code via a 'shared object'
>>>>
>>>> Does it require additional arguments? The NCL version of 'kmns' uses
>>>> double precision arguments.
>>>>
>>>> C NCLFORTSTART
>>>>       subroutine kmns136 (dat, m, n, clcntr, k,  ic1, nc
>>>>      +                   ,iter, iseed, wss, ier)
>>>>       implicit none
>>>> c                            ! INPUT
>>>>       integer m, n, k, iter, iseed
>>>>       double precision dat(m,n)
>>>> c                            ! INPUT/OUTPUT
>>>>       integer ic1(m), nc(k), ier
>>>>       double precision clcntr(n,k), wss(k)
>>>> C NCLEND
>>>> c                            ! LOCAL WORK ARRAYS
>>>>       integer          ic2(m), ncp(k), itran(k), live(k)
>>>>       double precision an1(k), an2(k), d(m)
>>>>       integer nv, kk, mm
>>>> ====
>>>>
>>>> Change: kmns136 to (say) kmns136x
>>>>
>>>> Then use WRAPIT to generate the shared object
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Jan 19, 2020 at 4:04 AM Barry Lynn via ncl-talk <
>>>> ncl-talk at ucar.edu> wrote:
>>>>
>>>>> Hello:
>>>>>
>>>>> I would like to use cluster analysis and it would be very helpful (I
>>>>> hope) if I could modify the source code.
>>>>>
>>>>> I downloaded the NCL source code from here:
>>>>>
>>>>> https://www.earthsystemgrid.org/dataset/ncl.640.src/file.html
>>>>>
>>>>> and found the Fortran (.f) programs in this directory.
>>>>>
>>>>> NCL_SOURCE/ncl_ncarg-6.4.0/ni/src/lib/nfpfort
>>>>>
>>>>> Barry
>>>>> --
>>>>> Barry H. Lynn, Ph.D
>>>>> Senior Associate Scientist, Lecturer,
>>>>> The Institute of the Earth Science,
>>>>> The Hebrew University of Jerusalem,
>>>>> Givat Ram, Jerusalem 91904, Israel
>>>>> Tel: 972 547 231 170
>>>>> Fax: (972)-25662581
>>>>>
>>>>> C.E.O, Weather It Is, LTD
>>>>> Weather and Climate Focus
>>>>> http://weather-it-is.com
>>>>> Jerusalem, Israel
>>>>> Local: 02 930 9525
>>>>> Cell: 054 7 231 170
>>>>> Int-IS: x972 2 930 9525