[ncl-talk] A useful tidbit about kmean clustering

Barry Lynn barry.h.lynn at gmail.com
Thu Jan 23 06:18:54 MST 2020


Hi Dennis:

This program (attached) first reduces the number of clusters until the
smallest remaining cluster has 1 point, or some chosen minimum number of
points (in theory). One could try it on the data I provided in the last
iteration in the test_clustering.F program.

I then loop, requiring that the minimum distance between clusters be at
least some arbitrary value. I think it works. In this way, I reduce the
number of clusters until they are "separate units".
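
The loop described above can be sketched roughly as follows. This is not the attached cluster_find.ncl: it is a minimal Python illustration that swaps in a plain Lloyd-style k-means (kmeans_as136 uses the Hartigan-Wong AS 136 algorithm), and it assumes "distance between clusters" means the distance between cluster centers. The function names, the test points, and the threshold `min_sep` are all hypothetical.

```python
import numpy as np

def kmeans_lloyd(points, k, n_iter=100):
    """Plain Lloyd k-means (a stand-in for AS 136, not the same algorithm)."""
    pts = np.asarray(points, dtype=float)
    # Deterministic farthest-point seeding: start at point 0, then keep
    # adding the point that is farthest from every center chosen so far.
    centers = [pts[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centers], axis=0)
        centers.append(pts[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def min_center_distance(centers):
    """Smallest pairwise distance between cluster centers."""
    k = len(centers)
    if k < 2:
        return np.inf
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    return d[np.triu_indices(k, 1)].min()

def reduce_until_separated(points, k_start, min_sep):
    """Lower k from k_start until all centers are at least min_sep apart."""
    for k in range(k_start, 0, -1):
        centers, labels = kmeans_lloyd(points, k)
        if min_center_distance(centers) >= min_sep:
            return k, centers, labels

# Hypothetical data: two tight groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
# Start with one cluster fewer than the number of points.
k, centers, labels = reduce_until_separated(points, len(points) - 1, min_sep=5.0)
print(k, labels.tolist())
```

The result depends on the arbitrary threshold: a larger `min_sep` merges more clusters, so it still has to be chosen with some knowledge of the data.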

Lastly, I call my version of the program in order to output the cluster
members (for later use). This could be added to the NCL version fairly
easily.
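
As a rough illustration of what "outputting the cluster members" amounts to: given one cluster label per point (the role that `ic1(m)` appears to play in the Fortran interface quoted below), the members of each cluster can be collected by grouping point indices by label. This Python sketch is not the modified Fortran routine; the helper name `cluster_members` and the sample labels are hypothetical.

```python
from collections import defaultdict

def cluster_members(labels):
    """Map each cluster label to the list of point indices assigned to it."""
    members = defaultdict(list)
    for point_index, cluster_id in enumerate(labels):
        members[cluster_id].append(point_index)
    # Return a plain dict ordered by cluster label.
    return dict(sorted(members.items()))

# Hypothetical assignment of six points to clusters 1..3
# (labels are 1-based as in Fortran; point indices here are 0-based).
ic1 = [1, 2, 1, 3, 2, 1]
members = cluster_members(ic1)
for cluster_id, point_indices in members.items():
    print(cluster_id, point_indices)
```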

What you wrote is true, but this code (full of print statements, etc.) seems
to carry out the process that one would otherwise have to do by trial and
error.
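
One quantity that can be watched during such trial and error is the within-cluster sum of squares (compare the `wss` output in the Fortran interface quoted below). The Python sketch below computes it for a few hand-made partitions of four hypothetical points. It shows the behaviour described in the quoted discussion: the value shrinks as k grows and reaches zero when every point is its own cluster. The function name and the data are assumptions for illustration only.

```python
import numpy as np

def within_cluster_ss(points, labels):
    """Total within-cluster sum of squared distances to each cluster mean."""
    pts = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for cluster_id in np.unique(labels):
        group = pts[labels == cluster_id]
        total += ((group - group.mean(axis=0)) ** 2).sum()
    return total

# Hypothetical data: two pairs of points, 10 units apart.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]

wss_k1 = within_cluster_ss(points, [0, 0, 0, 0])  # everything in one cluster
wss_k2 = within_cluster_ss(points, [0, 0, 1, 1])  # the two natural pairs
wss_k4 = within_cluster_ss(points, [0, 1, 2, 3])  # one cluster per point
print(wss_k1, wss_k2, wss_k4)
```

Because the sum can always be driven to zero by raising k, a small value alone does not indicate a good choice; the large drop between k=1 and k=2 here is the kind of change one looks for.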

Barry

On Wed, Jan 22, 2020 at 6:49 PM Dennis Shea <shea at ucar.edu> wrote:

> re: "If anyone knows better how to specify the number of clusters, I would
> be glad to hear."
>
> As noted in the kmeans_as136
> <http://www.ncl.ucar.edu/Document/Functions/Built-in/kmeans_as136.shtml>
> documentation, this algorithm requires that the user specify the number of
> clusters a priori. NCL's examples are provided to illustrate that the
> implementation works correctly. However, NCL's focus is not statistics.
> Hence, the very limited [one] cluster function 'suite.'
>
> How does one specify the appropriate number of clusters?
> Apparently there is no definitive approach. To me, it seems 'trial and
> error'  :-(
>
>
> https://towardsdatascience.com/clustering-analysis-in-r-using-k-means-73eca4fb7967
> https://www.statmethods.net/advstats/cluster.html
> https://www.r-bloggers.com/k-means-clustering-in-r/
> https://www.datacamp.com/community/tutorials/k-means-clustering-r
> https://www.guru99.com/r-k-means-clustering.html
>
> ===
> Also, keep in mind: Being in cluster 1 is not more significant than being
> in cluster 'n'. It is arbitrary.
> ===
> re: 1) Set the number of clusters to 1 less than the number of points.
> Otherwise, the program produces zeros or undefined numbers, or just strange
> answers.
>
> If 'k' [number of clusters] is the same as the number of points, then the
> cluster variance is 0.0
>
> From the 1st link above:
>
>
>
> Choosing a good K
>
> The bigger is the K you choose, the lower will be the
> variance within the groups in the clustering. If K is equal to the number
> of observations, then each point will be a group and the variance will be
> 0. It’s interesting to find a balance between the number of groups and
> their variance. A variance of a group means how different the members of
> the group are. The bigger is the variance, the bigger is the dissimilarity
> in a group.
>
> ---
> I don't think I can offer much more help.
>
> Good luck
> D
>
>
> On Wed, Jan 22, 2020 at 8:24 AM Barry Lynn <barry.h.lynn at gmail.com> wrote:
>
>> Just to be clear: I am providing this information/code for interested
>> users.
>>
>> If anyone knows better how to specify the number of clusters, I would be
>> glad to hear.
>>
>> Or, if anyone knows how to track their movement, as well.
>>
>> Thank you.
>>
>> On Wed, Jan 22, 2020 at 5:10 PM Barry Lynn <barry.h.lynn at gmail.com>
>> wrote:
>>
>>> Hi:
>>>
>>> I had a couple of problems running the code.
>>>
>>> 1) It is not obvious how to set k_clst (how many clusters),
>>> especially when running from a script.
>>>
>>> 2) I really wanted to see the members of each cluster.
>>>
>>> Here, I do the following (in the Excerpt):
>>>
>>> 1) Set the number of clusters to 1 less than the number of points.
>>> Otherwise, the program produces zeros or undefined numbers, or just strange
>>> answers.
>>> 2) I then reduce the number of clusters until just one cluster remains
>>> with 1 grid point (or the minimum number of points).
>>> 3) Because I didn't know how to modify the call to the clustering
>>> program, I created my own from the Fortran file, which outputs the members
>>> of each cluster.
>>> 4) I use WRAPIT to call this program.
>>>
>>> I have attached a self-contained fortran version of the code.
>>>
>>> Barry
>>>
>>>
>>>
>>> On Sun, Jan 19, 2020 at 5:30 PM Barry Lynn <barry.h.lynn at gmail.com>
>>> wrote:
>>>
>>>> Hi:
>>>>
>>>> Thank you for asking.
>>>>
>>>> I found the FORTRAN code and may pass on my modifications if warranted.
>>>>
>>>> Thanks
>>>>
>>>> On Sun, 19 Jan 2020 at 17:26 Dennis Shea <shea at ucar.edu> wrote:
>>>>
>>>>> I am not sure of the question.
>>>>> ---
>>>>> You would have to access the changed code via a 'shared object'
>>>>>
>>>>> Does it require additional arguments? The NCL version of 'kmns' uses
>>>>> double precision arguments.
>>>>>
>>>>> C NCLFORTSTART
>>>>>       subroutine kmns136 (dat, m, n, clcntr, k,  ic1, nc
>>>>>      +                   ,iter, iseed, wss, ier)
>>>>>       implicit none
>>>>> c                            ! INPUT
>>>>>       integer m, n, k, iter, iseed
>>>>>       double precision dat(m,n)
>>>>> c                            ! INPUT/OUTPUT
>>>>>       integer ic1(m), nc(k), ier
>>>>>       double precision clcntr(n,k), wss(k)
>>>>> C NCLEND
>>>>> c                            ! LOCAL WORK ARRAYS
>>>>>       integer          ic2(m), ncp(k), itran(k), live(k)
>>>>>       double precision an1(k), an2(k), d(m)
>>>>>       integer nv, kk, mm
>>>>> ====
>>>>>
>>>>> Change: kmns136 to (say) kmns136x
>>>>>
>>>>> Then use WRAPIT to generate the shared object
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Jan 19, 2020 at 4:04 AM Barry Lynn via ncl-talk <
>>>>> ncl-talk at ucar.edu> wrote:
>>>>>
>>>>>> Hello:
>>>>>>
>>>>>> I would like to use cluster analysis and it would be very helpful (I
>>>>>> hope) if I could modify the source code.
>>>>>>
>>>>>> I downloaded the NCL source code from here:
>>>>>>
>>>>>> https://www.earthsystemgrid.org/dataset/ncl.640.src/file.html
>>>>>>
>>>>>> and found the Fortran (.f) programs in this directory.
>>>>>>
>>>>>> NCL_SOURCE/ncl_ncarg-6.4.0/ni/src/lib/nfpfort
>>>>>>
>>>>>> Barry
>>>>>> --
>>>>>> Barry H. Lynn, Ph.D
>>>>>> Senior Associate Scientist, Lecturer,
>>>>>> The Institute of the Earth Science,
>>>>>> The Hebrew University of Jerusalem,
>>>>>> Givat Ram, Jerusalem 91904, Israel
>>>>>> Tel: 972 547 231 170
>>>>>> Fax: (972)-25662581
>>>>>>
>>>>>> C.E.O, Weather It Is, LTD
>>>>>> Weather and Climate Focus
>>>>>> http://weather-it-is.com
>>>>>> Jerusalem, Israel
>>>>>> Local: 02 930 9525
>>>>>> Cell: 054 7 231 170
>>>>>> Int-IS: x972 2 930 9525
>>>>>>
>>>>>> _______________________________________________
>>>>>> ncl-talk mailing list
>>>>>> ncl-talk at ucar.edu
>>>>>> List instructions, subscriber options, unsubscribe:
>>>>>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster_find.ncl
Type: application/octet-stream
Size: 4357 bytes
Desc: not available
URL: <http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20200123/43592a13/attachment.obj>

