Friday, October 8, 2021

[FIXED] Best method to cluster coordinates around set centroids (Improving Scikit K-Means output? Naive methods?)

Tags: coordinates, k-means, numpy, python, scikit-learn

Issue

I have two lists of coordinates: one of "home" points (essentially centroids) and one of "destination" points. I want to assign each "destination" coordinate to its closest "home" point, as if the "home" points were centroids. Below is an example of what I want:

Input:
[home_coords_1, home_coords_2, home_coords_3]
[destination_coords_1, destination_coords_2, destination_coords_3, destination_coords_4, destination_coords_5]

Output:
[[home_coords_1, destination_coords_2, destination_coords_5],[home_coords_2, destination_coords_4], [home_coords_3, destination_coords_1, destination_coords_3]]
Here, each sub-array holds a "home" coordinate followed by the "destination" coordinates that are in close proximity to it.

I have already accomplished this with the K-Means clustering function in the scikit-learn Python package by passing the home coordinates as the initial centroids. However, I noticed some imperfections in the clustering. It also seems like an almost improper use of K-Means, since with n_init=1 only a single run is performed (see the line of code below).

from sklearn.cluster import KMeans
km = KMeans(n_clusters=len(home_coords_list), n_init=1, init=home_coords).fit(destination_coords)

This brings me to my question: what is the best way to cluster a list of coordinates around a pre-set list of coordinates? An alternative I am considering is to run through the list of "home" coordinates and, one by one, pick the n closest "destination" coordinates. This seems a lot more naive, though. Any thoughts or suggestions? Any help is appreciated. Thank you!

Solution

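You can use, for example, scipy.spatial.KDTree: build the tree on the home points, then query it for the nearest home point of each destination point.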

from scipy.spatial import KDTree
import numpy as np

# sample arrays with home and destination coordinates
np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)

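# build the tree on the home points, then look up the nearest home point
# for each destination point; query() returns (distances, indices)
kd_tree = KDTree(home)
_, labels = kd_tree.query(destination)
print(labels)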

This prints an array that gives, for each destination point, the index of its closest home point:

[9 0 8 8 1 2 2 8 1 5 2 4 0 7 2 1 4 7 1 1 7 4 7 4 4 4 5 4 7 7 2 8 1 7 6 2 8
 7 7 4 5 9 2 1 3 3 5 5 5 5]
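As a cross-check, the same nearest-home assignment can be sketched with plain NumPy broadcasting. This is only an illustration, not part of the original answer: the names `diff`, `sq_dist` and `brute_labels` are made up here, and the approach builds the full destination-by-home distance matrix, so it is only reasonable for small inputs. Barring exact distance ties, it should agree with the KDTree query above.

```python
import numpy as np

# Same sample data as in the answer above.
np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)

# Pairwise squared Euclidean distances, shape (50, 10):
# row i holds the distances from destination[i] to every home point.
diff = destination[:, None, :] - home[None, :, :]
sq_dist = (diff ** 2).sum(axis=-1)

# Index of the nearest home point for each destination point.
brute_labels = sq_dist.argmin(axis=1)
print(brute_labels)
```

For large point sets the KDTree is usually the better choice, because it avoids materialising every pairwise distance.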

Then for any given home point, you can find coordinates of all destination points clustered with that point:

# destination points clustered with `home[0]`
destination[labels == 0]

It gives:

array([[0.46147936, 0.78052918],
       [0.66676672, 0.67063787]])
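To get the nested output format requested in the question (each home point followed by its destination points), the labels can be grouped in a list comprehension. The following is a minimal sketch under a few assumptions: `clusters` is a hypothetical name, and the labels are recomputed with plain NumPy only so that the block runs on its own; the `labels` array from the KDTree query can be used in exactly the same way.

```python
import numpy as np

# Same sample data as in the answer above.
np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)

# Nearest-home label for each destination point (brute force, for self-containment).
sq_dist = ((destination[:, None, :] - home[None, :, :]) ** 2).sum(axis=-1)
labels = sq_dist.argmin(axis=1)

# One sub-list per home point: the home coordinate first, then its destinations.
clusters = [
    [home[i], *destination[labels == i]]
    for i in range(len(home))
]

for i, cluster in enumerate(clusters):
    print(f"home {i}: {len(cluster) - 1} destination point(s)")
```

A home point with no nearby destinations simply ends up as a sub-list containing only itself.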

Answered By - bb1

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
