Issue
So basically I have two lists of coordinates: one with "home" points (essentially centroids) and one with "destination" points. I want to assign each "destination" coordinate to its closest "home" point (as if the "home" points were centroids). Below is an example of what I want:
Input:
[home_coords_1, home_coords_2, home_coords_3]
[destination_coords_1, destination_coords_2, destination_coords_3, destination_coords_4, destination_coords_5]
Output:
[[home_coords_1, destination_coords_2, destination_coords_5],[home_coords_2, destination_coords_4], [home_coords_3, destination_coords_1, destination_coords_3]]
where each "destination" coordinate sits in the sub-array of the "home" coordinate it is closest to.
I have already accomplished this with the KMeans class from the scikit-learn Python package by passing the home coordinates as the initial centroids. However, I noticed some imperfections in the clustering. It also seems like almost an improper use of K-Means, since the algorithm is only run once, from a fixed initialization (see the line of code below).
from sklearn.cluster import KMeans
km = KMeans(n_clusters=len(home_coords), n_init=1, init=home_coords).fit(destination_coords)
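A likely source of the imperfections: n_init=1 only means a single initialization, but each K-Means run still alternates assignment and centroid-update steps (up to max_iter, 300 by default), so the centers drift away from the home points. The minimal NumPy sketch below performs one such update on a toy example, assuming planar coordinates; the helper names `nearest_labels` and `lloyd_step` are hypothetical, not part of scikit-learn.

```python
import numpy as np

def nearest_labels(points, centers):
    """Index of the closest center for every point (Euclidean distance)."""
    # pairwise squared distances, shape (n_points, n_centers)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def lloyd_step(points, centers):
    """One K-Means update: move each center to the mean of its assigned points."""
    labels = nearest_labels(points, centers)
    return np.array([
        points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
        for k in range(len(centers))
    ])

# toy data on the x axis: two home points and four destination points
home = np.array([[0.0, 0.0], [10.0, 0.0]])
destination = np.array([[-10.0, 0.0], [4.9, 0.0], [6.0, 0.0], [7.0, 0.0]])

print(nearest_labels(destination, home))    # -> [0 0 1 1]  (fixed home points)
moved = lloyd_step(destination, home)       # centers after one K-Means update
print(nearest_labels(destination, moved))   # -> [0 1 1 1]  (moved centers)
```

The point at x = 4.9 is closest to the home point at the origin, but after a single update step it is assigned to the other cluster. Assigning to fixed home points, without moving them, avoids this drift.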
This brings me to my question: what is the best way to cluster a list of coordinates around a preset list of coordinates? An alternative I am considering is to loop through the list of "home" coordinates and, one by one, pick the n closest "destination" coordinates for each. That seems much more naive, though. Any thoughts or suggestions? Any help is appreciated! Thank you!
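For reference, the "n closest per home point" alternative could look like the sketch below. The function name `greedy_n_closest` and the fixed group size `n` are illustrative assumptions. It makes the weakness concrete: a home point processed later can be handed destination points that are actually closer to another home point, and the result depends on the order of the home list.

```python
import numpy as np

def greedy_n_closest(home, destination, n):
    """For each home point in order, claim the n closest destination points
    that have not been claimed yet. Returns one list of indices per home point."""
    taken = np.zeros(len(destination), dtype=bool)
    groups = []
    for h in home:
        dist = np.linalg.norm(destination - h, axis=1)
        dist[taken] = np.inf                  # ignore already-claimed points
        idx = np.argsort(dist)[:n]
        idx = idx[np.isfinite(dist[idx])]     # fewer than n points may be left
        taken[idx] = True
        groups.append(idx.tolist())
    return groups

home = np.array([[0.0, 0.0], [10.0, 0.0]])
destination = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [9.0, 0.0]])
print(greedy_n_closest(home, destination, n=2))  # -> [[0, 1], [3, 2]]
```

Destination index 2 (x = 3) is 3 units from the first home point but 7 from the second, yet it ends up in the second group because the first group was already full.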
Solution
You can use, e.g., scipy.spatial.KDTree to look up the nearest "home" point for every "destination" point:
from scipy.spatial import KDTree
import numpy as np
# sample arrays with home and destination coordinates
np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)
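# build a k-d tree over the home points
kd_tree = KDTree(home)
# query() returns (distances, indices) of the nearest home point for each destination point; keep the indices
labels = kd_tree.query(destination)[1]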
print(labels)
This gives an array that holds, for each destination point, the index of the closest home point:
[9 0 8 8 1 2 2 8 1 5 2 4 0 7 2 1 4 7 1 1 7 4 7 4 4 4 5 4 7 7 2 8 1 7 6 2 8
7 7 4 5 9 2 1 3 3 5 5 5 5]
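As a sanity check, the KDTree labels should match a brute-force search that computes every destination-to-home distance and takes the argmin per row. A short sketch using scipy.spatial.distance.cdist on the same random sample; the brute-force version needs O(n_destinations * n_homes) time and memory, which is fine for small lists, while the tree typically pays off as the number of home points grows.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist

np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)

# KDTree: query() returns (distance, index) of the nearest home point
kd_dist, kd_labels = KDTree(home).query(destination)

# brute force: (50, 10) matrix of Euclidean distances, argmin over the home axis
dist_matrix = cdist(destination, home)
brute_labels = dist_matrix.argmin(axis=1)

print((kd_labels == brute_labels).all())
```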
Then, for any given home point, you can find the coordinates of all destination points clustered with that point:
# destination points clustered with `home[0]`
destination[labels == 0]
It gives:
array([[0.46147936, 0.78052918],
[0.66676672, 0.67063787]])
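To get the nested structure from the question (each sub-list starting with a home point followed by its destination points), the same boolean mask can be applied for every home index. A minimal sketch reusing the random sample from the answer; `groups` is an illustrative name.

```python
import numpy as np
from scipy.spatial import KDTree

np.random.seed(0)
home = np.random.rand(10, 2)
destination = np.random.rand(50, 2)
labels = KDTree(home).query(destination)[1]

# one sub-list per home point: [home_point, destination_a, destination_b, ...]
groups = [
    [home[i].tolist()] + destination[labels == i].tolist()
    for i in range(len(home))
]

for g in groups:
    print(len(g) - 1, "destination point(s) around", g[0])
```

A home point with no nearby destinations simply gets a one-element sub-list.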
Answered By - bb1