Issue
I have a clustering (KMeans was applied and clusters were obtained). The radius of each cluster is calculated as the largest distance between the center and the observations.
I don't understand the [:, 0] here. I know we're taking all observations from the first column, but why not take the second column as well? What does [:, 0] represent?
X_distances = euclidean_distances(X, [center])[:, 0]
radius = np.max(X_distances)
Solution
Whenever you come across things like this, unpack the code a bit and look at what each piece is giving you.
In this case, the docs for sklearn.metrics.pairwise.euclidean_distances (which I assume you are using; please include this sort of information in your questions!) also tell you what you need to know:
Returns: distances: ndarray of shape (n_samples_X, n_samples_Y)
So the function euclidean_distances(X, Y) returns a 2D array of all the distances between the points in X and the points in Y. Your X is all your data, and your Y is just one point: the centroid of the cluster. Because Y is only one point, your resulting distance matrix has only one column. Like this:
from sklearn.metrics import euclidean_distances
import numpy as np
X = np.array([[1, 3], [2, 5], [0, 4]])
euclidean_distances(X, [[0, 0]])
This gives:
array([[3.16227766],
       [5.38516481],
       [4.        ]])
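So the index [:, 0] is getting this column, the first and only one, as a 1D array. There is no second column to take, because there is only one center. In fact, you could skip the indexing because np.max() doesn't care: with no axis argument, it's just going to give you the max of the entire array. So you could reduce your code to: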
radius = euclidean_distances(X, [center]).max()
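To see this end to end, here is a minimal sketch that computes a radius for every cluster and checks that the indexed and un-indexed versions agree. The toy data and the KMeans settings (n_clusters=2, n_init=10, random_state=0) are made up for illustration, and it assumes you only want to measure each center against the points assigned to that cluster, which may not match how your X is built.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import euclidean_distances

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([[1.0, 3.0], [2.0, 5.0], [0.0, 4.0],
              [10.0, 10.0], [11.0, 12.0], [9.0, 11.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

radii = {}
for label, center in enumerate(kmeans.cluster_centers_):
    # Assumption: only the points assigned to this cluster are measured.
    members = X[kmeans.labels_ == label]

    # One row per member, one column per center: shape (n_members, 1).
    d = euclidean_distances(members, [center])
    print(label, d.shape, d[:, 0].shape)

    # Taking column 0 or not makes no difference to the overall maximum.
    assert d[:, 0].max() == d.max()
    radii[label] = d.max()

print(radii)
```

If you passed several centers at once, the result would have one column per center, and then [:, 0] (or an axis argument to max) would actually matter.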
Answered By - kwinkunks