Issue
I'm using sklearn.cluster.AgglomerativeClustering. It begins with one cluster per data point and iteratively merges together the two "closest" clusters, thus forming a binary tree. What constitutes distance between clusters depends on a linkage parameter.
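To make the merge loop concrete, here is a naive sketch of agglomerative clustering with single linkage (the distance between two clusters is the smallest pairwise distance between their members) that records the distance of every merge. The function name naive_single_linkage is hypothetical; this is not how scikit-learn or SciPy implement it, and it is roughly cubic in the number of points, so treat it as an illustration only.

```python
import numpy as np


def naive_single_linkage(X):
    """Naive agglomerative clustering with single linkage (illustration only).

    Returns the merge distances in the order the merges happen.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all observations.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]  # start: one cluster per point
    merge_distances = []
    while len(clusters) > 1:
        best = (np.inf, None, None)
        # Find the two "closest" clusters; single linkage = minimum pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merge_distances.append(float(d))
        # Merge cluster b into cluster a; b > a, so deleting b keeps a's index valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merge_distances


print(naive_single_linkage([[0], [1], [5]]))  # [1.0, 4.0]
```

Swapping the `.min()` for `.max()` would give complete linkage instead; the point is only that the distance of each merge is a by-product of the loop.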
It would be useful to know the distance between the merged clusters at each step. We could then stop when the next two clusters to be merged are too far apart. Alas, that does not seem to be available in AgglomerativeClustering.
Am I missing something? Is there a way to recover the distances?
Solution
You might want to take a look at scipy.cluster.hierarchy, which offers somewhat more options than sklearn.cluster.AgglomerativeClustering.
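The clustering is done with the linkage function, which returns a linkage matrix with one row per merge: the indices of the two merged clusters, the distance between them, and the number of observations in the new cluster. These merge distances can be visualised with a dendrogram: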
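```python
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# 20 two-dimensional points drawn around 3 centres
X, cl = make_blobs(n_samples=20, n_features=2, centers=3, cluster_std=0.5, random_state=0)
# Ward linkage; each row of Z records one merge
Z = linkage(X, method='ward')

plt.figure()
dendrogram(Z)  # the height of each link is the merge distance
plt.show()
```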
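The distances the question asks about are the third column of the linkage matrix, Z[:, 2], with rows in merge order. A minimal sketch that rebuilds the same data, prints every merge, and applies the stopping rule from the question; the threshold max_dist = 5.0 is an arbitrary value chosen for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=20, n_features=2, centers=3, cluster_std=0.5, random_state=0)
Z = linkage(X, method='ward')

# One row per merge: [cluster_i, cluster_j, merge distance, size of the new cluster].
# Indices below len(X) are original points; index len(X) + k is the cluster formed in row k.
merge_distances = Z[:, 2]
for step, (i, j, dist, size) in enumerate(Z):
    print(f"step {step}: merge {int(i)} + {int(j)} at {dist:.3f} -> {int(size)} points")

# Stopping rule from the question: accept merges only while the distance is at
# most max_dist (arbitrary value). Ward merge distances never decrease, so
# counting qualifying rows equals stopping at the first merge above max_dist.
max_dist = 5.0
n_merges = int(np.sum(merge_distances <= max_dist))
n_clusters = len(X) - n_merges
print(f"{n_clusters} clusters remain when stopping at distance {max_dist}")
```

Counting rows relies on the merge distances being monotone, which holds for Ward, single, complete and average linkage; centroid and median linkage can produce inversions, in which case you would stop at the first row whose distance exceeds the threshold instead.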
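One can form flat clusters from the linkage matrix based on various criteria, e.g. a maximum merge distance. With criterion='distance', two observations end up in the same flat cluster only if their cophenetic distance (the height at which they are first merged) is at most the threshold, here 5: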
```python
clusters = fcluster(Z, 5, criterion='distance')
```
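fcluster supports other criteria besides 'distance'. A short sketch that cuts the same tree three ways: at a fixed distance, at a maximum number of clusters ('maxclust'), and at a threshold placed in the middle of the largest jump between consecutive merge distances. The largest-gap rule is a common heuristic added here for illustration, not something the original answer proposes.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=20, n_features=2, centers=3, cluster_std=0.5, random_state=0)
Z = linkage(X, method='ward')

# Cut the tree at merge distance 5, as in the answer ...
by_distance = fcluster(Z, 5, criterion='distance')
# ... or ask for at most 3 flat clusters instead.
by_count = fcluster(Z, 3, criterion='maxclust')

# Heuristic (assumption): cut halfway across the largest gap between
# consecutive merge distances, i.e. just before the "expensive" merges start.
heights = Z[:, 2]
gaps = np.diff(heights)
k = int(np.argmax(gaps))
threshold = (heights[k] + heights[k + 1]) / 2
by_gap = fcluster(Z, threshold, criterion='distance')

for name, labels in [("distance", by_distance), ("maxclust", by_count), ("largest gap", by_gap)]:
    print(f"{name}: {len(np.unique(labels))} clusters")
```

Each call returns one integer label per observation (starting at 1), so the results can be compared directly with the `cl` labels from make_blobs.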
SciPy's hierarchical clustering is discussed in much more detail here.
Answered By - σηγ