Issue
I would like to use one of sklearn's clustering algorithms, but with the restriction that certain sets of points must end up in the same cluster. For instance, given a set of points in which some are marked red and some blue, I would like to enforce that all red points belong to the same cluster and all blue points belong to the same cluster. Red and blue points are still allowed to share a cluster. If this is not possible in sklearn, I am also open to using other libraries.
Solution
This is called "constrained clustering": a family of semi-supervised clustering approaches in which the user also supplies pairwise constraints of two kinds:
- Must Link - two nodes must belong to the same cluster
- Cannot Link - two nodes cannot belong to the same cluster
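To make the two constraint types concrete, here is a minimal sketch that checks a cluster assignment against them. The helper name `violated_constraints` and the toy labels are made up for illustration and are not part of any library.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the constraint pairs that a cluster assignment breaks."""
    # A must-link pair is broken when the two points have different labels.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is broken when the two points share a label.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Toy assignment: points 0 and 1 are in cluster 0, points 2 and 3 in cluster 1.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print(broken_ml)  # [(1, 2)]
print(broken_cl)  # [(2, 3)]
```

A check like this can be run on the output of any clustering algorithm to see whether the constraints were honored.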
The copkmeans package implements the COP-KMeans algorithm and provides an API like this:
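import numpy
from copkmeans.cop_kmeans import cop_kmeans

# 100 random points with 500 features each
input_matrix = numpy.random.rand(100, 500)
# Pairs of row indices that must share a cluster
must_link = [(0, 10), (0, 20), (0, 30)]
# Pairs of row indices that must be placed in different clusters
cannot_link = [(1, 10), (2, 10), (3, 10)]
clusters, centers = cop_kmeans(dataset=input_matrix, k=5, ml=must_link, cl=cannot_link)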
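For the case in the question, where whole groups of points (all red, all blue) must stay together, the must-link list can be built from the groups. The sketch below uses a hypothetical helper, `must_link_from_groups`, and made-up index lists. It chains consecutive members of each group, which assumes the clustering implementation treats must-link as transitive; if yours does not, `itertools.combinations(members, 2)` would produce every pair instead.

```python
def must_link_from_groups(groups):
    """Build must-link pairs that tie together all indices in each group."""
    pairs = []
    for group in groups:
        members = list(group)
        # Chain neighbours: (a, b), (b, c), ... links the whole group
        # as long as must-link is treated as transitive.
        pairs.extend(zip(members, members[1:]))
    return pairs


# Hypothetical row indices of the red and blue points.
red = [0, 10, 20, 30]
blue = [1, 2, 3]

must_link = must_link_from_groups([red, blue])
print(must_link)
# [(0, 10), (10, 20), (20, 30), (1, 2), (2, 3)]
```

The resulting list can be passed as the `ml` argument in the call above. Leaving the cannot-link list empty keeps red and blue points free to land in the same cluster.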
Answered By - Alexander L. Hayes