Issue
I want to apply sklearn graph clustering algorithms but they don't accept input from networkx in .gexf format. What kind of library/transformations do I need to turn my .gexf graphs suitable for sklearn?
Solution
Cluster algorithms accept either distance matrices, affinity matrices, or feature matrices. For example, kmeans would accept a feature matrix (say X of n points of m dimensions) and apply the Euclidean distance metric, while affinity propagation accepts an affinity matrix (i.e. a square matrix D of nxn dimensions) or a feature matrix (depending on the affinity parameter).
If you want to apply a sklearn (or just non-graph) cluster algorithm, you can extract adjacency matrices from networkx graphs.
A = nx.to_scipy_sparse_matrix(G)
I guess you should make sure, your diagonal is 1; do numpy.fill_diagonal(D, 1)
if not.
This then leaves only applying the clustering algorithm:
from sklearn.cluster import AffinityPropagation
ap = AffinityPropagation(affinity='precomputed').fit(A)
print(ap.labels_)
You can also convert your adjacency matrix to a distance matrix if you want to apply other algorithms or even project your adjacency/distance matrix to a feature matrix.
To go through all of this would go too far, however, as for getting the distance matrix, if you have binary edges, you can do D = 1 - A
; if you have weighted edges you could D = A.max() - A
.
Answered By - ben26941
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.