Issue
I am doing k-means clustering and I want to make sure that each label is matched to the correct cluster number. Below is the code I used:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min
dataset = pd.read_csv('ratio.csv', index_col=0).T
dataset_copy = dataset
dataset_copy = dataset_copy.dropna()
X = dataset_copy.iloc[:, [0, 1, 2, 3]].values
kmeans = KMeans(n_clusters=4, init='k-means++', random_state=42)
y_kmeans = kmeans.fit_predict(X)
# From here
Company = pd.DataFrame(dataset_copy.index)
cluster_labels = pd.DataFrame(kmeans.labels_)
labels_df = pd.concat([Company, cluster_labels],axis = 1)
Does the code after the # From here comment assign the correct cluster number to each label?
Part of the dataset that I am using in the code:
Solution
Yes, the code after the "# From here" comment correctly assigns a cluster label to each company.
The relevant lines of code:
Company = pd.DataFrame(dataset_copy.index): This line creates a DataFrame from the index of dataset_copy, which presumably holds the company names or identifiers.
cluster_labels = pd.DataFrame(kmeans.labels_): Here, you convert the labels assigned by the k-means algorithm to a DataFrame. The kmeans.labels_ array contains the cluster number assigned to each sample in X.
labels_df = pd.concat([Company, cluster_labels], axis=1): This line concatenates the company names and their corresponding cluster labels along the columns (axis=1). The result is a new DataFrame, labels_df, where each row contains a company name and its associated cluster label.
This pairing holds because X is built from dataset_copy after dropna(), so kmeans.labels_ follows the same row order as dataset_copy.index.
Thus, every company in labels_df is matched with the cluster number assigned by the k-means algorithm.
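One detail worth checking is that pd.concat with axis=1 aligns rows by index label, not by position. It works here because both intermediate DataFrames get a default 0..n-1 index. The sketch below uses made-up company names and labels (hypothetical stand-ins, not the original ratio.csv data) to show the working case and what a mismatched index would do:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for dataset_copy.index and kmeans.labels_
companies = pd.Index(["AAA", "BBB", "CCC"])
labels = np.array([2, 0, 1])

# Same pattern as in the question: both frames get a default 0..n-1 index,
# so concat pairs row i of one frame with row i of the other.
company_df = pd.DataFrame(companies)
label_df = pd.DataFrame(labels)
labels_df = pd.concat([company_df, label_df], axis=1)
labels_df.columns = ["Company", "Cluster"]

# Counter-example: if one frame carried a different index, concat would
# align by label and introduce NaN rows instead of pairing by position.
shifted = pd.DataFrame(labels, index=[1, 2, 3])
misaligned = pd.concat([company_df, shifted], axis=1)

print(labels_df)
print(misaligned)
```

If the two frames ever end up with different indexes, resetting them with reset_index(drop=True) before concatenating restores positional pairing.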
However, there is a more concise way. Both dataset_copy.index and kmeans.labels_ are one-dimensional and have the same length, so you can build the mapping DataFrame directly and give the columns descriptive names:
labels_df = pd.DataFrame({'Company': dataset_copy.index, 'Cluster': kmeans.labels_})
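To convince yourself that the mapping is right, you can ask the fitted model to predict the cluster for each company's own row and compare the result with the stored label. This is a minimal sketch that assumes synthetic data in place of ratio.csv; the company names, the random values, and the explicit n_init=10 are assumptions, not part of the question:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for ratio.csv: 12 hypothetical companies, 4 ratios each
rng = np.random.default_rng(0)
names = [f"company_{i}" for i in range(12)]
data = pd.DataFrame(rng.normal(size=(12, 4)), index=names)
data.iloc[3, 1] = np.nan  # one incomplete row, removed by dropna() below

data = data.dropna()
X = data.iloc[:, [0, 1, 2, 3]].values

kmeans = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42)
kmeans.fit(X)

# One-step mapping from company name to cluster number
labels_df = pd.DataFrame({"Company": data.index, "Cluster": kmeans.labels_})

# Check: look up each company's row by name, predict its cluster,
# and compare with the label stored in labels_df.
recomputed = kmeans.predict(data.loc[labels_df["Company"]].values)
all_match = bool((recomputed == labels_df["Cluster"].values).all())
print(all_match)
```

The same check should work on the real data by replacing data with dataset_copy.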
Answered By - DataJanitor