Issue
I am learning K-means clustering and am confused about how plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1') works.
What is the purpose of X[y_kmeans == 0, 0], X[y_kmeans == 0, 1] in the code?
Full code here
#k-means
#importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#importing the dataset
dataset = pd.read_csv("mall_customers.csv")
X = dataset.iloc[:,[3,4]].values
#using the elbow method to find the optimal number of clusters
from sklearn.cluster import KMeans
wcss = [] #Within-Cluster Sum of Square
for i in range(1,11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++',max_iter = 300,n_init=10,random_state = 0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1,11),wcss)
plt.title("The elbow method")
plt.xlabel("Number of clusters")
plt.ylabel('WCSS')
plt.show()
#applying kmeans to the whole dataset
kmeans = KMeans(n_clusters = 5,init = 'k-means++', max_iter=300,n_init=10,random_state=0)
y_kmeans = kmeans.fit_predict(X)
#Visualising the cluster
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster_centers_[:,1],s=300, c = 'yellow', label ='Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
I have added the output images for reference: the elbow graph and the final cluster plot.
Solution
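That's a filter (a boolean mask). y_kmeans == 0 produces an array of True/False values that is True wherever y_kmeans[i] is equal to 0.
X[y_kmeans == 0, 0] selects the rows of X whose corresponding y_kmeans value is 0 and, from those rows, takes column 0 (the index along the second dimension is 0). This is the x-coordinate of every point in cluster 0.
X[y_kmeans == 0, 1] selects column 1 of the same rows, which is the y-coordinate of those points.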
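The filtering can be tried on a small array. The following is a minimal sketch, assuming made-up values: X and y_kmeans here are hand-written stand-ins for the real data and the output of fit_predict, and the names mask, x_coords and y_coords are introduced only for illustration.

```python
import numpy as np

# Made-up data: 6 points with 2 features each (column 0 = x, column 1 = y).
X = np.array([[15, 39],
              [16, 81],
              [17, 6],
              [18, 77],
              [19, 40],
              [20, 76]])

# Made-up cluster labels, one per row of X (a stand-in for kmeans.fit_predict(X)).
y_kmeans = np.array([0, 1, 0, 1, 2, 1])

mask = y_kmeans == 0      # boolean array, True where the label is 0
x_coords = X[mask, 0]     # column 0 of the rows that belong to cluster 0
y_coords = X[mask, 1]     # column 1 of the same rows

print(mask.tolist())      # [True, False, True, False, False, False]
print(x_coords.tolist())  # [15, 17]
print(y_coords.tolist())  # [39, 6]
```

X[mask, 0] gives the same result as the two-step X[mask][:, 0]: first keep the rows where the mask is True, then take column 0.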
Originally answered by tim roberts
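In X[y_hc == 1, 0], the trailing 0 selects the x-axis values (column 0 of X). In X[y_hc == 0, 1], the trailing 1 selects the y-axis values (column 1 of X).
The number compared with y_hc (for example, the 1 in y_hc == 1) is the cluster label, that is, the value of y_hc[i]. Here y_hc is an array of cluster labels, like y_kmeans in the question.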
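Since only the cluster label changes from one scatter call to the next, the same selection can be written as a loop over the labels. This is a rough sketch with made-up data; labels stands in for y_kmeans or y_hc, and per_cluster is a hypothetical helper dictionary, not something from the original code.

```python
import numpy as np

# Made-up points (column 0 = x, column 1 = y) and made-up cluster labels.
X = np.array([[15, 39], [16, 81], [17, 6], [18, 77], [19, 40], [20, 76]])
labels = np.array([0, 1, 0, 1, 2, 1])

# For every distinct cluster label, collect the x and y values of its points.
per_cluster = {}
for k in np.unique(labels):
    rows = labels == k                       # True for the points in cluster k
    per_cluster[int(k)] = (X[rows, 0], X[rows, 1])

for k, (xs, ys) in per_cluster.items():
    print(k, xs.tolist(), ys.tolist())
# 0 [15, 17] [39, 6]
# 1 [16, 18, 20] [81, 77, 76]
# 2 [19] [40]
```

Each (xs, ys) pair is what a single plt.scatter call in the question receives as its first two arguments.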
Answered By - user10064176