Issue
I am using the make_moons dataset and trying to implement an outlier detection algorithm. To test it, I want to generate 3 points that lie far from the normal data and check whether they are flagged as outliers. These 3 points should be randomly selected from my data and moved as far as possible from the normal data. My algorithm compares the distance for each point against a threshold value to decide whether it is an outlier. I am aware of other resources on this, but my specific problem is my dataset: I could not find a way to adapt those solutions to it.
Here is my code to define the dataset and fit K-Means (I have to use the K-Means fitted data):
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=100, noise=0, random_state=0)
n_clusters = 10
kmeans = KMeans(n_clusters=n_clusters, random_state=10)
kmeans.fit(X)
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
In short, how can I find the 3 farthest points in my data to use them in outlier detection?
Solution
As stated in the comments, you should define a criterion for classifying outliers. Either way, in the following code I randomly selected three entries from X and multiplied them by 1,000, which should make them outliers regardless of the definition you choose.
# Import libraries
import numpy as np
from sklearn.datasets import make_moons
# Create data
X, y = make_moons(100, random_state=123)
# Randomly select 3 row numbers from X
np.random.seed(5)
# randint excludes `high`, so high=len(X) covers every valid row index
idx = np.random.randint(low=0, high=len(X), size=3)
# Overwrite the data from the randomly selected rows
scaler = 1000  # Change this number to whatever you need
for i in idx:
    X[i] = X[i] * scaler
Note: np.random.randint samples with replacement, so there is a small probability that idx will contain duplicates. It won't happen with np.random.seed(5), but if you choose another seed (or opt not to use one at all) and get duplicates, simply try another seed or redraw until the indices are unique.
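If duplicates are a concern, one option is to sample the row indices without replacement and then check that the scaled points are actually flagged. The sketch below is not part of the original answer. It assumes NumPy's Generator.choice with replace=False for the indices, and it assumes distance to the nearest K-Means centroid as the outlier measure, with the largest such distance in the clean data as a hypothetical threshold. n_init=10 is set explicitly only to keep behavior consistent across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Same dataset and K-Means settings as in the question
X, y = make_moons(n_samples=100, noise=0, random_state=0)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=10).fit(X)

# Distance from every clean point to its nearest centroid
clean_dist = kmeans.transform(X).min(axis=1)

# Assumed threshold: the largest nearest-centroid distance in the clean data
threshold = clean_dist.max()

# Pick 3 distinct row indices; replace=False rules out duplicates
rng = np.random.default_rng(5)
idx = rng.choice(len(X), size=3, replace=False)

# Work on a copy so the clean data stays available for comparison
X_out = X.copy()
X_out[idx] = X_out[idx] * 1000

# Re-measure distances against the centroids fitted on the clean data
out_dist = kmeans.transform(X_out).min(axis=1)
is_outlier = out_dist > threshold

# The 3 points farthest from any centroid
farthest3 = np.argsort(out_dist)[-3:]

print("injected rows:", sorted(idx.tolist()))
print("farthest rows:", sorted(farthest3.tolist()))
print("all injected rows flagged:", bool(is_outlier[idx].all()))
```

Fitting K-Means before injecting the points keeps the centroids from being pulled toward the outliers. If your algorithm has to fit on the contaminated data instead, the threshold would need to be chosen differently.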
Answered By - Arturo Sbr