Tuesday, October 19, 2021

[FIXED] How can I generate three outlier points such that they are clearly far away from the normal data in Python?

Labels: dataset, outliers, python, scikit-learn

Issue

I am using the make_moons dataset and trying to implement an outlier detection algorithm. To test it, I want to generate 3 points that are far away from the normal data and check whether they are detected as outliers. These 3 points should be randomly selected from my data and should be as far as possible from the normal data. My algorithm compares each point's distance with a threshold value to decide whether it is an outlier. I am aware of other resources on this, but my specific problem is my dataset: I could not find a way to adapt those solutions to it.

Here is my code to define the dataset and fit K-Means (I have to use the K-Means fitted data):

from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

data = make_moons(n_samples=100, noise=0, random_state=0)
X, y = data
n_clusters = 10
kmeans = KMeans(n_clusters=n_clusters, random_state=10)
kmeans.fit(X)
centroids = kmeans.cluster_centers_
labels = kmeans.labels_

In short, how can I find the 3 farthest points in my data to use them in outlier detection?

Solution

As stated in the comments, you should define a criterion for classifying outliers. Either way, in the following code I randomly select three entries from X and multiply them by 1,000, which should make them outliers regardless of the definition you choose.

# Import libraries
import numpy as np
from sklearn.datasets import make_moons

# Create data
X, y = make_moons(100, random_state=123)

# Randomly select 3 row indices from X
# (high is exclusive, so valid indices run from 0 to len(X) - 1)
np.random.seed(5)
idx = np.random.randint(low=0, high=len(X), size=3)

# Overwrite the randomly selected rows with scaled copies of themselves
scaler = 1000  # Change this number to whatever you need
for i in idx:
    X[i] = X[i] * scaler

Note: There is a small probability that idx will have duplicates. It won't happen with np.random.seed(5), but if you choose another seed (or opt to not use one at all) and get duplicates, simply try another one or repeat until you don't get duplicates.
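As a rough sketch of how the pieces could fit together, the snippet below draws the three indices without replacement (so duplicates cannot occur) and then checks the scaled rows against a distance threshold, as described in the question. The criterion used here, distance to the nearest K-Means centroid compared with twice the largest such distance among the unmodified points, is an assumption made for illustration and is not part of the original answer. The names rng, X_out, clean_dist and threshold are also illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Same kind of data as in the question
X, y = make_moons(n_samples=100, noise=0, random_state=0)

# Draw 3 distinct row indices; replace=False rules out duplicates
rng = np.random.default_rng(5)
idx = rng.choice(len(X), size=3, replace=False)

# Fit K-Means on the unmodified data, as in the question
kmeans = KMeans(n_clusters=10, n_init=10, random_state=10).fit(X)

# Assumed criterion: distance from each point to its nearest centroid.
# kmeans.transform returns the distance to every centroid, one column each.
clean_dist = kmeans.transform(X).min(axis=1)

# Assumed threshold: twice the largest distance seen in the clean data
threshold = 2 * clean_dist.max()

# Turn the selected rows into outliers on a copy of the data
X_out = X.copy()
X_out[idx] *= 1000

# Flag every point whose nearest-centroid distance exceeds the threshold
dist = kmeans.transform(X_out).min(axis=1)
is_outlier = dist > threshold
flagged = np.flatnonzero(is_outlier)

print("modified rows:", sorted(idx.tolist()))
print("flagged rows: ", flagged.tolist())
```

Under these assumptions only the three modified rows exceed the threshold, because every untouched point keeps its original distance to the centroids. A different threshold or distance definition may suit your algorithm better.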

Answered By - Arturo Sbr

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
