Issue
There are tons of questions on removing outliers already, but I couldn't solve my problem with them. I want to remove rows with outliers from the dataframe.
Say I have the following dataframe:
     0    1    2    3    4    5    6    7
a    1    2    3    4  100    2    1    3
b    2    1    3    4    1    2  300  123
c  100  200  300  400  200  500  200  400
For row a, we can assume that 100 is an outlier, so I want to remove a.
Even though all the values in row c are high, they are not outliers for the row itself, so I want to keep it.
So, basically, I want to remove all the rows with outliers.
I tried transposing the DF and did something like
df = df[(np.abs(stats.zscore(df)) < 2).all(axis=1)]
but it didn't work.
Solution
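Add axis=1 to zscore so that the z-scores are computed within each row (the default, axis=0, standardizes each column instead):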
import numpy as np
from scipy import stats
df = df[(np.abs(stats.zscore(df, axis=1)) < 2).all(axis=1)]
print(df)
     0    1    2    3    4    5    6    7
c  100  200  300  400  200  500  200  400
Answered By - jezrael
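For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the example frame from the question and compares column-wise with row-wise z-scores. It is not part of the original answer. The names col_z, row_z, keep and filtered are only illustrative, and the threshold of 2 is the one used in the question.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [[1, 2, 3, 4, 100, 2, 1, 3],
     [2, 1, 3, 4, 1, 2, 300, 123],
     [100, 200, 300, 400, 200, 500, 200, 400]],
    index=["a", "b", "c"],
)
values = df.to_numpy()

# Default axis=0: each column (only 3 values here) is standardized,
# so a value is compared with the other rows, not with its own row.
col_z = np.abs(stats.zscore(values, axis=0))

# axis=1: each row is standardized against its own mean and std.
row_z = np.abs(stats.zscore(values, axis=1))

# Keep a row only if all of its values lie within 2 standard deviations
# of that row's mean.
keep = (row_z < 2).all(axis=1)
filtered = df[keep]
print(filtered)
```

With the column-wise version on this frame, no absolute z-score reaches 2, so nothing gets filtered. With axis=1, the 100 in row a and the 300 in row b exceed the threshold, and only row c is kept.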