Issue
If I want to remove values that do not lie between -2σ and 2σ, how do I remove outliers using the IQR?
I implemented this as follows.
iqr = df['abc'].percentile(0.75) - df['abc'].percentile(0.25)
cond1 = (df['abc'] > df['abc'].percentile(0.75) + 2 * iqr)
cond2 = (df['abc'] < df['abc'].percentile(0.25) - 2 * iqr)
df[cond1 & cond2]
Is this the right way?
Solution
This is not right. Your iqr is almost never equal to σ. Quartiles and deviations are not the same thing.
Fortunately, you can easily compute the standard deviation of a numerical Series using Series.std().
sigma = df['abc'].std()
cond1 = (df['abc'] > df['abc'].mean() - 2 * sigma)
cond2 = (df['abc'] < df['abc'].mean() + 2 * sigma)
df[cond1 & cond2]
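If an IQR-based filter is what you actually want, the following is a minimal sketch of one way to do it, not a definitive implementation. It assumes hypothetical sample data (1,000 normal draws in a column named abc) and a fence multiplier k = 1.5, which is the common Tukey default rather than anything stated in the question. Note two things about the code in the question: pandas Series have quantile(), not percentile(), and a value can never be both above the upper fence and below the lower fence, so combining those two conditions with & always selects nothing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
df = pd.DataFrame({"abc": rng.normal(loc=0.0, scale=1.0, size=1000)})

# pandas exposes quantiles through Series.quantile(); there is no Series.percentile().
q1 = df["abc"].quantile(0.25)
q3 = df["abc"].quantile(0.75)
iqr = q3 - q1
sigma = df["abc"].std()

# For normally distributed data the IQR is roughly 1.35 * sigma,
# so the two quantities are not interchangeable.
print(iqr / sigma)

# IQR rule: keep rows inside the fences [q1 - k*iqr, q3 + k*iqr].
k = 1.5  # assumed multiplier (Tukey's usual choice), adjust as needed
inside_fences = df["abc"].between(q1 - k * iqr, q3 + k * iqr)
iqr_filtered = df[inside_fences]

# Sigma rule from the answer: keep rows within two standard deviations of the mean.
mean = df["abc"].mean()
sigma_filtered = df[(df["abc"] > mean - 2 * sigma) & (df["abc"] < mean + 2 * sigma)]
```

Both filters keep the rows that pass the test, so no negation is needed. The two rules generally drop different numbers of rows, so pick the one that matches the definition of outlier you actually need.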
Answered By - Benjamin Rio