Issue
I have a pandas DataFrame with the following values:
#  | X | Y | Value1 | Value2 |
---|---|---|--------|--------|
1  | 2 | 4 | 10     | 3      |
2  | 2 | 4 | 3      | 2      |
3  | 2 | 4 | 1      | 4      |
4  | 4 | 5 | 5      | 20     |
5  | 4 | 5 | 3      | 2      |
6  | 5 | 6 | 1      | 2      |
7  | 4 | 5 | 4      | 3      |
The goal is to impute values within groups of similar rows (rows that share the same X and Y values) for the columns Value1 and Value2.
For example, the group X=2, Y=4 has the values 1, 3, 10 in Value1. Using median imputation, I would like to replace the 10 with 3, since 3 is the group median. Similarly, for X=4, Y=5 the values in Value2 are 2, 3, 20, and I would like to replace the 20 with the median, 3.
Note: 10 and 20 are treated as outliers here.
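The group medians mentioned above can be checked directly. This is a minimal sketch that assumes the table is loaded into a DataFrame called df; that name, and group_medians, are chosen here only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question (row numbers used as the index).
df = pd.DataFrame(
    {
        "X": [2, 2, 2, 4, 4, 5, 4],
        "Y": [4, 4, 4, 5, 5, 6, 5],
        "Value1": [10, 3, 1, 5, 3, 1, 4],
        "Value2": [3, 2, 4, 20, 2, 2, 3],
    },
    index=range(1, 8),
)

# Median of each value column within every (X, Y) group.
group_medians = df.groupby(["X", "Y"])[["Value1", "Value2"]].median()
print(group_medians)
```

The row for X=2, Y=4 should show a Value1 median of 3, and the row for X=4, Y=5 a Value2 median of 3, matching the replacements described in the question.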
Solution
I was able to solve my problem using the following answer from n1k31t4: https://datascience.stackexchange.com/questions/37717/imputation-missing-values-other-than-using-mean-median-in-python
# Requires "import numpy as np"; col is the column to impute ('Value1' or 'Value2').
df[col] = df.groupby(['X', 'Y'])[col].transform(lambda x: x.median() if (np.abs(x) > 3).any() else x)
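One thing to be aware of: as written, the line above sets every value in a group to the group median whenever any absolute value in that group exceeds 3, so the 1 in the X=2, Y=4 group would also become 3. If only the outlier itself should change, one possible variant is sketched below. It is not the method from the linked answer. It assumes an outlier is a value whose distance from the group median exceeds k times the group's median absolute deviation (MAD); the helper name impute_group_outliers and the default k=3.0 are hypothetical choices for this example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "X": [2, 2, 2, 4, 4, 5, 4],
        "Y": [4, 4, 4, 5, 5, 6, 5],
        "Value1": [10, 3, 1, 5, 3, 1, 4],
        "Value2": [3, 2, 4, 20, 2, 2, 3],
    },
    index=range(1, 8),
)


def impute_group_outliers(frame, keys, col, k=3.0):
    """Replace per-group outliers in `col` with the group median.

    Assumed rule (not from the original post): a value is an outlier when
    |value - group median| > k * group MAD.
    """
    # Group median, broadcast back to the shape of the original column.
    median = frame.groupby(keys)[col].transform("median")
    # Absolute distance of every value from its own group's median.
    deviation = (frame[col] - median).abs()
    # Median absolute deviation per group, again broadcast to every row.
    mad = deviation.groupby([frame[key] for key in keys]).transform("median")
    is_outlier = deviation > k * mad
    # Keep non-outliers unchanged; substitute the group median elsewhere.
    return frame[col].where(~is_outlier, median)


result = df.copy()
for col in ["Value1", "Value2"]:
    result[col] = impute_group_outliers(df, ["X", "Y"], col)

print(result)
```

On the sample data this replaces only the 10 in Value1 and the 20 in Value2 with 3 and leaves the other values alone. The MAD rule is a judgment call: in a group where most values are identical the MAD is 0, so any differing value gets flagged, and a different threshold or rule may suit real data better.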
Answered By - Abrar Hasin