Issue
I have a dataset that looks similar to this:
|Budget|Profit |Ranking| #Ranking has values ranging from 1-10 (1 = worst, 10 = best)
|------|-------|-------|
|10000 | NaN | 8 |
|NaN | 4000 | 3 |
|5000 | 7000 | 9 |
|12000 | NaN | 8 |
|2000 | 4500 | 3 |
|5000 | 10000 | 8 |
I want to fill the missing values in Budget and Profit in regards to the median of their corresponding ranking, but can't seem to get it right.
for score in range(1, 11):
median_score = df[df['score'] == score]['budget'].median()
index_nr = df[df['score'] == score]
for i in index_nr:
df.loc[i, :]['budget'].filla(median_score)
How can I improve my code so that it fills the missing data with the median of it's rank?
Solution
First get the columns' median of each ranking group using groupby
+ transform
, and then pass it to fillna
groups_median = df.groupby('Ranking').transform('median')
df = df.fillna(groups_median)
Answered By - HarryPlotter
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.