Issue
I have a pandas DataFrame in which I need to fill the NaN values with 0, starting from the first non-NaN numeric value in each row.
It is similar to this post, but the operation has to be done per row rather than per column: How to populate NaN by 0, starting after first non-nan value
Example df
name day1 day2 day3 day4
0 Alice NaN 5.0 NaN 6.0
1 Bob NaN NaN 3.0 2.0
2 Carol 3.0 NaN NaN 4.0
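A minimal sketch for rebuilding the example frame so the snippets in the answer can be run as-is. The values are copied from the table above; only the construction code is new.

```python
import numpy as np
import pandas as pd

# Example frame from the question; NaN marks a missing day value
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'day1': [np.nan, np.nan, 3.0],
    'day2': [5.0, np.nan, np.nan],
    'day3': [np.nan, 3.0, np.nan],
    'day4': [6.0, 2.0, 4.0],
})
print(df)
```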
Desired output
name day1 day2 day3 day4
0 Alice NaN 5.0 0.0 6.0
1 Bob NaN NaN 3.0 2.0
2 Carol 3.0 0.0 0.0 4.0
I tried to adapt the method from the linked question, as well as some workarounds from this post, but unfortunately I didn't manage to get it working.
Thanks!
Solution
You can craft a boolean mask with notna + cummax:
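# columns to process: every column whose name contains "day"
cols = list(df.filter(like='day'))
# or list them explicitly:
# cols = ['day1', 'day2', 'day3', 'day4']
# True from the first non-NaN value onward in each row
mask = df[cols].notna().cummax(axis=1)
df[cols] = df[cols].fillna(0).where(mask)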
Output:
name day1 day2 day3 day4
0 Alice NaN 5.0 0.0 6.0
1 Bob NaN NaN 3.0 2.0
2 Carol 3.0 0.0 0.0 4.0
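To check the mask approach end to end, here is a self-contained sketch that rebuilds the example frame, applies the same `fillna(0).where(...)` step, and compares the result against the desired output with `pd.testing.assert_frame_equal`. The `expected` frame is typed in by hand from the desired output in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'day1': [np.nan, np.nan, 3.0],
    'day2': [5.0, np.nan, np.nan],
    'day3': [np.nan, 3.0, np.nan],
    'day4': [6.0, 2.0, 4.0],
})

cols = list(df.filter(like='day'))
# True from the first non-NaN value onward in each row
mask = df[cols].notna().cummax(axis=1)
# zero-fill every NaN, then put NaN back where the row has not started yet
df[cols] = df[cols].fillna(0).where(mask)

expected = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'day1': [np.nan, np.nan, 3.0],
    'day2': [5.0, np.nan, 0.0],
    'day3': [0.0, 3.0, 0.0],
    'day4': [6.0, 2.0, 4.0],
})
# raises AssertionError if any value or dtype differs
pd.testing.assert_frame_equal(df, expected)
```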
Intermediate:
# df[cols].notna().cummax(axis=1)
day1 day2 day3 day4
0 False True True True
1 False False True True
2 True True True True
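The cummax trick works because, on booleans, a running maximum is a running logical OR: once a True appears in a row, every later position stays True. As a sketch of the same idea outside pandas, assuming the day columns hold floats, the mask can be built with `np.logical_or.accumulate` and the fill done with `np.where`:

```python
import numpy as np

# day1..day4 values from the example, one row per person
values = np.array([
    [np.nan, 5.0, np.nan, 6.0],
    [np.nan, np.nan, 3.0, 2.0],
    [3.0, np.nan, np.nan, 4.0],
])

# running OR along each row: True from the first non-NaN value onward
started = np.logical_or.accumulate(~np.isnan(values), axis=1)

# zero-fill NaNs only where the row has already started, else keep NaN
filled = np.where(started, np.nan_to_num(values, nan=0.0), np.nan)
print(started)
print(filled)
```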
Or, adapting the linked approach to work on rows while ignoring the non-day column (which I don't like as much as the above):
df[cols] = df[cols].fillna(df[cols].mask(df[cols].ffill(axis=1).notna(), 0), axis=1)
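Why the alternative gives the same result: `ffill(axis=1)` carries each value to the right along its row, so `.notna()` on the filled frame is the same "row has started" mask as before. `mask(..., 0)` turns those positions into 0, and `fillna` with a DataFrame argument only takes values for cells that are NaN in the original. A minimal sketch comparing both variants on the example frame (rebuilt so the block runs on its own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'day1': [np.nan, np.nan, 3.0],
    'day2': [5.0, np.nan, np.nan],
    'day3': [np.nan, 3.0, np.nan],
    'day4': [6.0, 2.0, 4.0],
})
cols = ['day1', 'day2', 'day3', 'day4']

# Variant 1: boolean mask with notna + cummax
v1 = df[cols].fillna(0).where(df[cols].notna().cummax(axis=1))

# Variant 2: 0 wherever a forward fill along the row finds a value;
# fillna then uses those zeros only for cells that were NaN
zeros_after_start = df[cols].mask(df[cols].ffill(axis=1).notna(), 0)
v2 = df[cols].fillna(zeros_after_start, axis=1)

# raises AssertionError if the two variants disagree
pd.testing.assert_frame_equal(v1, v2)
print(v2)
```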
Answered By - mozway