Issue
last_valid_index() only applies to the entire DataFrame, and rolling() does not allow last_valid_index(). Is there a way to find the last valid index in a column of booleans in a window?
For instance:
import pandas as pd
d = {'col': [True, False, True, True, False, False]}
df = pd.DataFrame(data=d)
The expected outcome for a rolling window of 3 is:
0 NaN
1 NaN
2 2.0
3 3.0
4 3.0
5 3.0
Solution
As I mentioned in a comment here, I think the current accepted solution has a bug. A lot of the beginning of this post is taken word-for-word from my comment there.
Change the example to:
d = {'col': [True, False, True, True, False, False, False]}
df = pd.DataFrame(data=d)
With this change, the last 3 entries make up the whole of the final rolling window of 3, and all of them are False. The current accepted solution still returns index 3 for the last entry, whereas I assume the result should be NaN (otherwise the rolling window would serve no purpose other than setting the first 2 observations to NaN).
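To make the problem concrete, here is a minimal sketch of the behavior described above. It is my own reconstruction of an ffill()-based approach, assumed from the description in this answer and not copied from the accepted solution; the names idx and ffill_result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col': [True, False, True, True, False, False, False]})

# Assumed reconstruction (not the accepted answer's actual code):
# keep the index where 'col' is True, NaN elsewhere, then forward-fill.
idx = pd.Series(df.index, index=df.index)
ffill_result = idx.where(df['col']).ffill().rolling(3).max()

# The forward fill carries index 3 into rows 4, 5 and 6, so the final
# window (rows 4 to 6, all False) still reports 3.0 instead of NaN.
print(ffill_result)
```

Under that assumption, the forward fill is what leaks an index from outside the window into the last row.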
Here is my proposed fix:
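import numpy as np
df['new'] = df.index
# Mark rows where 'col' is False with -1, take the rolling max, then turn all-False windows into NaN
df['new'] = df['new'].where(df['col'], -1).rolling(3).max().replace(-1, np.nan)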
Instead of replacing the values where df['col'] is False with NaN and then using ffill() to carry the previous index forward, it replaces those indices with -1. Then, if the maximum over a window is -1, every entry of df['col'] in that window is False, so the result for that row is replaced with np.nan.
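As a quick check, the fix can be run on the 7-row example. This is only a sketch: the helper name rolling_last_true_index is hypothetical, and it assumes a default non-negative integer index, since -1 is used as the sentinel.

```python
import numpy as np
import pandas as pd


def rolling_last_true_index(col: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: index of the last True value in each rolling window.

    Assumes the index is numeric and non-negative, because -1 is the sentinel.
    """
    idx = pd.Series(col.index, index=col.index)
    # Rows where col is False get the sentinel -1 instead of their index.
    marked = idx.where(col, -1)
    # A rolling max of -1 means the whole window was False.
    return marked.rolling(window).max().replace(-1, np.nan)


df = pd.DataFrame({'col': [True, False, True, True, False, False, False]})
df['new'] = rolling_last_true_index(df['col'], 3)
print(df)
```

The first two rows are NaN because the window is not yet full, and the last row is NaN because rows 4 to 6 are all False. If the index could contain negative values, a different sentinel would be needed.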
Answered By - Adam Oppenheimer