Issue
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 2, size=(5, 3)), columns=list('ABC'))
print(df)
I would like to create a fourth column "D" which will take a value of 1 if:
- at least two of the columns (A, B, C) have a value of 1, or
- the previous 2 periods had at least two columns with a value of 1.
According to the example above, all the rows would have df['D'] == 1.
Solution
We can take the 3-row rolling sum of a boolean series that marks the rows with at least two ones, then check whether the result is 0 (in which case D is 0 as well) or not:
df["D"] = df.sum(axis=1).ge(2).rolling(3, min_periods=1).sum().ne(0).astype(int)
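To see what each link in that chain does, the one-liner can be pulled apart into named steps. The sketch below is only illustrative: the intermediate names `row_has_two` and `hits_in_window` are made up here, and the frame is hard-coded rather than random so the result is reproducible.

```python
import pandas as pd

# Hard-coded sample frame (illustrative; any 0/1 frame with columns A, B, C works).
df = pd.DataFrame({
    "A": [0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
    "B": [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "C": [1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
})

# Step 1: True for every row in which at least two of A, B, C equal 1.
row_has_two = df[["A", "B", "C"]].sum(axis=1).ge(2)

# Step 2: count such rows in a window made of the current row and up to
# two rows before it. min_periods=1 lets the first rows use a shorter window
# instead of producing NaN.
hits_in_window = row_has_two.rolling(3, min_periods=1).sum()

# Step 3: D is 1 when the window holds at least one qualifying row, else 0.
df["D"] = hits_in_window.ne(0).astype(int)
print(df)
```

Selecting `df[["A", "B", "C"]]` explicitly, rather than summing the whole frame, is a small precaution assumed here: it keeps the row sum from picking up "D" if the assignment is run a second time on the same frame.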
samples:
>>> df1
A B C
0 0 0 1
1 0 1 0
2 1 1 1
3 0 1 0
4 1 1 1
5 1 0 0
6 1 0 0
7 0 0 1
8 1 0 1
9 1 0 0
>>> # after..
A B C D
0 0 0 1 0
1 0 1 0 0
2 1 1 1 1
3 0 1 0 1
4 1 1 1 1
5 1 0 0 1
6 1 0 0 1
7 0 0 1 0
8 1 0 1 1
9 1 0 0 1
>>> df2
A B C
0 0 0 0
1 0 1 0
2 0 0 0
3 1 1 0
4 0 1 1
5 1 0 0
6 0 0 0
7 1 1 1
8 0 1 1
9 0 0 0
>>> # after...
A B C D
0 0 0 0 0
1 0 1 0 0
2 0 0 0 0
3 1 1 0 1
4 0 1 1 1
5 1 0 0 1
6 0 0 0 1
7 1 1 1 1
8 0 1 1 1
9 0 0 0 1
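As a sanity check on the second sample, the same rule can be spelled out with explicit shifts: a row gets D = 1 if it, the row before it, or the row two before it has at least two ones. This is a sketch of an equivalent formulation for comparison, not the method used in the answer; `flag`, `via_shift` and `via_rolling` are illustrative names.

```python
import pandas as pd

# The second sample frame, typed in by hand.
df2 = pd.DataFrame({
    "A": [0, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    "B": [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    "C": [0, 0, 0, 0, 1, 0, 0, 1, 1, 0],
})

# True where a row has at least two ones among A, B, C.
flag = df2[["A", "B", "C"]].sum(axis=1).ge(2)

# Explicit version: current row OR previous row OR the row two back.
# fill_value=False treats the missing rows before the start as "no hit".
via_shift = (
    flag
    | flag.shift(1, fill_value=False)
    | flag.shift(2, fill_value=False)
).astype(int)

# Rolling version, as in the one-liner.
via_rolling = flag.rolling(3, min_periods=1).sum().ne(0).astype(int)

# Both formulations should agree row by row.
print(via_shift.equals(via_rolling))

df2["D"] = via_rolling
print(df2)
```

The rolling form scales more easily if the look-back length changes, since only the window size needs editing, whereas the shift form needs one extra term per additional row.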
Answered By - Mustafa Aydın