Issue
For example, in the following DataFrame, column 'b' counts the number of rows since column 'a' was last True:
       a  b
0   True  0
1  False  1
2   True  0
3  False  1
4  False  2
5  False  3
Currently I use the code below to make this work, but because it relies on a loop, it is very slow.
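import numpy as np
import pandas as pd

# Positions where the condition is True (np.where returns a tuple of arrays)
a = np.where(cond)[0]
b = np.array([], dtype=np.int64)
s = 0
for i in range(len(data)):
    if i in a:
        # Reset the counter on a True row
        b = np.append(b, 0)
        s = 0
    else:
        b = np.append(b, s)
    # Runs on every row, so the row after a True gets 1
    s += 1
data['b'] = pd.Series(b).ffill().fillna(-1)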
Is there a faster way to do this without using a for loop?
Solution
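IIUC, you can use groupby with cumcount. The cumulative sum of 'a' increases by one at every True, so it labels each run of rows that starts at a True; cumcount then numbers the rows within each run starting from 0: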
df['b'] = df.groupby(df['a'].cumsum()).cumcount()
print(df)
# Output
       a  b
0   True  0
1  False  1
2   True  0
3  False  1
4  False  2
5  False  3
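As a self-contained sketch of the same idea, the snippet below wraps the groupby/cumcount step in a helper and also handles rows that come before the first True. The helper name rows_since_last_true and the choice of -1 for those leading rows are assumptions made for illustration (the -1 mirrors the fillna(-1) in the question); they are not part of the original answer.

```python
import pandas as pd


def rows_since_last_true(flags: pd.Series) -> pd.Series:
    """Count rows since the most recent True in `flags` (0 on a True row).

    Rows before the first True get -1. This is an assumption that mirrors
    the fillna(-1) in the question, not something the answer specifies.
    """
    # Running count of True values: increases by one at every True,
    # so it labels each run of rows that starts at a True.
    group_id = flags.cumsum()
    # Number the rows inside each run, starting from 0.
    counts = flags.groupby(group_id).cumcount()
    # group_id == 0 means no True has been seen yet.
    return counts.where(group_id > 0, -1)


# Hypothetical sample data with a leading False row.
df = pd.DataFrame({"a": [False, True, False, True, False, False, False]})
df["b"] = rows_since_last_true(df["a"])
print(df)
```

Because cumsum, groupby and cumcount are all vectorized, this avoids both the Python-level loop and the repeated np.append calls in the question's code.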
Answered By - Corralien