Issue
import numpy as np
import pandas as pd
data = pd.DataFrame(
{"categ": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
"value": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
"Run_count": [0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 4, 4, 0, 0, 0, 0, 5, 5, 0, 0, 6],
"currentResults": [0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 0, 0, 5, 5, 5, 5, 6, 6, 6],
"desiredResults": [0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 0, 0, 0, 0, 1, 1, 2, 2, 2],
})
# label each run of consecutive 1s in "value" (NaN outside the runs)
runs = data['value'].where(data['value'].eq(1))
data['Run_count'] = runs.dropna().groupby(runs.isna().cumsum()).ngroup() + 1
# backfill the run labels, zeroing out rows where categ != 1
data['currentResults'] = np.where(data['categ'].eq(1), data['Run_count'].bfill(), 0)
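(For context, the reason the count carries across the gap is that Run_count is NaN on the rows between runs, and bfill pulls the next run's label backward over them. A minimal sketch of that behaviour, with made-up values:)

```python
import pandas as pd

# NaNs between two run labels, as in the Run_count column
s = pd.Series([4.0, None, None, 5.0])

# bfill fills each NaN from the next non-NaN value below it
print(s.bfill().tolist())  # [4.0, 5.0, 5.0, 5.0]
```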
I am able to accomplish most of what I am trying to do: data['currentResults'] is what my current code produces.
What I am trying to accomplish is shown in data['desiredResults']: at id 15, where df['categ'] == 1 again, I would like to restart the count, since df['categ'][13:14] holds the value 0. Currently, at id 15, data['currentResults'] continues the count at 5, whereas I want it to restart/reset at 1.
I was thinking the easiest way to accomplish this might be to reset/restart the counter in the "Run_count" column, based on my current code.
Attached below is an image of what I'm trying to accomplish in "desiredResults".
Solution
If I understand correctly, you can set up boolean Series, then perform a custom groupby.transform:
m1 = data['value'].ne(1)
m2 = data['categ'].ne(1)
data['out'] = (m1[~m2]
.groupby(m2.cumsum())
# increment 1 for each non-1 following a 1
.transform(lambda g: (g&~g.shift(fill_value=False)).cumsum()
# add 1 if group starts with non-1
+(1-g.iloc[0]))
.reindex(data.index, fill_value=0) # 0 if categ is non-1
)
Output:
categ value out
0 0 0 0
1 1 1 1
2 1 1 1
3 1 0 2
4 1 0 2
5 1 1 2
6 1 0 3
7 1 1 3
8 1 0 4
9 1 0 4
10 1 0 4
11 1 1 4
12 1 1 4
13 0 0 0
14 0 0 0
15 1 0 1
16 1 0 1
17 1 1 1
18 1 1 1
19 1 0 2
20 1 0 2
21 1 1 2
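For readers who want to sanity-check the table above, here is a self-contained rerun of the snippet on the question's sample data (only the two input columns are needed):

```python
import pandas as pd

# sample data from the question
data = pd.DataFrame({
    "categ": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "value": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
})

m1 = data['value'].ne(1)  # True where value != 1
m2 = data['categ'].ne(1)  # True where categ != 1

data['out'] = (m1[~m2]                 # keep only rows where categ == 1
               .groupby(m2.cumsum())   # new group after each categ != 1 block
               .transform(lambda g: (g & ~g.shift(fill_value=False)).cumsum()
                          + (1 - g.iloc[0]))
               .reindex(data.index, fill_value=0)  # 0 where categ != 1
               )

print(data['out'].tolist())
# [0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 0, 0, 1, 1, 1, 1, 2, 2, 2]
```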
Intermediates:
categ value m1 m2 m2.cumsum() g&~g_shift out
0 0 0 True True 1 NaN 0
1 1 1 False False 1 False 1
2 1 1 False False 1 False 1
3 1 0 True False 1 True 2
4 1 0 True False 1 False 2
5 1 1 False False 1 False 2
6 1 0 True False 1 True 3
7 1 1 False False 1 False 3
8 1 0 True False 1 True 4
9 1 0 True False 1 False 4
10 1 0 True False 1 False 4
11 1 1 False False 1 False 4
12 1 1 False False 1 False 4
13 0 0 True True 2 NaN 0
14 0 0 True True 3 NaN 0
15 1 0 True False 3 True 1
16 1 0 True False 3 False 1
17 1 1 False False 3 False 1
18 1 1 False False 3 False 1
19 1 0 True False 3 True 2
20 1 0 True False 3 False 2
21 1 1 False False 3 False 2
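The core trick in the intermediates above is g & ~g.shift(fill_value=False): it is True only at the first element of each run of Trues, so its cumsum numbers the runs. A minimal illustration on a toy Series:

```python
import pandas as pd

s = pd.Series([False, True, True, False, True])

# True only at the first element of each run of Trues
starts = s & ~s.shift(fill_value=False)
print(starts.tolist())           # [False, True, False, False, True]

# cumulative count numbers each run (0 before the first run)
print(starts.cumsum().tolist())  # [0, 1, 1, 1, 2]
```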
updated question
For your updated question, it looks like you could just change the m1 condition and remove the +(1-g.iloc[0]) correction factor:
m1 = data['value'].eq(1)
m2 = data['categ'].ne(1)
data['out'] = (m1[~m2]
.groupby(m2.cumsum())
# increment 1 for each 1 following a non-1
.transform(lambda g: (g&~g.shift(fill_value=False)).cumsum())
.reindex(data.index, fill_value=0) # 0 if categ is non-1
)
Output:
categ value desiredResults out
0 0 0 0 0
1 1 1 1 1
2 1 1 1 1
3 1 0 1 1
4 1 0 1 1
5 1 1 2 2
6 1 0 2 2
7 1 1 3 3
8 1 0 3 3
9 1 0 3 3
10 1 0 3 3
11 1 1 4 4
12 1 1 4 4
13 0 0 0 0
14 0 0 0 0
15 1 0 0 0
16 1 0 0 0
17 1 1 1 1
18 1 1 1 1
19 1 0 2 1 # this is however different!
20 1 0 2 1
21 1 1 2 2
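Likewise, this variant can be rerun end to end on the sample data; note rows 19–20, where out differs from the posted desiredResults, as flagged in the table:

```python
import pandas as pd

data = pd.DataFrame({
    "categ": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "value": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
})

m1 = data['value'].eq(1)  # note: eq instead of ne
m2 = data['categ'].ne(1)

data['out'] = (m1[~m2]
               .groupby(m2.cumsum())
               # no +(1-g.iloc[0]) correction here
               .transform(lambda g: (g & ~g.shift(fill_value=False)).cumsum())
               .reindex(data.index, fill_value=0)
               )

print(data['out'].tolist())
# [0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 0, 0, 0, 0, 1, 1, 1, 1, 2]
```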
Answered By - mozway