Issue
I have a dataframe with all of a fighter's fights, the fight number (i.e. if it is their first, second, etc.), and whether or not they won the fight. I would like to calculate the number of consecutive wins a fighter had gotten before their current fight (i.e. not including if they won the current fight). I am currently working with Python 3.7 in Spyder.
Suppose we have the following dataframe, where win = 1 if the fighter won the fight:
df = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
'fight_number' : ['1', '2', '3', '4', '1', '2', '3', '1', '2'],
'win' : [0, 0, 1, 1, 1, 1, 0, 1, 1]})
fighter fight_number win
0 A 1 0
1 A 2 0
2 A 3 1
3 A 4 1
4 B 1 1
5 B 2 1
6 B 3 0
7 C 1 1
8 C 2 1
I know that to calculate win streaks across all rows, I can implement the solution proposed here with:
grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(grouper).cumsum()
which produces:
fighter fight_number win streak
0 A 1 0 0
1 A 2 0 0
2 A 3 1 1
3 A 4 1 2
4 B 1 1 3
5 B 2 1 4
6 B 3 0 0
7 C 1 1 1
8 C 2 1 2
But what I need is to apply this approach to subgroups of the dataframe (i.e. to each fighter) and to not include the outcome of the current fight in the count of the streak. So, I am basically trying to have the current win streak of the fighter when they enter the fight.
The target output in this example would therefore be:
fighter fight_number win streak
0 A 1 0 0
1 A 2 0 0
2 A 3 1 0
3 A 4 1 1
4 B 1 1 0
5 B 2 1 1
6 B 3 0 2
7 C 1 1 0
8 C 2 1 1
I appreciate any advice I can get on this, as I am pretty new to Python.
Solution
One solution I came up with was inspired by an earlier answer posted (but deleted) by jezrael:
grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(['fighter', grouper]).cumsum()
df['streak'] = df.groupby('fighter')['streak'].shift(1).fillna(0)
which produces the target output:
fighter fight_number win streak
0 A 1 0 0.0
1 A 2 0 0.0
2 A 3 1 0.0
3 A 4 1 1.0
4 B 1 1 0.0
5 B 2 1 1.0
6 B 3 0 2.0
7 C 1 1 0.0
8 C 2 1 1.0
and it also seems to work on other test examples:
df2 = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
'fight number' : ["1", "2", "3", "4", "5", "6", "1", "2", "3", "1", "2"],
'win' : [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]})
grouper = (df2.win != df2.win.shift()).cumsum()
df2['streak'] = df2.groupby(['fighter', grouper]).cumsum()
df2['streak'] = df2.groupby('fighter')['streak'].shift(1).fillna(0)
fighter fight number win streak
0 A 1 1 0.0
1 A 2 1 1.0
2 A 3 0 2.0
3 A 4 1 0.0
4 A 5 0 1.0
5 A 6 1 0.0
6 B 1 1 0.0
7 B 2 1 1.0
8 B 3 0 2.0
9 C 1 1 0.0
10 C 2 1 1.0
Answered By - Joe
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.