Friday, September 9, 2022

[FIXED] Calculate Win Rates In Python Using Groupby and Lambda Functions

September 09, 2022 countif, dataframe, pandas, python No comments

Issue

I'm trying to create a new df from race_dbs that's grouped by 'horse_id' showing the number of times 'place' = 1 as well as the total number of times that 'horse_id' occurs.

Some background on the dataset if it's helpful;

race_dbs contains horse race data. There are 12 horses in a race, for each is shown their odds, fire, place, time, and gate number.

What I'm trying to achieve from this code is the calculation of win rates for each horse.

A win is denoted by 'place' = 1
Total race count will be calculated by how many times a particular 'horse_id' occurs in the db.

race_dbs

race_id	horse_id	odds	fire	place	horse_time	gate
V14qANzi	398807	NaN	0	1	72.0191	7
xeieZak	191424	NaN	0	8	131.3010	10
xeieZak	139335	NaN	0	1	131.3713	9
xeieZak	137195	NaN	0	11	131.6310	11
xeieZak	398807	NaN	0	12	131.7886	2
...	...	..	..	...	...	..

From this simple table the output would look like, but please bear in mind my dataset is very large, containing 12882353 rows in total.

desired output

horse_id	wins	races	win rate
398807	1	2	50%
191424	0	1	0%
139335	1	1	100%
137195	0	1	0%
...	..	..	...

It should be noted that I'm a complete coding beginner so forgive me if this is an easy solve.

I have tried to use the groupby and lambda pandas functions but I am struggling to combine both functions, and believe there will be a much simpler way.

import pandas as pd

race_db = pd.read_csv('horse_race_data_db.csv')
race_db_2 = pd.read_csv('2_horse_race_data.csv')

frames = [race_db, race_db_2]
race_dbs = pd.concat(frames, ignore_index=True, sort=False)

race_dbs_horse_wins = race_dbs.groupby('horse_id')['place'].apply(lambda x: x[x == 1].count())
race_dbs_horse_sums = race_dbs.groupby('horse_id').aggregate({"horse_id":['sum']})

Thanks for the help!

Solution

For count Trues values create helper boolean column and aggregate sum, for win rate aggregate mean and for count use GroupBy.size in named aggregations by GroupBy.agg:

out = (race_dbs.assign(no1 = race_dbs['place'].eq(1))
               .groupby('horse_id', sort=False, as_index=False)
               .agg(**{'wins':('no1','sum'),
                    'races':('horse_id','size'),
                    'win rate':('no1','mean')}))
print (out)
   horse_id  wins  races  win rate
0    398807     1      2       0.5
1    191424     0      1       0.0
2    139335     1      1       1.0
3    137195     0      1       0.0

Answered By - jezrael

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Friday, September 9, 2022

[FIXED] Calculate Win Rates In Python Using Groupby and Lambda Functions

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels