Issue
I would like to find out the most utilized location for the date of 2/1/2022.
Data
ID location total marks_free marks_utilized date
1 NY 6 5 1 2/1/2022
2 NY 10 5 5 2/1/2022
3 NY 2 1 1 2/1/2022
4 CA 5 4 1 2/1/2022
5 CA 6 5 1 2/1/2022
6 CA 10 10 0 2/1/2022
7 NY 6 6 0 3/1/2022
8 NY 10 10 0 3/1/2022
9 NY 2 1 1 3/1/2022
10 CA 5 4 1 3/1/2022
11 CA 6 5 1 3/1/2022
12 CA 10 10 0 3/1/2022
Desired
location marks_utilized date
NY 38% 2/1/2022
Logic
Filter to 2/1/2022, then group by location.
For instance, take NY:
sum(marks_utilized) / sum(total) * 100
7 / 18 * 100 ≈ 38.9% (shown truncated as 38% above)
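As a quick check of the arithmetic above, here is a minimal plain-Python sketch that sums the 2/1/2022 rows per location and divides the sums. The rows are copied from the Data table; the variable names (`rows`, `totals`, `utilization`, `best`) are only illustrative.

```python
# Rows for 2/1/2022 copied from the Data table: (location, total, marks_utilized)
rows = [
    ("NY", 6, 1), ("NY", 10, 5), ("NY", 2, 1),
    ("CA", 5, 1), ("CA", 6, 1), ("CA", 10, 0),
]

# location -> [sum of marks_utilized, sum of total]
totals = {}
for location, total, utilized in rows:
    sums = totals.setdefault(location, [0, 0])
    sums[0] += utilized
    sums[1] += total

# Divide the sums (not the individual rows) to get a percentage per location.
utilization = {loc: used / total * 100 for loc, (used, total) in totals.items()}

# The location with the highest utilization on that date.
best = max(utilization, key=utilization.get)
print(best, round(utilization[best], 2))
```

Dividing the summed columns matters here: averaging the per-row ratios would weight a 2-mark row the same as a 10-mark row.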
Doing
# filter to 2/1/2022
df1 = df.groupby(['location', 'date']).agg({'marks_utilized': 'sum', 'total': 'sum'})
df1['marks_utilized'] = df['marks_utilized'] / df['total'] * 100
Still researching this. Any suggestion is appreciated.
Solution
Your attempt needs only a small modification to work.
df1['marks_utilized'] = df['marks_utilized'] / df['total'] * 100
should be
df1['marks_utilized'] = df1['marks_utilized'] / df1['total'] * 100
because the division has to use the summed columns in df1 rather than the per-row columns of df.
If you only want the result for 2/1/2022, filter df to that date first and do the groupby afterwards. You could also use df1.to_string(formatters={'marks_utilized': '{:.2%}'.format}) to render the float as a percentage string; this assumes the column holds a fraction such as 0.388889, i.e. without the * 100.
ID,location,total,marks_free,marks_utilized,date
1,NY,6,5,1,2/1/2022
2,NY,10,5,5,2/1/2022
3,NY,2,1,1,2/1/2022
4,CA,5,4,1,3/1/2022
5,CA,6,5,1,3/1/2022
6,CA,10,10,0,3/1/2022
import pandas as pd
df = pd.read_csv("test.csv")
df1 = df.groupby(['location', 'date']).agg({'marks_utilized': 'sum', 'total': 'sum'})
df1['marks_utilized'] = df1['marks_utilized'] / df1['total']
max_row = df1.loc[df1['marks_utilized'].idxmax()]
print(max_row)
marks_utilized 0.388889
total 18.000000
Name: (NY, 2/1/2022), dtype: float64
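The filter-then-group variant mentioned above could be sketched roughly as follows. This is only one way to do it: the question's data is inlined through `io.StringIO` so the snippet runs without a test.csv file, the dates are compared as plain strings, and the names `day`, `sums`, `utilization`, `top_location` and `top_share` are made up for illustration.

```python
import io
import pandas as pd

# Data from the question, inlined so the sketch runs without a test.csv file.
csv_text = """ID,location,total,marks_free,marks_utilized,date
1,NY,6,5,1,2/1/2022
2,NY,10,5,5,2/1/2022
3,NY,2,1,1,2/1/2022
4,CA,5,4,1,2/1/2022
5,CA,6,5,1,2/1/2022
6,CA,10,10,0,2/1/2022
7,NY,6,6,0,3/1/2022
8,NY,10,10,0,3/1/2022
9,NY,2,1,1,3/1/2022
10,CA,5,4,1,3/1/2022
11,CA,6,5,1,3/1/2022
12,CA,10,10,0,3/1/2022
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows for the date of interest (plain string comparison).
day = df[df["date"] == "2/1/2022"]

# Sum both columns per location, then divide the sums to get a fraction.
sums = day.groupby("location")[["marks_utilized", "total"]].sum()
sums["utilization"] = sums["marks_utilized"] / sums["total"]

# Pick the location with the highest utilization and format it as a percentage.
top_location = sums["utilization"].idxmax()
top_share = sums.loc[top_location, "utilization"]
print(f"{top_location} {top_share:.0%} 2/1/2022")
```

Note that the `.0%` format rounds rather than truncates, so 7/18 is printed as 39% rather than the 38% shown in the question.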
Answered By - hide1nbush