Issue
I have the following df:
df3 = pd.DataFrame(np.array([['Iza', 'Tuesday'],['Martin', 'Friday'],['John', 'Monday'],['Iza', 'Tuesday'],['Iza', 'Tuesday'],['Iza', 'Wednesday'],['Sara', 'Friday'], ['Sara', 'Friday'], ['Sara', 'Sunday'],['Silvia', 'Monday'],['Silvia', 'Wednesday'],['Paul', 'Monday'],['Paul', 'Tuesday'],['Paul', 'Wednesday']]),
columns=['Name', 'Day'])
df3:
Name Day
0 Iza Tuesday
1 Martin Friday
2 John Monday
3 Iza Tuesday
4 Iza Tuesday
5 Iza Wednesday
6 Sara Friday
7 Sara Friday
8 Sara Sunday
9 Silvia Monday
10 Silvia Wednesday
11 Paul Monday
12 Paul Tuesday
13 Paul Wednesday
I got the count of days for each user:
oo = df3.groupby(['Name','Day'])['Day'].size().reset_index(name='counts')
result:
Name Day counts
0 Iza Tuesday 3
1 Iza Wednesday 1
2 John Monday 1
3 Martin Friday 1
4 Paul Monday 1
5 Paul Tuesday 1
6 Paul Wednesday 1
7 Sara Friday 2
8 Sara Sunday 1
9 Silvia Monday 1
10 Silvia Wednesday 1
dropped unwanted users with only one day record;
uniq_us = oo[oo.duplicated(['Name'], keep=False)]
result:
Name Day counts
0 Iza Tuesday 3
1 Iza Wednesday 1
4 Paul Monday 1
5 Paul Tuesday 1
6 Paul Wednesday 1
7 Sara Friday 2
8 Sara Sunday 1
9 Silvia Monday 1
10 Silvia Wednesday 1
Now I want to get the percentage of counts in each grouped days by name:
uniq_us.groupby(['Name','Day'])['counts'].apply(lambda x: x.value_counts(normalize=True)) * 100
I got:
Name Day
Iza Tuesday 3 100.0
Wednesday 1 100.0
Paul Monday 1 100.0
Tuesday 1 100.0
Wednesday 1 100.0
Sara Friday 2 100.0
Sunday 1 100.0
Silvia Monday 1 100.0
Wednesday 1 100.0
Name: counts, dtype: float64
I do not know how can I calculate it per grouped Name
Desired output:
Name Day
Iza Tuesday 3 75.0
Wednesday 1 25.0
Paul Monday 1 33.33
Tuesday 1 33.33
Wednesday 1 33.33
Sara Friday 2 66.66
Sunday 1 33.34
Silvia Monday 1 50.0
Wednesday 1 50.0
Name: counts, dtype: float64
Solution
You can normalize the counts with their sum via transform
:
uniq_us["pcnt"] = uniq_us.groupby("Name").counts.transform(lambda x: x / x.sum())
to get
>>> uniq_us
Name Day counts pcnt
0 Iza Tuesday 3 0.750000
1 Iza Wednesday 1 0.250000
4 Paul Monday 1 0.333333
5 Paul Tuesday 1 0.333333
6 Paul Wednesday 1 0.333333
7 Sara Friday 2 0.666667
8 Sara Sunday 1 0.333333
9 Silvia Monday 1 0.500000
10 Silvia Wednesday 1 0.500000
You can put 100 *
and round(2)
in lambda
and set the Name
and Day
as the index to match the output:
...transform(lambda x: (100 * x / x.sum()).round(2))
uniq_us = uniq_us.set_index(["Name", "Day"])
to get
counts pcnt
Name Day
Iza Tuesday 3 75.00
Wednesday 1 25.00
Paul Monday 1 33.33
Tuesday 1 33.33
Wednesday 1 33.33
Sara Friday 2 66.67
Sunday 1 33.33
Silvia Monday 1 50.00
Wednesday 1 50.00
Answered By - Mustafa Aydın
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.