Issue
I am using pandas cut to bin continuous values. How can I get the mean of the values that fall into each bin?
MWE
import numpy as np
import pandas as pd
np.random.seed(100)
df = pd.DataFrame({'a': np.random.randint(1,10,10)})
df['bins_a'] = pd.cut(df['a'],4)
print(df)
   a        bins_a
0  9    (7.0, 9.0]
1  9    (7.0, 9.0]
2  4    (3.0, 5.0]
3  8    (7.0, 9.0]
4  8    (7.0, 9.0]
5  1  (0.992, 3.0]
6  5    (3.0, 5.0]
7  3  (0.992, 3.0]
8  6    (5.0, 7.0]
9  3  (0.992, 3.0]
I tried:
df['bins_a_mean'] = df['bins_a'].mean()
But this fails.
How can I get the mean for each interval?
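For context, here is a minimal sketch of why the attempt fails. It assumes a reasonably recent pandas version and hard-codes the values from the printed frame instead of re-running the random seed. The names is_categorical and failed are illustrative only.

```python
import pandas as pd

# Same values as in the printed example
df = pd.DataFrame({'a': [9, 9, 4, 8, 8, 1, 5, 3, 6, 3]})
df['bins_a'] = pd.cut(df['a'], 4)

# pd.cut returns a categorical column whose categories are intervals
is_categorical = isinstance(df['bins_a'].dtype, pd.CategoricalDtype)
print(is_categorical)

# A categorical column has no numeric mean, so pandas rejects the reduction
try:
    df['bins_a'].mean()
    failed = False
except TypeError as exc:
    failed = True
    print(type(exc).__name__, exc)
```

The mean has to be taken over the numeric column a, with bins_a used only as the grouping key.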
Solution
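Group by the bin column and take the mean of a with transform, which returns the per-bin mean aligned to the original rows: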
df['bins_a_mean'] = df.groupby('bins_a')['a'].transform('mean')
print(df)
   a        bins_a  bins_a_mean
0  9    (7.0, 9.0]     8.500000
1  9    (7.0, 9.0]     8.500000
2  4    (3.0, 5.0]     4.500000
3  8    (7.0, 9.0]     8.500000
4  8    (7.0, 9.0]     8.500000
5  1  (0.992, 3.0]     2.333333
6  5    (3.0, 5.0]     4.500000
7  3  (0.992, 3.0]     2.333333
8  6    (5.0, 7.0]     6.000000
9  3  (0.992, 3.0]     2.333333
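If one row per bin is enough, a plain groupby mean may be more convenient than transform. The sketch below hard-codes the values from the printed frame; the names bin_means and row_means are illustrative. Passing observed=True is an addition to the original answer: newer pandas versions may warn about the default when grouping on a categorical column, and with this data it does not change the result.

```python
import pandas as pd

# Same values as in the printed example
df = pd.DataFrame({'a': [9, 9, 4, 8, 8, 1, 5, 3, 6, 3]})
df['bins_a'] = pd.cut(df['a'], 4)

# One row per bin: the mean of 'a' inside each interval
bin_means = df.groupby('bins_a', observed=True)['a'].mean()
print(bin_means)

# Same per-bin means, repeated on every row of the original frame
row_means = df.groupby('bins_a', observed=True)['a'].transform('mean')
df['bins_a_mean'] = row_means
print(df)
```

Use the first form for a summary table and the second when the mean is needed as a column next to the original data.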
Answered By - BhishanPoudel