Issue
I need get mean of expanding grouped by name.
I already have this code:
data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8],
'name': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
'number': [1, 3, 5, 7, 9, 11, 13, 15]
}
df = pd.DataFrame(data)
df['mean_number'] = df.groupby('name')['number'].apply(
lambda s: s.expanding().mean().shift()
)
Ps: I use .shift() for the mean not to include the current line
Result in this:
id name number mean_number
0 1 A 1 NaN
1 2 B 3 NaN
2 3 A 5 1.0
3 4 B 7 3.0
4 5 A 9 3.0
5 6 B 11 5.0
6 7 A 13 5.0
7 8 B 15 7.0
Works, but I only need the last result of each groupby.
id name number mean_number
6 7 A 13 5.0
7 8 B 15 7.0
I would like to know if it is possible to get the mean of only these last lines, because in a very large dataset, it takes a long time to create the variables of all the lines and filter only the last ones.
Solution
If you only need the last two mean numbers you can just take the sum and count per group and calculate the values like this:
groups = df.groupby('name').agg(name=("name", "first"), s=("number", "sum"), c=("number", "count")).set_index("name")
groups
s c
name
A 28 4
B 36 4
Then you can use .tail()
to get the last row for each group
tail = df.groupby('name').tail(1).set_index("name")
tail
id number
name
A 7 13
B 8 15
Calculate the mean like this
(groups.s - tail.number) / (groups.c - 1)
name
A 5.0
B 7.0
Answered By - bitflip
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.