Issue
I am having a hard time applying a custom function to each group of a groupby column in pandas.
My custom function takes a series of numbers, computes the difference between each pair of consecutive values, and returns the mean of those differences. Below is the code:
import numpy as np

def mean_gap(a):
    b = []
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b)
So if a = [1,3,7], mean_gap(a) will give me ((3-1)+(7-3))/2 = 3.0
Dataframe:
one two
a 1
a 3
a 7
b 8
b 9
Desired result:
Dataframe:
one two
a 3
b 1
df.groupby(['one'])['two'].???
I am new to pandas. I read that groupby takes values one row at a time, not the full series, so I am not able to use a lambda after groupby.
Solution
With a custom function, you can do:
df.groupby('one')['two'].agg(lambda x: x.diff().mean())
one
a 3
b 1
Name: two, dtype: int64
and reset the index:
df.groupby('one')['two'].agg(lambda x: x.diff().mean()).reset_index(name='two')
one two
0 a 3
1 b 1
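For a version that can be pasted and run end to end, here is a minimal sketch. The DataFrame construction is an assumption reconstructed from the table in the question (default integer index); the rest is the same agg call as above. The dtype of the result may differ between pandas versions, since diff() produces NaN for the first row of each group.

```python
import pandas as pd

# Assumed reconstruction of the question's data (default integer index).
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

# Each group's 'two' values reach the lambda as a whole Series.
# diff() gives consecutive differences (NaN for the first element),
# and mean() skips that NaN.
result = df.groupby("one")["two"].agg(lambda x: x.diff().mean())

# Turn the group labels back into a regular column.
tidy = result.reset_index(name="two")
print(tidy)
```

So the lambda does receive the full series for each group, not one row at a time.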
An alternative would be:
df.groupby('one')['two'].diff().groupby(df['one']).mean()
one
a 3.0
b 1.0
Name: two, dtype: float64
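To see why the two-step version works, it may help to look at the intermediate result. This is a sketch under the assumption that the frame has the default integer index; the variable names gaps and result are only illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

# Step 1: per-group differences, aligned to the original rows.
# The first row of each group has no predecessor, so it becomes NaN.
gaps = df.groupby("one")["two"].diff()
print(gaps)

# Step 2: group those differences by the same key and average them.
# mean() ignores the NaN entries.
result = gaps.groupby(df["one"]).mean()
print(result)
```

Because diff() is applied within each group, the difference between the last a row and the first b row never enters the average.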
Your approach would have also worked with the following:
def mean_gap(a):
    b = []
    # Convert to a plain array so a[i] indexes by position, not by label.
    a = np.asarray(a)
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b)
df.groupby('one')['two'].agg(mean_gap)
one
a 3
b 1
Name: two, dtype: int64
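a = np.asarray(a) is necessary because otherwise you would get KeyErrors in b.append((a[i+1]-a[i])). Each group is passed to the function as a Series that keeps its original index labels, and a[i] on a Series with an integer index looks up by label, not by position. Assuming the default integer index, group b carries the labels 3 and 4, so a[0] does not exist there.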
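The label lookup problem can be reproduced in isolation. The sketch below assumes the default integer index and, as a swapped-in shortcut rather than the code from the question, uses np.diff in place of the explicit loop.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

# The 'b' rows keep their original index labels (3 and 4).
group_b = df.loc[df["one"] == "b", "two"]

try:
    group_b[0]          # label-based lookup: there is no label 0
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

def mean_gap(a):
    # np.asarray drops the index, so everything below is positional.
    values = np.asarray(a)
    # np.diff returns the consecutive differences values[i+1] - values[i].
    return np.diff(values).mean()

out = df.groupby("one")["two"].agg(mean_gap)
print(out)
```

Another way to stay positional without converting is a.iloc[i]. Note that a group with a single row has no differences, so either version returns NaN for it (NumPy may also emit a warning about the mean of an empty array).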
Answered By - ayhan