Issue
I am confused about why a pandas groupby operation can be written in both of the ways below and yield the same result. The specific code is not really the question; both give the same result. I would like someone to break down the syntax of both.
df.groupby(['gender'])['age'].mean()
df.groupby(['gender']).mean()['age']
In the first instance, it reads as if you are calling the .mean() function on the age column specifically. The second appears as if you are calling .mean() on the whole groupby object and selecting the age column afterwards. Are there runtime considerations?
Solution
"It reads as if you are calling the .mean() function on the age column specifically. The second appears like you are calling .mean() on the whole groupby object and selecting the age column after?"
This is exactly what's happening. df.groupby() returns a DataFrameGroupBy object rather than a DataFrame. Calling .mean() on it computes the mean of each (numeric) column within each group, independently of the other columns, and returns the results as a DataFrame with one row per group. That DataFrame can then be indexed with ['age'] to pull out the single column as a Series.
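To make the intermediate objects concrete, here is a minimal sketch that inspects each step of the "mean first, select after" form. The sample data and the extra height column are made up for illustration; only gender and age come from the question.

```python
import pandas as pd

# Hypothetical sample data; only "gender" and "age" appear in the question.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "age": [30, 40, 50, 20],
    "height": [160.0, 180.0, 170.0, 175.0],
})

grouped = df.groupby(["gender"])      # a lazy groupby object, nothing computed yet
print(type(grouped).__name__)         # DataFrameGroupBy

all_means = grouped.mean()            # mean of every non-key column, per group
print(type(all_means).__name__)       # DataFrame
print(list(all_means.columns))        # ['age', 'height']

age_means = all_means["age"]          # ordinary column selection on that DataFrame
print(type(age_means).__name__)       # Series
```

Note that in recent pandas versions (2.0 and later, as far as I am aware), calling .mean() on the whole groupby object raises a TypeError if the frame contains non-numeric columns, unless you pass numeric_only=True. Selecting the column first sidesteps that.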
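Reversing the order selects the single column first: df.groupby(['gender'])['age'] is a SeriesGroupBy object, and calling .mean() on it calculates the mean for that column only, returning a Series. If you know you only want the mean for a single column, it will generally be faster to isolate that column first, rather than calculate the mean for every column and discard most of the results (especially if you have a very large dataframe with many columns).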
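If you want to check the runtime difference yourself, something like the following rough sketch should do. The row count, the number of extra columns, and the column names other than gender and age are arbitrary assumptions, and the actual timings will depend on your data and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes and the "extra_*" columns are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 100_000
data = {
    "gender": rng.choice(["F", "M"], size=n),
    "age": rng.integers(18, 90, size=n),
}
data.update({f"extra_{i}": rng.random(n) for i in range(20)})
df = pd.DataFrame(data)

# Select the column first, then aggregate (SeriesGroupBy.mean).
select_first = df.groupby(["gender"])["age"].mean()
# Aggregate every column, then select (DataFrameGroupBy.mean, then indexing).
select_last = df.groupby(["gender"]).mean()["age"]

# Both forms give the same Series.
pd.testing.assert_series_equal(select_first, select_last)

t_first = timeit.timeit(lambda: df.groupby(["gender"])["age"].mean(), number=5)
t_last = timeit.timeit(lambda: df.groupby(["gender"]).mean()["age"], number=5)
print(f"select first: {t_first:.4f}s, select last: {t_last:.4f}s")
```

The gap should grow with the number of columns that get averaged and then thrown away.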
Answered By - baileythegreen