Issue
I'm currently studying pandas and I come from an R/dplyr/tidyverse background.
I find the pandas API not very intuitive. How would I elegantly rewrite the following dplyr operation using pandas syntax?
library("nycflights13")
library("tidyverse")

delays <- flights %>%
  group_by(dest) %>%
  summarize(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")
Solution
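"pd.DataFrame.agg method doesn't allow much flexibility for changing columns' names in the method itself"
That's not exactly true. You can rename the columns inside agg with named aggregation (new_name=(column, function)), much as in R. It is a better idea, though, not to use count as a column name, because count is also a DataFrame method, so attribute access such as delays.count returns the method rather than the column: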
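# Assumes `flights` is a pandas DataFrame holding the nycflights13 flights table.
delays = (
    flights
    .groupby('dest', as_index=False)
    .agg(
        # 'count' counts the non-missing values of `year` in each group
        count=('year', 'count'),
        # 'mean' skips NaN by default, like na.rm = TRUE in R
        dist=('distance', 'mean'),
        delay=('arr_delay', 'mean'))
    .query('count > 20 & dest != "HNL"')
    .reset_index(drop=True)
)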
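To try the pattern without the full dataset, here is a minimal sketch on a small made-up frame. The toy values, the threshold of 1, and the column names n and n_delay are illustrative assumptions, not part of the original question. It also shows two things worth knowing: 'size' counts every row in a group (closest to dplyr's n()), while 'count' ignores missing values, and a column named count is shadowed by the DataFrame method under attribute access.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the nycflights13 `flights` table.
flights = pd.DataFrame({
    "year": [2013] * 6,
    "dest": ["ATL", "ATL", "ATL", "HNL", "HNL", "ORD"],
    "distance": [760.0, 760.0, 760.0, 4980.0, 4980.0, 730.0],
    "arr_delay": [10.0, np.nan, 20.0, -5.0, 5.0, np.nan],
})

delays = (
    flights
    .groupby("dest", as_index=False)
    .agg(
        n=("year", "size"),              # all rows in the group, like n()
        n_delay=("arr_delay", "count"),  # non-missing arr_delay values only
        dist=("distance", "mean"),
        delay=("arr_delay", "mean"),     # NaN is skipped by default
    )
    .query('n > 1 and dest != "HNL"')    # toy threshold instead of 20
    .reset_index(drop=True)
)
print(delays)

# A column literally named `count` still works with bracket access,
# but attribute access returns the DataFrame.count method instead.
clash = flights.groupby("dest", as_index=False).agg(count=("year", "size"))
print(callable(clash.count))   # the method, not the column
print(clash["count"].tolist()) # the column
```

Naming the row count something like n avoids the clash entirely, and then both delays.n and delays["n"] refer to the column.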
Answered By - Nuri Taş