Issue
I have a data frame like the one below:
date|point|agent
2023-10-02|A|agent1
2023-10-02|A|agent2
2023-10-05|B|agent3
2023-10-05|B|agent2
2023-10-02|C|agent1
2023-10-02|C|agent2
2023-10-02|C|agent3
On each day at a specific point, there should be only two agents. In some cases there are more than two, and I want to return the rows for those cases.
I used groupby to first count:
df.groupby(['point','date'])['agent'].nunique()>2
I can use
df['agent_count'] = df.groupby(['point','date'])['agent'].transform('nunique')
and then get the rows that have more than 2. But is there another way, without adding redundant data?
I tried loc, iloc, and where, and each gives me lots of errors. I am looking for an efficient way to return the rows without adding the counts to the dataframe. I explored questions here for two hours, but none of them worked.
Solution
You're on the right track. Instead of storing the transform result in a new column, use it directly as a boolean mask, which does it in one line:
df[df.groupby(['point','date'])['agent'].transform('nunique')>2]
date point agent
4 2023-10-02 C agent1
5 2023-10-02 C agent2
6 2023-10-02 C agent3
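For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the one-liner above. The variable names mask, over_limit, and over_limit_alt are illustrative and not from the original post, and the groupby(...).filter(...) variant is an alternative added here for comparison, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2023-10-02", "2023-10-02", "2023-10-05", "2023-10-05",
             "2023-10-02", "2023-10-02", "2023-10-02"],
    "point": ["A", "A", "B", "B", "C", "C", "C"],
    "agent": ["agent1", "agent2", "agent3", "agent2",
              "agent1", "agent2", "agent3"],
})

# transform('nunique') broadcasts each group's distinct-agent count back to
# every row of that group, so the comparison yields a boolean Series aligned
# with df's index. No helper column is written to df.
mask = df.groupby(["point", "date"])["agent"].transform("nunique") > 2
over_limit = df[mask]

# Alternative (an addition, not from the answer): filter() keeps every row of
# each group for which the function returns True.
over_limit_alt = df.groupby(["point", "date"]).filter(
    lambda g: g["agent"].nunique() > 2
)

print(over_limit)
```

Both approaches leave df unchanged. The mask version is usually the faster of the two on large frames, because filter calls a Python function once per group.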
Answered By - amance