Issue
I have a dataframe like so:
ID 1  ID 2
1     5
1     5
1     6
2     7
2     5
2     7
3     8
3     9
3     10
I want to keep only the rows whose ID 2 value occurs more than once within its ID 1 group. I have tried .groupby and .value_counts like this:
df_temp = df.groupby('ID 1')
df_output = df_temp['ID 2'].value_counts()[df_temp['ID 2'].value_counts() > 1]
This returns something like so:
ID 1  ID 2
1     5       2
2     7       2
Is there a way I can use this to keep only the rows of the initial df whose ID 2 appears in this grouped object? I would like a result like this:
ID 1  ID 2
1     5
1     5
2     7
2     7
Solution
Don't use groupby; perform boolean indexing with duplicated:
out = df[df.duplicated(['id1', 'id2'], keep=False)]
Output (with the columns named id1 and id2 rather than ID 1 and ID 2):
   id1  id2
0    1    5
1    1    5
3    2    7
5    2    7
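For reference, here is a minimal self-contained sketch of the same idea using the question's original column names (ID 1 and ID 2). The sample frame is rebuilt from the data shown in the question. The second filter, based on groupby with transform('size'), is not part of the answer above; it is included only as one possible groupby-based equivalent, and the names pair_size and out_groupby are illustrative.

```python
import pandas as pd

# Sample data copied from the question, with its original column names.
df = pd.DataFrame({
    "ID 1": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "ID 2": [5, 5, 6, 7, 5, 7, 8, 9, 10],
})

# keep=False flags every row of a repeated (ID 1, ID 2) pair,
# not only the second and later occurrences.
out = df[df.duplicated(["ID 1", "ID 2"], keep=False)]

# Assumed alternative (not from the answer): count the rows in each
# (ID 1, ID 2) pair and keep rows whose pair occurs more than once.
pair_size = df.groupby(["ID 1", "ID 2"])["ID 2"].transform("size")
out_groupby = df[pair_size > 1]

print(out)
```

Both filters keep the original row index, so the result can be aligned back to df if needed; call reset_index(drop=True) on it if a fresh index is preferred.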
Answered By - mozway