Issue
I'm having some difficulty removing rows based on certain criteria. Each row has an ID column and a Code column. If an ID has only one row and that row's Code is 26, then every trace of that ID should be removed.
In the example below, only ID 124 is removed. ID 125 has Code 26 but also a row with Code 8, and ID 126 has Code 26 but also a row with a NULL Code, so both are kept. I have many more columns in my dataset, but these are the only two columns of concern.
Input
ID Code
111 2
111 5
111 23
123 27
123 3
124 26
125 8
125 26
126 26
126 NULL
Output
ID Code
111 2
111 5
111 23
123 27
123 3
125 8
125 26
126 26
126 NULL
Solution
Here's a boolean index version:
df = df[df['ID'].duplicated(keep=False) | (df['Code'] != 26)]
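This keeps a row if its ID appears more than once (duplicated(keep=False) marks every occurrence of a repeated ID, not just the later ones) OR if its Code is not equal to 26. The only rows that fail both tests are single-row IDs with Code 26, so those are removed. Result: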
ID Code
0 111 2.0
1 111 5.0
2 111 23.0
3 123 27.0
4 123 3.0
6 125 8.0
7 125 26.0
8 126 26.0
9 126 NaN
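To try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same mask. The variable names has_other_rows, result, group_size and result_alt are made up for illustration, and the NULL is assumed to be loaded as NaN, which is why Code shows up as float in the result above. The second filter is an alternative that counts rows per ID with groupby, which states the "only row for an ID" condition more literally:

```python
import numpy as np
import pandas as pd

# Sample data from the question; NULL is assumed to arrive as NaN.
df = pd.DataFrame({
    "ID":   [111, 111, 111, 123, 123, 124, 125, 125, 126, 126],
    "Code": [2, 5, 23, 27, 3, 26, 8, 26, 26, np.nan],
})

# True for every row whose ID occurs more than once in the frame.
has_other_rows = df["ID"].duplicated(keep=False)

# Same boolean index as in the answer: keep repeated IDs, or any Code other than 26.
result = df[has_other_rows | (df["Code"] != 26)]

# Alternative sketch: count rows per ID, then drop single-row IDs whose Code is 26.
group_size = df.groupby("ID")["ID"].transform("size")
result_alt = df[~((group_size == 1) & (df["Code"] == 26))]

print(result)
```

Both filters drop only the row for ID 124 here. Note that NaN != 26 evaluates to True, so a single-row ID with a NULL Code would be kept by either version.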
Answered By - Tom