Issue
Let's say I have a dataframe:
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 1, 2, 3],
    'Value': ['A', 'B', 'A', 'B', 'C', 'A']
})
If I wanted to remove duplicates on ID only when ID equals a specified value (let's say 1), how would I do that? In other words, the resulting dataframe would look like:
|ID|Value|
|--|-----|
|1 |A |
|2 |B |
|3 |A |
|2 |C |
|3 |A |
AI assistants are having a surprisingly difficult time with this one.
Solution
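Create a boolean mask with DataFrame.duplicated, which by default flags every occurrence of an ID after the first, and combine it with a check for the target ID: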
mask = df.ID.eq(1) & df.duplicated(subset=["ID"])
print(df[~mask])
Prints:
   ID Value
0   1     A
1   2     B
2   3     A
4   2     C
5   3     A
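If more than one ID needs this treatment, the same mask can be wrapped in a small helper. The following is only a sketch: the function name drop_duplicates_for_ids and its parameters are hypothetical and not part of pandas or of the original answer. It swaps eq(1) for Series.isin so that several target IDs can be passed, and forwards keep to DataFrame.duplicated.

```python
import pandas as pd


def drop_duplicates_for_ids(df, ids, column="ID", keep="first"):
    """Hypothetical helper: drop repeated rows only for the given IDs."""
    # True where the row's ID is one of the targeted values
    is_target = df[column].isin(ids)
    # True where the row repeats an ID already seen (as decided by `keep`)
    is_repeat = df.duplicated(subset=[column], keep=keep)
    # Keep everything except targeted rows that are repeats
    return df[~(is_target & is_repeat)]


df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2, 3],
    "Value": ["A", "B", "A", "B", "C", "A"],
})

result = drop_duplicates_for_ids(df, ids=[1])
print(result)
```

The original index labels are preserved, so label 3 is missing from the result; chaining .reset_index(drop=True) renumbers the rows if that matters.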
Steps:
print(df.ID.eq(1))
0     True
1    False
2    False
3     True
4    False
5    False
Name: ID, dtype: bool

print(df.duplicated(subset=["ID"]))
0    False
1    False
2    False
3     True
4     True
5     True
dtype: bool

mask = df.ID.eq(1) & df.duplicated(subset=["ID"])
print(mask)
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool
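To see the three steps side by side, the intermediate results can be attached to a copy of the frame as extra columns. This is just an illustrative sketch; the column names is_target, is_repeat and to_drop are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2, 3],
    "Value": ["A", "B", "A", "B", "C", "A"],
})

# assign() returns a new frame, so df itself is left unchanged
steps = df.assign(
    is_target=df["ID"].eq(1),                # step 1: ID equals 1
    is_repeat=df.duplicated(subset=["ID"]),  # step 2: ID already seen earlier
)
# step 3: a row is dropped only when both conditions hold
steps["to_drop"] = steps["is_target"] & steps["is_repeat"]
print(steps)
```

Only the row with index 3 has both flags set, which is why it is the only row removed.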
Answered By - Andrej Kesely