Issue
I have two pandas DataFrames, x and y, both with the same 3 columns A, B, C (not nullable). I need to create a new DataFrame z by "subtracting from x the rows which are entirely identical to rows of y", i.e. the equivalent of this SQL:
x left join y on x.A=y.A and x.B=y.B and x.C=y.C
where y.A is null
How would I do that? Got stuck with indexes, concat, merge, join, ...
Example:
dataframe x
A B C
q1 q2 q3
q4 q2 q3
q7 q2 q9
dataframe y
A B C
q4 q2 q3
dataframe z
A B C
q1 q2 q3
q7 q2 q9
Solution
I think you need merge with indicator, then filter only the rows that come from the left DataFrame:
df = x.merge(y, indicator='i', how='outer').query('i == "left_only"').drop('i', axis=1)
print (df)
    A   B   C
0  q1  q2  q3
2  q7  q2  q9
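To see why the filter works, it can help to look at the merged frame before filtering. The snippet below is a minimal sketch that rebuilds x and y from the example in the question (the construction code is my assumption about how the frames were created) and prints the indicator column i that merge adds.

```python
import pandas as pd

# Frames rebuilt from the example in the question (assumed construction).
x = pd.DataFrame({"A": ["q1", "q4", "q7"],
                  "B": ["q2", "q2", "q2"],
                  "C": ["q3", "q3", "q9"]})
y = pd.DataFrame({"A": ["q4"], "B": ["q2"], "C": ["q3"]})

# With no `on=` argument, merge joins on every column the frames share (A, B, C).
# indicator='i' adds a column named 'i' telling where each row came from:
# 'left_only', 'right_only' or 'both'.
merged = x.merge(y, indicator="i", how="outer")
print(merged)
```

Rows marked left_only exist in x but have no identical row in y, so keeping only those and dropping i gives z.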
In earlier versions of pandas the axis could be passed positionally, as in .drop('i', 1). Later versions need the keyword form .drop('i', axis=1) used above: the positional form raises a warning in later versions of pandas and is no longer accepted by the most recent ones.
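If this is needed in more than one place, the same idea can be wrapped in a small function. This is only a sketch: anti_join is a name made up here, not a pandas function, and it assumes both frames have the same columns with matching dtypes. It uses how='left' to mirror the SQL in the question; since only left_only rows are kept, the rows returned should match the how='outer' version above.

```python
import pandas as pd

def anti_join(left, right):
    """Rows of `left` that have no fully identical row in `right`.

    Hypothetical helper, not part of pandas.
    """
    # indicator=True adds a column named '_merge' (pandas' default name).
    merged = left.merge(right, how="left", indicator=True)
    only_left = merged[merged["_merge"] == "left_only"]
    # drop(columns=...) avoids passing the axis at all.
    return only_left.drop(columns="_merge")

# Example data taken from the question.
x = pd.DataFrame({"A": ["q1", "q4", "q7"],
                  "B": ["q2", "q2", "q2"],
                  "C": ["q3", "q3", "q9"]})
y = pd.DataFrame({"A": ["q4"], "B": ["q2"], "C": ["q3"]})

z = anti_join(x, y)
print(z)
```

Note that merge builds a new index for the result, so the index of z is not the original index of x in general.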
Answered By - jezrael