Issue
I am having two dataframes:
import pandas as pd
df1=pd.DataFrame({'ID':['A', 'B', 'C', 'D', 'E'], 'Date':['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'], 'Value1':[1, 2, 3, 4, 5], 'Value2':[5, 6, 7, 8, 9]})
df2=pd.DataFrame({'ID':['A', 'B', 'C', 'D'], 'Date':['2024-01-30', '2024-01-30', '2024-01-30', '2024-01-30'], 'Value1':[1, 2, 7, 4], 'Value2':[5, 6, 7, 9]})
Here is how df1
looks like :
ID Date Value1 Value2
0 A 2024-01-01 1 5
1 B 2024-01-02 2 6
2 C 2024-01-03 3 7
3 D 2024-01-04 4 8
4 E 2024-01-05 5 9
Here is how df2
looks like :
ID Date Value1 Value2
0 A 2024-01-30 1 5
1 B 2024-01-30 2 6
2 C 2024-01-30 7 7
3 D 2024-01-30 4 9
Now my purpose is to find out which two rows between these two dataframes having different values on columns ['Value1', 'Value2']
based on 'ID'
column. In this case, the rows with 'ID'
s as 'C'
and 'D'
. Note, the 'Date'
column is different, but I do not need to compare them.
Any thoughts about a neat solution without marshal through the whole dataframes?
Solution
Anti Join
I accomplished this with an anti join
to determine the IDs
that do not match between the two.
outer = df1.merge(df2,how='outer',on=['ID','Value1','Value2'],indicator=True)
anti_join = outer[(outer._merge=='left_only')].drop('_merge',axis=1)
anti_join = anti_join[df1['ID'].isin(df2['ID'])]
Answered By - stefan_aus_hannover
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.