Issue
Consider:
import pandas as pd

df1 = pd.DataFrame({'a': (1, 2, 3, 4), 'b': (10, 20, 30, 40), 'c': (100, 200, 300, 400)})
df2 = pd.DataFrame({'a': (1, 2, 3), 'b': (10, 20, 30), 'c': (1111, 2222, 3333)})
df1:
   a   b    c
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400

df2:
   a   b     c
0  1  10  1111
1  2  20  2222
2  3  30  3333
I then run the following operation:
df1.set_index(['a', 'b']).loc[df2.set_index(['a', 'b']).index, 'c'] = df2.c
I expected df1 to become:
   a   b     c
0  1  10  1111
1  2  20  2222
2  3  30  3333
3  4  40   400
Instead, df1 is unchanged:
   a   b    c
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
How can I get the expected result, and why does my attempt fail?
Solution
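df1.set_index(['a', 'b']).loc[df2.set_index(['a', 'b']).index, 'c']
operates on a new DataFrame, because set_index returns a copy rather than modifying df1. You assign values to that temporary object, and it is then discarded since it is never bound to a variable name. df1 itself is never touched.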
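To make the explanation concrete, here is a minimal sketch of how the original .loc idea could work once the indexed frame is kept in a variable. The names indexed, src and result are illustrative and not part of the original answer. The sketch assumes every a/b pair in df2 also exists in df1 (otherwise .loc raises a KeyError), and it passes the values as a plain array so that no index alignment is attempted.

```python
import pandas as pd

df1 = pd.DataFrame({'a': (1, 2, 3, 4), 'b': (10, 20, 30, 40), 'c': (100, 200, 300, 400)})
df2 = pd.DataFrame({'a': (1, 2, 3), 'b': (10, 20, 30), 'c': (1111, 2222, 3333)})

# Keep the re-indexed copy in a variable so the assignment is not lost
indexed = df1.set_index(['a', 'b'])

# Values to write, keyed by the same (a, b) pairs
src = df2.set_index(['a', 'b'])['c']

# Assumption: every (a, b) pair of df2 is present in df1
indexed.loc[src.index, 'c'] = src.to_numpy()

# Move a and b back to regular columns; df1 itself is still unmodified
result = indexed.reset_index()
print(result)
```

This keeps the integer dtype of c, at the cost of requiring that all keys of df2 are present in df1.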
If you have a range index, you can use merge and combine_first:
out = df1[['a', 'b']].merge(df2, on=['a', 'b'], how='left').combine_first(df1)
For an arbitrary index, and to assign in place:
df1['c'] = (df1[['a', 'b']].reset_index()
            .merge(df2, on=['a', 'b'], how='left')
            .set_index('index')['c']
            .fillna(df1['c'])
           )
NB. this assumes that there is no duplicated combination of a/b in df2.
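If you are unsure whether that assumption holds, one way to check it is sketched below. The variable names has_duplicates and merged are illustrative. duplicated flags repeated a/b pairs, and passing validate='many_to_one' to merge should make pandas raise a MergeError when the keys are not unique in the right-hand frame.

```python
import pandas as pd

df1 = pd.DataFrame({'a': (1, 2, 3, 4), 'b': (10, 20, 30, 40), 'c': (100, 200, 300, 400)})
df2 = pd.DataFrame({'a': (1, 2, 3), 'b': (10, 20, 30), 'c': (1111, 2222, 3333)})

# True if any (a, b) pair occurs more than once in df2
has_duplicates = bool(df2.duplicated(subset=['a', 'b']).any())
print(has_duplicates)

# validate='many_to_one' checks that the merge keys are unique in df2
# and raises pandas.errors.MergeError otherwise
merged = df1[['a', 'b']].merge(df2, on=['a', 'b'], how='left',
                               validate='many_to_one')
print(merged)
```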
Output:
   a   b       c
0  1  10  1111.0
1  2  20  2222.0
2  3  30  3333.0
3  4  40   400.0
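Note that c comes out as float: the left merge leaves a NaN for the row with no match in df2, and NaN forces a float column. If you want the original integer dtype back, a possible follow-up (a sketch, not part of the original answer) is to cast after the missing values have been filled:

```python
import pandas as pd

df1 = pd.DataFrame({'a': (1, 2, 3, 4), 'b': (10, 20, 30, 40), 'c': (100, 200, 300, 400)})
df2 = pd.DataFrame({'a': (1, 2, 3), 'b': (10, 20, 30), 'c': (1111, 2222, 3333)})

out = df1[['a', 'b']].merge(df2, on=['a', 'b'], how='left').combine_first(df1)

# All NaN values have been filled from df1 at this point,
# so casting back to the original dtype of c is safe here
out['c'] = out['c'].astype(df1['c'].dtype)
print(out)
```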
Answered By - mozway