Issue
I have two dataframes df1 and df2 as shown below:-
df1 = pd.DataFrame({'x': [1, '3', 5,'t','m','u'],'y':[2, 4, 6, 4, 4, 8]})
df2 = pd.DataFrame({'x': [1, 3, '4','t'],'z':[2, 4, 6,7]})
I am trying to merge(left join) the two data frames as:-
df=pd.merge(df1, df2, how='left', on='x')
the output is:-
df
Out[25]:
x y z
0 1 2 2.0
1 3 4 NaN
2 5 6 NaN
3 t 4 7.0
4 m 4 NaN
5 u 8 NaN
Clearly for second row above i.e for x=3, I would like to have z=4 instead of NaN.Is there an option to define data type of the key during merge or any other workaround where I can change the dtype of the keys to string in both data frames and get the desired output.
Solution
You can use assign
to temporarily assign new dtype to the x column:
pd.merge(df1.assign(x=df1.x.astype(str)),
df2.assign(x=df2.x.astype(str)),
how='left', on='x')
Output:
x y z
0 1 2 2.0
1 3 4 4.0
2 5 6 NaN
3 t 4 7.0
4 m 4 NaN
5 u 8 NaN
You may use astype('string')
instead using StringDtype with difference noted here.
Answered By - Scott Boston
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.