Issue
I have two dataframes as follows:
df1
Name Id c1 c2 c3 c4
---------------------------
asd 101 a b c d
cdf 231 e ? 1
zxs 342 f o
ygg 521 g k p
mlk 432 h m z
abc 343 c x q
xyz 254 1 d 2
fgg 165 c z d mm
mnd 766 2 d v
df2
df2_Name df2_Id df2_c2 df2_c4
----------------------------------
asd 101 h d2
ygg 521 x cd
fgg 165 o cm
I want to match the "Name" and "id" from df1 with "df2_Name" and "df2_id" of df2. Wherever a match is found, the values of "c2" and "c4" in df1 are replaced by the values in "df2_c2" and "df2_c4" from df2.
Desired output
Name Id c1 c2 c3 c4
-------------------------------
asd 101 a h c d2
cdf 231 e ? 1
zxs 342 f o
ygg 521 g x p cd
mlk 432 h m z
abc 343 c x q
xyz 254 1 d 2
fgg 165 c o d cm
mnd 766 2 d v
Tried solution 1
df1[df1.set_index(['Name', 'id']).index.isin(df2.set_index(['df2_Name','df2_id']).index)].iloc[:,[3,5]].update(df2.iloc[:,[2,3]])
Result: Original df1 is returned as it is.
Tried solution 2
df1.loc[df1.set_index(['Name', 'id']).index.isin(df2.set_index(['df2_Name','df2_id']).index), ['c2', 'c4']] = df2[['df2_c2', 'df2_c4']]
Result: NaNs introduced
Name id c1 c2 c3 c4
----------------------------
asd 101 a NaN c NaN
cdf 231 e ? 1
zxs 342 f o
ygg 521 g NaN p NaN
mlk 432 h m z
abc 343 c x q
xyz 254 1 d 2
fgg 165 c NaN d NaN
mnd 766 2 d v
Tried solution 3 (for c2 only)
merged = df1.merge(df2, left_on=["id", "Name"], right_on=["df2_id", "df2_Name"])
merged["c2"] = merged.apply(lambda x: x["c2"] if pd.isnull(x["df2_c2"]) else x["df2_c2"], axis=1)
Result:
Name id c1 c2 c3 c4 df2_Name df2_id df2_c2 df2_c4
--------------------------------------------------------------
asd 101 a h c d asd 101 h d2
ygg 521 g x p ygg 521 x cd
fgg 165 c o d mm fgg 165 o cm
This solution 3 replaces the values of selected columns, however it returns the merged dataframe and not the entire df1 with updates.
Can anyone help me with this problem?
Note:
This question is being asked after trying the solutions in the following, however there is no success:
- update-a-pandas-dataframe-with-data-from-another-dataframe
- replace-column-values-based-on-another-dataframe-python-pandas-better-way
Solution
I would use merge
to join the two dataframes. Then you got colums with your old values and a column with new values and nan values. Afterwards use apply
to join these columns:
merged = df1.merge(df2, how='outer', left_on=["id", "name"], right_on=["df2_id", "df2_name"])
merged["c2"] = merged.apply(lambda x: x["c2"] if pd.isnull(x["df2_c2"]) else x["df2_c2"], axis=1)
# Same for c4
# Drop df2_c2 and df2_c4
I currently cant test it so let me know if this works for you.
Answered By - ArsenieBoca
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.