Issue
Suppose I have two DataFrames:
df1 = pd.DataFrame({'key': ['a', 'A'], 'value': [1, 1]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'value': [5, 6]})
The DataFrames have the same columns but different rows.
df1:
key value
a 1
A 1
df2:
key value
A 5
B 6
I want the resulting dataframe to be:
df3:
key value
a 1
A 5
B 6
It is kind of like dict.update(): if the key is already present, update its value; otherwise, add the new key and its value. But how do I do that with DataFrames?
I also don't want to deal with the '_x' and '_y' suffixes, because in reality I have about 2 key columns and 10+ value columns.
Forgive me if this problem is too simple.
Solution
You can achieve this with the pandas combine_first() method.
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two. The resulting dataframe contains the ‘first’ dataframe values and overrides the second one values where both first.loc[index, col] and second.loc[index, col] are not missing values, upon calling first.combine_first(second).
See the pandas documentation for DataFrame.combine_first() for details.
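To make the null-filling rule concrete, here is a minimal sketch using two made-up frames (the names first and second and their values are illustrative assumptions, not data from the question). It shows that a non-null value in the calling frame wins, a null is filled from the other frame, and the result's index is the union of both.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 'y' is missing in first, 'z' exists only in second
first = pd.DataFrame({'value': [1.0, np.nan]}, index=['x', 'y'])
second = pd.DataFrame({'value': [10.0, 20.0, 30.0]}, index=['x', 'y', 'z'])

# Values from first take priority; gaps are filled from second
result = first.combine_first(second)
print(result)
```

Row 'x' keeps 1.0 from first, row 'y' is filled with 20.0, and row 'z' is added from second.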
Here's the working code:
import pandas as pd
df1 = pd.DataFrame({'key': ['a', 'A'], 'value': [1, 1]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'value': [5, 6]})
# Set the 'key' column as the index for both dataframes
df1.set_index('key', inplace=True)
df2.set_index('key', inplace=True)
# Values from df2 take priority; keys found only in df1 are filled in from df1
df3 = df2.combine_first(df1).reset_index()
print(df3)
Here's the output (the rows come back sorted by key, so the order differs from the one shown in the question):
key value
0 A 5
1 B 6
2 a 1
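If the original row order matters, or there are several key columns, one possible alternative is to concatenate the frames and keep the last row for each key. This is only a sketch under assumptions: key_cols is a hypothetical name for the list of key columns, and unlike combine_first() this approach replaces whole rows, so a missing value in df2 would overwrite a real value from df1.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'A'], 'value': [1, 1]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'value': [5, 6]})

# Hypothetical helper name: list every key column here, e.g. ['key1', 'key2']
key_cols = ['key']

# Stack df2 under df1, then keep only the last row seen for each key,
# so rows from df2 replace rows from df1 that share a key
df3 = (
    pd.concat([df1, df2])
    .drop_duplicates(subset=key_cols, keep='last')
    .reset_index(drop=True)
)
print(df3)
```

Because the whole row is replaced, this behaves much like dict.update() and needs no suffix handling, however many value columns there are.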
Answered By - iihsan