Issue
I want to introduce the contents of one df to another but as a list based on ID. I know to merge based on ID but I do not want duplicate rows for ID in the new dataframe. How do I get this done?
data1 = {'ID': ['AB01','AB02'],
'Name': ["toyota", "honda"],
'Age':[21,22]
}
df1 = pd.DataFrame.from_dict(data1)
data2 = {'ID': ['AB01','AB01','AB03','AB03'],
'Type': ["C",np.nan,"X","S"],
'Score':[87,98,45,82]
}
df2 = pd.DataFrame.from_dict(data2)
The result should look like this
Solution
You can make dict
on the rows of df2
by .apply()
, then group by ID
and aggregate the dict of same ID
into list by .groupby()
+ .agg()
.
Then, merge with df1
with .merge()
by left join with ID
as matching keys, as follows:
df2_info = (df2.apply(dict, axis=1)
.groupby(df2['ID'])
.agg(list)
.reset_index(name='Info')
)
df_out = df1.merge(df2_info, on='ID', how='left')
Result
print(df_out)
ID Name Age Info
0 AB01 toyota 21 [{'ID': 'AB01', 'Type': 'C', 'Score': 87}, {'ID': 'AB01', 'Type': nan, 'Score': 98}]
1 AB02 honda 22 NaN
For reference only, interim result of df2_info
:
ID Info
0 AB01 [{'ID': 'AB01', 'Type': 'C', 'Score': 87}, {'ID': 'AB01', 'Type': nan, 'Score': 98}]
1 AB03 [{'ID': 'AB03', 'Type': 'X', 'Score': 45}, {'ID': 'AB03', 'Type': 'S', 'Score': 82}]
Answered By - SeaBean
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.