Issue
I want to merge three data frames in Python, the code I have now provide me with some wrong outputs.
This is the first data frame
df_1
Year Month X_1 Y_1
0 2021 January $90 $100
1 2021 February NaN $120
2 2021 March $100 $130
3 2021 April $110 $140
4 2021 May Nan $150
5 2019 June $120 $160
This is the second data frame
df_2
Year Month X_2 Y_2
0 2021 January Nan $120
1 2021 February NaN $130
2 2021 March $80 $140
3 2021 April $90 $150
4 2021 May Nan $150
5 2021 June $120 $170
This is the third data frame
df_3
Year Month X_3 Y_3
0 2021 January $110 $150
1 2021 February $140 $160
2 2021 March $97 $170
3 2021 April $90 $180
4 2021 May Nan $190
5 2021 June $120 $200
The idea is to combine them into one data frame like this:
df_combined
Year Month X_1 Y_1 X_2 Y_2 X_3 Y_3
0 2019 January $90 $100 NaN $120 $110 $150
1 2019 February NaN $120 NaN $130 $140 $160
2 2019 March $100 $130 $80 $140 $97 $170
3 2019 April $110 $140 $90 $150 $90 $180
4 2019 May Nan $150 Nan $150 Nan $190
5 2019 June $120 $160 $120 $170 $120 $200
The code I have for now does not give me the correct outcome, only df_3 has to the correct numbers.
# compile the list of data frames you want to merge
import functools as ft
from functools import reduce
data_frames = [df_1, df_2, df_3]
df_merged = reduce(lambda cross, right: pd.merge(cross,right,on=['Year'],
how='outer'),data_frames)
#remove superfluous columns
df_merged.drop(['Month_x', 'Month_y'], axis=1, inplace=True)
Solution
One option of probably many is to do
from functools import reduce
import pandas as pd
idx = ["Year", "Month"]
new_df = reduce(pd.DataFrame.join, (i.set_index(idx) for i in dataframes)).reset_index()
or
reduce(lambda x, y: pd.merge(x, y, how="outer", on=["Year", "Month"]), dataframes)
Answered By - tomjn
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.