Issue
I have two pandas DataFrames that share some column names, and both are indexed by unique dates. I want to merge them into one DataFrame in which the columns with identical names are summed and the columns that are not shared are kept as they are.
I haven't tried anything yet; I'm looking for a general solution.
Solution
I hope I understood you correctly. Here is an example with sample data:
import pandas as pd
df1 = pd.DataFrame({'Time1': ['2023-11-30', '2023-11-28', '2023-11-27', '2023-11-26', '2023-11-25'],
                    'Data1': [2, 4, 6, 8, 10]})
df2 = pd.DataFrame({'Time2': ['2023-11-30', '2023-11-29', '2023-11-28', '2023-11-27', '2023-11-20'],
                    'Data1': [1, 2, 3, 4, 5],
                    'Data2': [62, 7, 5, 3, 10]})
df1 = df1.set_index('Time1')
df2 = df2.set_index('Time2')
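# Outer join on the date index keeps every date from both frames;
# columns present in both frames get the default suffixes '_x' and '_y'
df3 = pd.merge(left=df1, right=df2, left_index=True, right_index=True, how='outer')
df3.fillna(0, inplace=True)  # values missing because a date is absent from one frame become 0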
# Sum the duplicated columns and delete the suffixed originals
cols2drop = []  # this list will contain all columns to delete
for col in list(df3.columns):  # iterate over a snapshot of the column names
    if col.endswith('_x'):
        col_stem = col[:-2]  # strip the '_x' suffix to recover the original column name
        df3[col_stem] = df3[col_stem + '_x'] + df3[col_stem + '_y']  # create new column with the sum
        cols2drop.append(col_stem + '_x')  # mark both suffixed columns for deletion
        cols2drop.append(col_stem + '_y')
df3.drop(columns=cols2drop, inplace=True)  # delete the now unneeded columns
df3
            Data2  Data1
2023-11-20   10.0    5.0
2023-11-25    0.0   10.0
2023-11-26    0.0    8.0
2023-11-27    3.0   10.0
2023-11-28    5.0    7.0
2023-11-29    7.0    2.0
2023-11-30   62.0    3.0
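For comparison, here is a shorter sketch that avoids the '_x'/'_y' bookkeeping (my own variation, not part of the answer above). It assumes, as the question states, that dates are unique within each frame: the concat/groupby version would also sum duplicate dates inside a single frame. The frames are built directly with a date index instead of calling set_index.

```python
import pandas as pd

# Same sample data as above, already indexed by date
df1 = pd.DataFrame(
    {'Data1': [2, 4, 6, 8, 10]},
    index=['2023-11-30', '2023-11-28', '2023-11-27', '2023-11-26', '2023-11-25'],
)
df2 = pd.DataFrame(
    {'Data1': [1, 2, 3, 4, 5], 'Data2': [62, 7, 5, 3, 10]},
    index=['2023-11-30', '2023-11-29', '2023-11-28', '2023-11-27', '2023-11-20'],
)

# Option 1: stack the frames vertically, then sum all rows sharing a date.
# A column missing from one frame becomes NaN there, and sum() skips NaN,
# so dates that have no value at all end up as 0.
summed = pd.concat([df1, df2]).groupby(level=0).sum()

# Option 2: align both frames on index and columns and add them.
# fill_value=0 treats a value missing on one side as 0, but a cell missing
# on both sides stays NaN, hence the final fillna(0).
added = df1.add(df2, fill_value=0).fillna(0)

print(summed)
print(added)
```

Both variants sort the dates and treat a value missing from one frame as 0, so they give the same numbers as the merge-based result. In the concat version Data1 keeps its integer dtype, because that column has no missing values.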
Answered By - gtomer