Issue
I have got 7 dataframes with same column names in python, but row indices are largely different and only few of the row indices match.
df1
col1 col2 col3
a 123 456 786
b 121 454 787
c 111 444 777
df2
col1 col2 col3
d 13 46 86
e 11 44 87
c 11 44 77
df3
col1 col2 col3
d 1231 4567 1786
g 1214 4546 1787
h 1115 4445 1777
c 12 12 10
Answer should be:
newdf
newcol col1 col2 col3 dfcol
c 111 444 777 1
c 11 44 77 2
c 12 12 10 3
d 13 46 86 2
d 1231 4567 1786 3
It is similar to this How to get the common index of two pandas dataframes? but not exactly.
Solution
IIUC
This is my beginner way of doing it.
import pandas as pd
Create dataframe
df1 = pd.DataFrame({
'id': ['a', 'b', 'c'],
'col1': [123, 121, 111],
'col2': [456, 454, 444],
'col3': [786, 787, 777],
})
df2 = pd.DataFrame({
'id': ['d', 'e', 'c'],
'col1': [13, 46, 86],
'col2': [11, 44, 87],
'col3': [11, 44, 77],
})
df3 = pd.DataFrame({
'id': ['d', 'g', 'h', 'c'],
'col1': [1231, 1214, 1115, 12],
'col2': [4567, 4546, 4445, 12],
'col3': [1786, 1787, 1777, 10],
})
Create dataframe list as well as giving name to each dataframe
df_collection = [df1, df2, df3]
df1.name = 1
df2.name = 2
df3.name = 3
Create list:
id_col = [[df.loc[i, 'id'], df.loc[i, 'col1'], df.loc[i, 'col2'], df.loc[i, 'col3'], df.name] for df in df_collection for i in range(len(df))]
Construct new dataframe:
df = pd.DataFrame(id_col)
df.columns = ['id', 'col1', 'col2', 'col3', 'df']
Remove rows with only 1 occurence:
new_df = df[df.groupby('id').id.transform('count') > 1].sort_values('id')
This will give us the following:
id col1 col2 col3 df
2 c 111 444 777 1
5 c 86 87 77 2
9 c 12 12 10 3
3 d 13 11 11 2
6 d 1231 4567 1786 3
Answered By - garagnoth
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.