Issue
df1 = pd.DataFrame({'a':['id1','id2','id3'],'b':['W','W','W'],'c1':[1,2,3]})
df2 = pd.DataFrame({'a':['id1','id2','id3'],'b':['W','W','W'],'c2':[4,5,6]})
df3 = pd.DataFrame({'a':['id1','id4','id5'],'b':['Q','Q','Q'],'c1':[7,8,9]})
I'm trying to concatenate df1
df2
and df3
into one dataframe:
a b c1 c2
id1 W 1 4
id2 W 2 5
id3 W 3 6
id1 Q 7 NA
id4 Q 8 NA
id5 Q 9 NA
I tried:
l = [d.set_index(['a','b']) for d in [df1,df2,df3]]
pd.concat(l, axis=1)
but the output isn't what I expected:
c1 c2 c1
a b
id1 W 1.0 4.0 NaN
id2 W 2.0 5.0 NaN
id3 W 3.0 6.0 NaN
id1 Q NaN NaN 7.0
id4 Q NaN NaN 8.0
id5 Q NaN NaN 9.0
Solution
You can join MultiIndex Series
created by DataFrame.stack
:
l = [d.set_index(['a','b']).stack() for d in [df1,df2,df3]]
df = pd.concat(l).unstack().sort_index(level=[1,0], ascending=[False, True])
print (df)
c1 c2
a b
id1 W 1.0 4.0
id2 W 2.0 5.0
id3 W 3.0 6.0
id1 Q 7.0 NaN
id4 Q 8.0 NaN
id5 Q 9.0 NaN
If there is only 3 columns DataFrames use DataFrame.squeeze
or select first column by iloc[:, 0]
for list of Series:
l = [d.set_index(['a','b']).squeeze() for d in [df1,df2,df3]]
keys = [x.name for x in l]
df = (pd.concat(l, axis=0, keys=keys)
.unstack(0)
.sort_index(level=[1,0], ascending=[False, True]))
print (df)
c1 c2
a b
id1 W 1.0 4.0
id2 W 2.0 5.0
id3 W 3.0 6.0
id1 Q 7.0 NaN
id4 Q 8.0 NaN
id5 Q 9.0 NaN
l = [d.set_index(['a','b']).iloc[:, 0] for d in [df1,df2,df3]]
keys = [x.name for x in l]
df = (pd.concat(l, axis=0, keys=keys)
.unstack(0)
.sort_index(level=[1,0], ascending=[False, True]))
Another idea is chain multiple DataFrames in list by DataFrame.combine_first
:
from functools import reduce
dfs = [d.set_index(['a','b']) for d in [df1,df2,df3]]
df = (reduce(lambda x, y: x.combine_first(y), dfs)
.sort_index(level=[1,0], ascending=[False, True]))
print (df)
c1 c2
a b
id1 W 1.0 4.0
id2 W 2.0 5.0
id3 W 3.0 6.0
id1 Q 7.0 NaN
id4 Q 8.0 NaN
id5 Q 9.0 NaN
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.