Issue
Hi I am trying to merge two dataset in the following way:
df1=pd.DataFrame({'company name':['A','B','C'],
'analyst 1 name':['Tom','Mike',np.nan],
'analyst 2 name':[np.nan,'Alice',np.nan],
'analyst 3 name':['Jane','Steve','Alex']})
df2=pd.DataFrame({'company name':['A','B','C'],
'score 1':[3,5,np.nan],
'score 2':[np.nan,1,np.nan],
'score 3':[6,np.nan,11]})
df_desire=pd.DataFrame({'company name':['A','A','B','B','B','C'],
'analyst':['Tom','Jane','Mike','Alice','Steve','Alex'],
'score':[3,6,5,1,np.nan,11]})
Basically, df1 contains analyst name and df2 contains scores that analysts assign. I am trying to merge two into df_desire. The way to read two tables is: for company A, two person cover it, which are Tom and Jane, who assign 3 and 6 respectively. Noted that even though Steve covers company B but I deliberately assign the score to be NA for robustness purpose.
What I have done is :
pd.concat([df1.melt(id_vars='company name',value_vars=['analyst 1 name','analyst 2 name','analyst 3 name']),\
df2.melt(id_vars='company name',value_vars=['score 1','score 2','score 3'])],axis=1)
I am looking for a more elegant solution.
Solution
Try:
x = (
df1.set_index("company name")
.stack(dropna=False)
.reset_index(name="name")
.drop(columns="company name")
)
y = df2.set_index("company name").stack(dropna=False).reset_index(name="score")
print(
pd.concat([x, y], axis=1)[["company name", "name", "score"]]
.dropna(subset=["name", "score"], how="all")
.reset_index(drop=True)
)
Prints:
company name name score
0 A Tom 3.0
1 A Jane 6.0
2 B Mike 5.0
3 B Alice 1.0
4 B Steve NaN
5 C Alex 11.0
Answered By - Andrej Kesely
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.