Issue
I'm trying to pivot my pandas DataFrame by 'id' on 'rank':
print(df)
     id  rank  year
0  key0     1  2011
1  key0     2  2012
2  key0     3  2013
3  key1     1  2014
4  key1     2  2015
5  key1     3  2016
6  key2     1  2017
7  key2     2  2018
8  key2     3  2019
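To make the snippets in this post reproducible, the sample frame above can be rebuilt with a short sketch like the following. The values are copied from the printed table; nothing else is assumed about the real ~2M-row data.

```python
import pandas as pd

# Rebuild the sample data shown above: three ids, each with ranks 1..3.
df = pd.DataFrame({
    "id": ["key0"] * 3 + ["key1"] * 3 + ["key2"] * 3,
    "rank": [1, 2, 3] * 3,
    "year": list(range(2011, 2020)),
})
print(df)
```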
I want to create as many 'rank' and 'year' columns as the maximum value of 'rank', and fill them in ascending rank order:
print(df)
     id  rank1  year1  rank2  year2  rank3  year3
0  key0      1   2011      2   2012      3   2013
1  key1      1   2014      2   2015      3   2016
2  key2      1   2017      2   2018      3   2019
I tried my own solution (it currently works, but I have ~2M rows and it is not very efficient):
import pandas as pd

# Melt every column except 'id' and 'rank' into long format
df2 = df.melt(id_vars=["id", "rank"], value_vars=[elem for elem in df.columns if elem not in ["id", "rank"]])
# Build the target column name from the variable name and (rank - 1)
df2["col_name"] = df2["variable"] + (df2["rank"] - 1).astype(str)
df2["value"] = df2["value"].fillna(0)
df2 = pd.pivot_table(df2, index=["id"], columns=["col_name"], values="value", aggfunc="max")
I know this is not the optimal solution and that it consumes a lot of memory, which is why I'm asking for a better one.
Thanks in advance.
Solution
Use DataFrame.sort_values with DataFrame.pivot, sort the resulting MultiIndex columns with DataFrame.sort_index, and then flatten them with f-strings:
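# Pivot both 'year' and 'rank' by rank, then order the columns rank by rank
df1 = (df.sort_values(['id', 'rank'])
         .pivot(index="id", columns="rank", values=["year", "rank"])
         .sort_index(axis=1, level=1))
# Flatten the MultiIndex columns, e.g. ('year', 1) -> 'year1'
df1.columns = [f'{a}{b}' for a, b in df1.columns]
df1 = df1.reset_index()
print(df1)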
     id  rank1  year1  rank2  year2  rank3  year3
0  key0      1   2011      2   2012      3   2013
1  key1      1   2014      2   2015      3   2016
2  key2      1   2017      2   2018      3   2019
Answered By - jezrael