Issue
I have a pandas DataFrame of the form:
advert_id run_id category score
11111111 78 842 0.356975
11111111 78 849 0.245583
11111111 78 950 0.089219
11111111 78 1645 0.089172
11111111 78 2494 0.044254
... ... ... ...
22222222 1083267 2521 0.078275
22222222 1083267 2553 0.121556
22222222 1083267 2872 0.039226
22222222 1083267 3045 0.362127
22222222 1083267 3049 0.040135
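To reproduce the question, the rows shown above can be loaded as follows. This is only a sketch: it includes the ten visible rows and leaves out the elided `...` rows, whose contents aren't given.

```python
import pandas as pd

# Only the rows visible in the sample above; the elided "..." rows are omitted.
df = pd.DataFrame({
    "advert_id": [11111111] * 5 + [22222222] * 5,
    "run_id": [78] * 5 + [1083267] * 5,
    "category": [842, 849, 950, 1645, 2494, 2521, 2553, 2872, 3045, 3049],
    "score": [0.356975, 0.245583, 0.089219, 0.089172, 0.044254,
              0.078275, 0.121556, 0.039226, 0.362127, 0.040135],
})
```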
I would like to transform it into a DataFrame of the following form, with one row per advert_id:
advert_id run_id category_1 score_1 category_2 score_2 category_3 score_3 ... category_n score_n
11111111 78 842 0.356975 849 0.245583 950 0.089219 ...
22222222 1083267 2521 0.078275 2553 0.121556 2872 0.039226 ...
The number of categories per advert can vary: some adverts may have anywhere from 1 to n categories.
Is there an elegant way to do this with Python/pandas, other than grouping the DataFrame, "manually" iterating over the groups, and populating a separate DataFrame?
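In the target layout, n is set by the advert with the most categories, and adverts with fewer categories would need empty (NaN) slots. A minimal sketch for finding n, using made-up toy data rather than the real adverts:

```python
import pandas as pd

# Hypothetical toy data: advert 1 has three categories, advert 2 has one.
df = pd.DataFrame({
    "advert_id": [1, 1, 1, 2],
    "category": [842, 849, 950, 2521],
})

# Number of categories per advert; n in the target layout is the largest count.
counts = df.groupby("advert_id").size()
n = counts.max()
print(counts.to_dict(), n)
```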
Solution
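First create an additional key with cumcount that numbers each advert's rows 1, 2, 3, ...; then move that key into the columns with unstack and flatten the column names: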
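# number each advert's rows 1, 2, 3, ... (restarts for every advert_id)
df['key2']=(df.groupby('advert_id').cumcount()+1)
# move key2 into the columns, then order them category_1, score_1, category_2, ...
s=df.set_index(['advert_id','run_id','key2']).unstack().sort_index(level=1,axis=1)
# flatten the (name, key2) column MultiIndex into 'name_key2'
s.columns=s.columns.map('{0[0]}_{0[1]}'.format)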
s
Out[59]:
category_1 score_1 ... category_5 score_5
advert_id run_id ...
11111111 78 842 0.356975 ... 2494 0.044254
22222222 1083267 2521 0.078275 ... 3049 0.040135
[2 rows x 10 columns]
Answered By - BENY
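A self-contained sketch of the same cumcount/unstack approach on hypothetical toy data in which the adverts have different numbers of categories. Missing slots come out as NaN, which turns the category columns into floats; converting them to pandas' nullable "Int64" dtype is an optional extra step, not part of the answer above.

```python
import pandas as pd

# Hypothetical toy data: advert 1 has three categories, advert 2 only one.
df = pd.DataFrame({
    "advert_id": [1, 1, 1, 2],
    "run_id": [10, 10, 10, 20],
    "category": [842, 849, 950, 2521],
    "score": [0.36, 0.25, 0.09, 0.08],
})

# Position of each row within its advert: 1, 2, 3, ... (restarts per advert).
df["key2"] = df.groupby("advert_id").cumcount() + 1

# Move key2 into the columns; adverts with fewer categories get NaN in the missing slots.
s = (df.set_index(["advert_id", "run_id", "key2"])
       .unstack()
       .sort_index(level=1, axis=1))

# Flatten the (name, position) MultiIndex into "category_1", "score_1", ...
s.columns = [f"{name}_{pos}" for name, pos in s.columns]

# NaN forces the category columns to float; nullable Int64 keeps them integral.
cat_cols = [c for c in s.columns if c.startswith("category_")]
s = s.astype({c: "Int64" for c in cat_cols})

print(s.reset_index())
```

Each advert ends up as a single row, and advert 2 shows missing values in category_2, score_2, category_3 and score_3.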