Sunday, May 15, 2022

[FIXED] Pandas : Assign result of groupby to dataframe to a new column


Issue

I have the following toy dataframe (the real one has 500k rows):

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult' : [False] * 5 + [True] * 2})

   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12

I want to group by adult, select the row in each group where weight is maximal, and assign that row's size value to a new column size2.

In other words, size2 should hold the size of the max-weight row, propagated to every row of the same adult group. So all adult = False rows will have the value S, because the max weight for adult=False is 20.

   adult size size2  weight
0  False    S     S       8
1  False    S     S      10
2  False    M     S      11
3  False    M     S       1
4  False    M     S      20
5   True    L     L      14
6   True    S     L      12

I found this but it doesn't work for me.

So far I have:

df.loc[:, 'size2'] = (df.groupby('adult',as_index=True)['weight','size']
                        .transform(lambda x: x.ix[x['weight'].idxmax()]['size']))

Solution

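If I understand correctly, you can use merge: compute one size2 value per adult group, then join it back onto the original rows. I think the size2 value for the adult=False group should be M rather than S, because the row with the max weight (20) has size M.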

import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult' : [False] * 5 + [True] * 2})

print(df)
   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12

print(
    df.groupby('adult')
      .apply(lambda subf: subf['size'][subf['weight'].idxmax()])
      .reset_index(name='size2')
)
   adult size2
0  False     M
1   True     L
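
The per-group lookup above can also be written without a lambda. This is only a sketch of an alternative, not part of the original answer: it assumes the same toy dataframe, and the name winner_idx is made up for illustration. groupby(...).idxmax() returns the index label of the max-weight row in each group, and df.loc then pulls those rows out.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Index label of the max-weight row within each adult group
# (the first such row if several rows tie for the maximum).
winner_idx = df.groupby('adult')['weight'].idxmax()

# Pull those rows out and keep only the group key and its size.
size2 = (df.loc[winner_idx, ['adult', 'size']]
           .rename(columns={'size': 'size2'})
           .reset_index(drop=True))
print(size2)
```

The resulting frame has one row per adult value and can be merged back onto df in the same way.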

print(
    pd.merge(
        df,
        df.groupby('adult')
          .apply(lambda subf: subf['size'][subf['weight'].idxmax()])
          .reset_index(name='size2'),
        on=['adult'],
    )
)
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L
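
If the merge step is not needed, the same column can probably be built in place with transform. The following is a hedged sketch rather than the answerer's method: it assumes the toy dataframe from the question and a pandas version where transform('idxmax') is available, and the name idx is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# For every row, the index label of the max-weight row in its adult group.
idx = df.groupby('adult')['weight'].transform('idxmax')

# Look up the size at those labels. to_numpy() drops the looked-up index
# so the values are assigned by position, one per original row.
df['size2'] = df.loc[idx, 'size'].to_numpy()
print(df)
```

Unlike merge, this keeps the original index and row order of df, which may matter on the 500k-row frame if the index carries meaning.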

Answered By - jezrael

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
