Issue
I have the following toy dataframe (the real one has 500k rows):
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})
   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12
I want to group by adult, select the row in each group for which weight is maximal, and assign that row's size value to a new column, size2.
In other words, we want a column size2 holding the size value of the row with the max weight, propagated to every row of its adult group. So all adult = False rows will have the value S, because the max weight for adult = False is 20.
   adult size size2  weight
0  False    S     S       8
1  False    S     S      10
2  False    M     S      11
3  False    M     S       1
4  False    M     S      20
5   True    L     L      14
6   True    S     L      12
I found this but it doesn't work for me
So far I have:
df.loc[:, 'size2'] = (df.groupby('adult', as_index=True)['weight', 'size']
                        .transform(lambda x: x.ix[x['weight'].idxmax()]['size']))
Solution
IIUC you can use merge. I think the first value in size2 should be M, not S, because the max weight in the adult = False group is 20, and that row has size M.
df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})
print(df)
   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12
print(
    df.groupby('adult')
      .apply(lambda subf: subf['size'][subf['weight'].idxmax()])
      .reset_index(name='size2')
)
   adult size2
0  False     M
1   True     L
print(
    pd.merge(df,
             df.groupby('adult')
               .apply(lambda subf: subf['size'][subf['weight'].idxmax()])
               .reset_index(name='size2'),
             on=['adult'])
)
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L
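As a possible alternative (a sketch, not part of the original answer), the same result can probably be reached without apply or merge: take the index label of the max-weight row per group with idxmax, build a small adult-to-size lookup from those rows, and map it back onto the adult column. The names best_rows and size_by_group are made up for illustration, and the sketch assumes a default unique index, as in the toy frame.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Index label of the max-weight row within each adult group
# (hypothetical helper name; assumes index labels are unique).
best_rows = df.groupby('adult')['weight'].idxmax()

# Lookup Series: adult value -> size of that group's max-weight row.
size_by_group = df.loc[best_rows].set_index('adult')['size']

# Broadcast the per-group size back onto every row of the group.
df['size2'] = df['adult'].map(size_by_group)
print(df)
```

Because this only computes one idxmax per group and then does a lookup, it may scale better than a Python-level lambda on a 500k-row frame, though that would need to be measured on the real data.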
Answered By - jezrael