Saturday, September 3, 2022

[FIXED] pandas.DataFrame.groupby.transform only considering one group

September 03, 2022 aggregate, group-by, pandas, python No comments

Issue

Given some data like this:

import pandas as pd
import seaborn as sns
from random import choices

iris = sns.load_dataset('iris')[['petal_width', 'species']]

iris['color'] = choices(['red', 'blue'], k=len(iris))

print(iris.sample(n=15).reset_index(drop=True))

    petal_width     species color
0           0.2      setosa  blue
1           2.2   virginica  blue
2           0.6      setosa   red
3           2.1   virginica  blue
4           1.5  versicolor  blue
5           1.0  versicolor   red
6           0.2      setosa   red
7           1.5  versicolor  blue
8           2.0   virginica  blue
9           1.8   virginica  blue
10          0.2      setosa   red
11          1.2  versicolor  blue
12          0.2      setosa  blue
13          1.2  versicolor   red
14          0.3      setosa  blue

I need to find the maximum petal width for each species only considering blue observations, and assign that maximum to a new column max_width for each species. So since there are three unique species, there will be three unique values in the max_width column (unless two or more of them just happen to be the same).

It would be easy to find the maximum for every species/color combination:

iris['max_width'] = iris.groupby(['species', 'color']).petal_width.transform('max')

And of course it would be easy to aggregate by species regardless of color. Is there a simple way of ignoring red cases but still assigning them the relevant maximum value of the blue observations for each species?

Solution

IIUC, You can try using map combined with groupby():

iris['max_width'] = iris['species'].map(iris[iris['color'] == 'blue'].groupby(['species']).petal_width.max().to_dict())

Sampling 15 rows, returns:

     petal_width     species color  max_width
86           1.5  versicolor   red        1.6
103          1.8   virginica  blue        2.5
122          2.0   virginica   red        2.5
102          2.1   virginica  blue        2.5
43           0.6      setosa   red        0.5
35           0.2      setosa  blue        0.5
11           0.2      setosa   red        0.5
20           0.2      setosa  blue        0.5
145          2.3   virginica  blue        2.5
85           1.6  versicolor   red        1.6
7            0.2      setosa   red        0.5
18           0.3      setosa  blue        0.5
147          2.0   virginica  blue        2.5
117          2.2   virginica  blue        2.5
63           1.4  versicolor   red        1.6

Answered By - Celius Stingher

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, September 3, 2022

[FIXED] pandas.DataFrame.groupby.transform only considering one group

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels