Issue
I have a dataframe with the following 3 columns: property, value, count
Pairs of (property, value) are unique.
I need to add a new column to this dataframe that checks whether the row's count is the most frequent count within its property group. So, for example:
We have a dataframe
property,value,count
p1,v1,20
p1,v2,50
p1,v3,50
p2,v4,10
p2,v5,20
And as a result we need to get:
property,value,count,is_mode
p1,v1,20,False
p1,v2,50,True
p1,v3,50,True
p2,v4,10,True
p2,v5,20,False
- In case there are multiple modes, I need to use the minimum one when calculating the is_mode value. So here, for p2 there are two modes, 10 and 20, but since I need the minimum one, is_mode will be True for 10 and False for 20.
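For reference, the sample dataframe can be rebuilt like this (a small sketch; df is just an assumed variable name):

import pandas as pd

# Sample data from the example above
df = pd.DataFrame({
    "property": ["p1", "p1", "p1", "p2", "p2"],
    "value": ["v1", "v2", "v3", "v4", "v5"],
    "count": [20, 50, 50, 10, 20],
})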
I've already found numerous solutions on Stack Overflow, but none of them led me to a good result. I tried to split the task into first getting the mode for each property and then somehow comparing the values to the minimum mode of the group:
new_df = df.groupby(["property"])["count"].agg(pd.Series.mode)
The same happens if I try to attach it to the current dataframe:
properties["mode"] = properties.groupby(["property"])["count"].transform(pd.Series.mode)
These don't crash, but they produce weird results with lots of NaNs. For example, I have:
property,value,count
p1,v1,200
p1,v2,60
p1,v3,60
and expect to get:
property,value,count,mode
p1,v1,200,60
p1,v2,60,60
p1,v3,60,60
but in my case I get:
property,value,count,mode
p1,v1,200,NaN
p1,v2,60,NaN
p1,v3,60,NaN
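A plausible explanation for the NaNs: pd.Series.mode can return more than one value per group, so transform cannot broadcast the result back onto the rows and the values end up misaligned. Collapsing the modes to a single scalar per group, for example the smallest one, gives the expected column; this is a minimal sketch under that assumption:

# mode() may return several values; taking min() collapses them to one scalar
# that transform can broadcast to every row of the group
df["mode"] = df.groupby("property")["count"].transform(lambda s: s.mode().min())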
Solution
Assuming you want the smallest mode of the counts per group, use a custom function and groupby.transform. For each group, get the mode (one or more values), then the min, and flag the values that are equal to it:
df['is_mode'] = (df.groupby('property')['count']
                   .transform(lambda s: s.eq(s.mode().min()))
                 )
Output:
  property value  count  is_mode
0       p1    v1     20    False
1       p1    v2     50     True
2       p1    v3     50     True
3       p2    v4     10     True
4       p2    v5     20    False
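An equivalent two-step variant, shown as a sketch (min_mode is just an illustrative name), first computes the smallest mode per property and then maps it back for the comparison:

# Smallest mode of count for each property
min_mode = df.groupby('property')['count'].agg(lambda s: s.mode().min())
# True where the row's count equals the smallest mode of its property group
df['is_mode'] = df['count'].eq(df['property'].map(min_mode))

Both versions apply the same tie-break rule: when a group has several modes, only the smallest one is flagged.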
Answered By - mozway