Issue
I have a dataframe with the following 3 columns: property, value, count
Pairs of (property, value) are unique.
I need to add a new column to this dataframe that checks whether the row's count is the most frequent count within its property group. So, for example:
We have a dataframe
property,value,count
p1,v1,20
p1,v2,50
p1,v3,50
p2,v4,10
p2,v5,20
And as a result we need to get:
property,value,count,is_mode
p1,v1,20,False
p1,v2,50,True
p1,v3,50,True
p2,v4,10,True
p2,v5,20,False
- In case there are multiple modes, I need to use the minimum one when calculating the is_mode value. So here, for p2 there are two modes, 10 and 20, but since I need the minimum one, is_mode will be True for 10 and False for 20.
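For reference, the sample dataframe can be rebuilt like this (a small sketch; df is just an assumed variable name):

import pandas as pd

# Sample data from the example above
df = pd.DataFrame({
    "property": ["p1", "p1", "p1", "p2", "p2"],
    "value": ["v1", "v2", "v3", "v4", "v5"],
    "count": [20, 50, 50, 10, 20],
})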
I've already found numerous solutions on Stack Overflow, but none of them led me to a good result. I tried to split the task into first getting the mode for each property and then somehow comparing the values to the minimum mode of the group:
new_df = df.groupby(["property"])["count"].agg(pd.Series.mode)
The same happens if I try to attach it to the current dataframe:
properties["mode"] = properties.groupby(["property"])["count"].transform(pd.Series.mode)
These don't crash, but they produce weird results with lots of NaNs. For example, I have:
property,value,count
p1,v1,200
p1,v2,60
p1,v3,60
and expect to get:
property,value,count,mode
p1,v1,200,60
p1,v2,60,60
p1,v3,60,60
but in my case I get:
property,value,count,mode
p1,v1,200,NaN
p1,v2,60,NaN
p1,v3,60,NaN
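A plausible explanation for the NaNs: pd.Series.mode can return more than one value per group, so transform cannot broadcast the result back onto the rows and the values end up misaligned. Collapsing the modes to a single scalar per group, for example the smallest one, gives the expected column; this is a minimal sketch under that assumption:

# mode() may return several values; taking min() collapses them to one scalar
# that transform can broadcast to every row of the group
df["mode"] = df.groupby("property")["count"].transform(lambda s: s.mode().min())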
Solution
Assuming you want the smallest mode of the counts per group, use a custom function and groupby.transform. For each group, get the mode (one or more values), then the min, and flag the values that are equal to it:
df['is_mode'] = (df.groupby('property')['count']
                   .transform(lambda s: s.eq(s.mode().min()))
                 )
Output:
  property value  count  is_mode
0       p1    v1     20    False
1       p1    v2     50     True
2       p1    v3     50     True
3       p2    v4     10     True
4       p2    v5     20    False
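An equivalent two-step variant, shown as a sketch (min_mode is just an illustrative name), first computes the smallest mode per property and then maps it back for the comparison:

# Smallest mode of count for each property
min_mode = df.groupby('property')['count'].agg(lambda s: s.mode().min())
# True where the row's count equals the smallest mode of its property group
df['is_mode'] = df['count'].eq(df['property'].map(min_mode))

Both versions apply the same tie-break rule: when a group has several modes, only the smallest one is flagged.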
Answered By - mozway