Issue
I know that we can get normalized values from value_counts() of a pandas Series, but when we do a group by on a DataFrame, the only way to get counts is through size(). Is there any way to get normalized values with size()?
Example:
import pandas as pd

df = pd.DataFrame({'subset_product': ['A','A','A','B','B','C','C'],
                   'subset_close': [1,1,0,1,1,1,0]})
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
df.subset_product.value_counts()
A 3
B 2
C 2
df2
Looking to get:
subset_product  subset_close  prod_count  norm
A               0             1           1/3
A               1             2           2/3
B               1             2           2/2
C               1             1           1/2
C               0             1           1/2
Besides manually calculating the normalized values as prod_count/total, is there any way to get normalized values?
Solution
I think this is not possible with a single groupby + size, because the groupby is by two columns, subset_product and subset_close, while the normalization needs the size by subset_product only.
Possible solutions are map or transform, each of which gives a Series with the same size as df2, combined with div:
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
s = df.subset_product.value_counts()
df2['prod_count'] = df2['prod_count'].div(df2['subset_product'].map(s))
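To make the map step easier to follow, here is a minimal self-contained sketch on the question's data. It is a variation on the snippet above, not a replacement for it: it keeps prod_count untouched and writes the ratio to a separate norm column (the name norm is taken from the desired output in the question, and totals is just an illustrative variable name).

```python
import pandas as pd

df = pd.DataFrame({'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   'subset_close': [1, 1, 0, 1, 1, 1, 0]})

# Counts per (subset_product, subset_close) pair
df2 = (df.groupby(['subset_product', 'subset_close'])
         .size()
         .reset_index(name='prod_count'))

# Total number of rows per product, indexed by product label
s = df['subset_product'].value_counts()

# map looks up each row's product in s, so totals lines up with df2 row by row
# (totals is a hypothetical helper name used only in this sketch)
totals = df2['subset_product'].map(s)

# Keep the raw counts and store the fraction in a new column
df2['norm'] = df2['prod_count'].div(totals)
print(df2)
```

Because totals has one entry per row of df2, the division is element-wise and needs no further alignment.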
Or:
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
a = df2.groupby('subset_product')['prod_count'].transform('sum')
df2['prod_count'] = df2['prod_count'].div(a)
print(df2)
  subset_product  subset_close  prod_count
0              A             0    0.333333
1              A             1    0.666667
2              B             1    1.000000
3              C             0    0.500000
4              C             1    0.500000
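If only the fractions are needed, one more option may be worth trying. As far as I know, value_counts on a grouped column also accepts normalize=True, which would give the per-product fractions without a separate size step. The sketch below assumes that behaviour; note that it returns only the fractions, not the raw counts, and the row order inside each group may differ from the output above.

```python
import pandas as pd

df = pd.DataFrame({'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   'subset_close': [1, 1, 0, 1, 1, 1, 0]})

# Fraction of each subset_close value within each subset_product group
norm = (df.groupby('subset_product')['subset_close']
          .value_counts(normalize=True)
          .reset_index(name='norm'))
print(norm)
```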
Answered By - jezrael