Issue
im trying to count duplicates and then drop.cuplicates to get dataframe without duplicates but heve extra col in which we have number that represents how many duplicates there were
from
id letter
0 665639 AAAAAA
1 665639 AAAAAA
2 665639 AAAAAA
3 665639 AAAAAA
5 702090 BBBBBB
6 702092 CCCCCC
to
id letter count
0 665639 AAAAAA 4
1 665639 AAAAAA 4
2 665639 AAAAAA 4
3 665639 AAAAAA 4
5 702090 BBBBBB 1
6 702092 CCCCCC 1
and after
df.drop.duplicates(subset=['id'] ,inplace=True)
id letter count
0 665639 AAAAAA 4
5 702090 BBBBBB 1
6 702092 CCCCCC 1
my approach was:
cols = ['id','letter']
df['count'] = df.groupby(cols)['id'].transform('size')
but when im doing this i get dataframe that has only 1's in count
Solution
I would first create the count of duplicates
df['Count'] = 1
df.groupby(['id','letter']).Count.count().reset_index()
And then drop the duplicates
df.drop_duplicates()
Answered By - DanCor
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.