Issue
I am trying to get, in an extra column, the number of consecutive rows that share the same class value.
Example of what I would like to have:
class consecutive_count
a 3
a 3
a 3
b 2
b 2
c 1
d 1
e 3
e 3
e 3
f 1
a 1
c 1
d 2
d 2
My best try is:
df['consecutive_count'] = df.groupby('class')['class'].transform('count')
This gives the total count per class in the complete dataframe instead of counting only consecutive rows:
class consecutive_count
a 4
a 4
a 4
b 2
b 2
c 2
d 3
e 3
e 3
e 3
f 1
a 4
c 2
d 3
d 3
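To make the first attempt reproducible, here is a minimal sketch that rebuilds the sample data from the question. The DataFrame construction is an assumption (the question only shows the printed values). Grouping by 'class' alone pools every row with the same label wherever it appears, so transform('count') broadcasts the per-class total back to each row.

```python
import pandas as pd

# Sample data from the question (construction assumed; values taken from the tables above)
df = pd.DataFrame({"class": list("aaabbcdeeefacdd")})

# Grouping by 'class' alone ignores row order: every 'a' gets the total number of 'a' rows
df["consecutive_count"] = df.groupby("class")["class"].transform("count")

print(df["consecutive_count"].tolist())
# [4, 4, 4, 2, 2, 2, 3, 3, 3, 3, 1, 4, 2, 3, 3]
```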
My second try, using cumsum:
df['consecutive_count'] = (df['class'] != df['class'].shift()).cumsum()
This labels each run of consecutive rows with its own group number, but does not count the rows in each run:
class consecutive_count
a 1
a 1
a 1
b 2
b 2
c 3
d 4
e 5
e 5
e 5
f 6
a 7
c 8
d 9
d 9
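The run labels come from three vectorised steps. A minimal sketch, again assuming the sample frame is built from the question's class values; the intermediate names previous, run_start and run_id are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"class": list("aaabbcdeeefacdd")})

# shift() moves every value down one row; the first row becomes missing
previous = df["class"].shift()

# True where a row differs from the row above, i.e. where a new run starts
# (the first row is compared against a missing value, so it is always True)
run_start = df["class"] != previous

# cumsum() treats True as 1 and False as 0, so the running total goes up
# by one at every run start and stays constant inside a run
run_id = run_start.cumsum()

print(run_id.tolist())
# [1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9]
```

The run id is the piece that was missing from the first attempt: counting rows per run id, rather than per class, yields the length of each consecutive run.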
I really have no clue how to solve this.
Solution
You're close with your second attempt: group by both the class and the run number produced by cumsum, then take the size of each group.
df['consecutive_count'] = df.groupby(['class', (df['class'] != df['class'].shift()).cumsum()])['class'].transform('size')
class consecutive_count
0 a 3
1 a 3
2 a 3
3 b 2
4 b 2
5 c 1
6 d 1
7 e 3
8 e 3
9 e 3
10 f 1
11 a 1
12 c 1
13 d 2
14 d 2
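A self-contained sketch of the answer, rebuilding the sample frame from the question. The frame construction and the names run_id and alt are assumptions added for illustration, not part of the original answer. Selecting the 'class' column before transform('size') makes the result a single Series, which assigns cleanly to a new column.

```python
import pandas as pd

df = pd.DataFrame({"class": list("aaabbcdeeefacdd")})

# Label each run of identical consecutive values with its own integer id
run_id = (df["class"] != df["class"].shift()).cumsum()

# Size of each (class, run) group, broadcast back to every row of that group
df["consecutive_count"] = df.groupby(["class", run_id])["class"].transform("size")

# Each run id already belongs to exactly one class, so grouping by run_id
# alone gives the same counts
alt = df.groupby(run_id)["class"].transform("size")

print(df["consecutive_count"].tolist())
# [3, 3, 3, 2, 2, 1, 1, 3, 3, 3, 1, 1, 1, 2, 2]
```

Keeping 'class' in the grouping keys is harmless but redundant; the run id on its own is enough to separate the two 'a' runs (and the two 'c' and 'd' runs) that the first attempt merged.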
Answered By - amance