Issue
I have a dataframe with 0 and 1 and I would like to count groups of 1s (don't mind the 0s) with a Pandas solution (not itertools, not python iteration).
Other SO posts suggest methods based on shift()
/diff()
/cumsum()
which seems not to work when the leading sequence in the dataframe starts with 0.
df = pandas.Series([0,1,1,1,0,0,1,0,1,1,0,1,1]) # should give 4
df = pandas.Series([1,1,0,0,1,0,1,1,0,1,1]) # should also give 4
df = pandas.Series([1,1,1,1,1,0,1]) # should give 2
Any idea ?
Solution
If you only have 0/1, you can use:
s = pd.Series([0,1,1,1,0,0,1,0,1,1,0,1,1])
count = s.diff().fillna(s).eq(1).sum()
output: 4
(4
and 2
for the other two)
Then fillna
ensures that Series starting with 1
will be counted
faster alternative
use the diff, count the 1 and correct the result with the first item:
count = s.diff().eq(1).sum()+(s.iloc[0]==1)
comparison of different pandas approaches:
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.