Issue
I am trying to generate an appropriate grouping key for a pandas groupby.
Say I have a boolean mask like so: [False, False, True, False, True, True, False, True, True]
I would like the groupings to be like so: [0, 0, 1, 0, 2, 2, 0, 3, 3], that is, 0 for every False and a new number for each run of consecutive True values.
I can certainly create this array with a loop over the mask, but I would like, if possible, to use pandas or numpy builtins for ease of use and perhaps vectorization.
(If no builtin exists, I would appreciate a more Pythonic way of doing this than a straight loop with a state flag and rank counter.)
Solution
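Original Answer (assumes the first item is False):
import pandas as pd

l = [False, False, True, False, True, True, False, True, True]
s = pd.Series(l)
# Start a new run id every time the value changes
d = s.astype(int).diff().ne(0).cumsum()
# Every False position shares id 0
d.loc[~s] = 0
# Renumber the remaining ids consecutively: [0, 0, 1, 0, 2, 2, 0, 3, 3]
pd.factorize(d)[0].tolist()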
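To see why the original answer depends on the first item, it can help to wrap the same lines in a function and try two masks. This is only a sketch: the helper name label_true_runs_original is hypothetical and not part of the answer.

```python
import pandas as pd

def label_true_runs_original(mask):
    """Hypothetical wrapper around the original answer's steps."""
    s = pd.Series(mask, dtype=bool)
    # diff() is NaN on the first row and NaN != 0, so the first row always opens run 1
    d = s.astype(int).diff().ne(0).cumsum()
    # Every False position shares id 0
    d.loc[~s] = 0
    # factorize numbers the ids in order of first appearance
    return pd.factorize(d)[0].tolist()

print(label_true_runs_original([False, False, True, False, True, True, False, True, True]))
# [0, 0, 1, 0, 2, 2, 0, 3, 3]
print(label_true_runs_original([True, False, True]))
# [0, 1, 2]  (the leading True run gets 0 and the False gets 1)
```

Because factorize assigns codes by first appearance, whatever value comes first receives code 0, which is only correct when the mask starts with False.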
Slightly Modified Original Answer (works if first item is True):
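l = [False, False, True, False, True, True, False, True, True]
s = pd.Series(l)
d = s.astype(int).diff().ne(0).cumsum()
d.loc[~s] = 0
# Sort so that 0 (the False positions) is seen first by factorize and gets code 0
dsort = d.sort_values()
dindex = dsort.index
# Factorize in sorted order, then restore the original order
pd.Series(pd.factorize(dsort)[0], index=dindex).sort_index().tolist()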
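The sort-then-restore step can probably be shortened with the sort parameter of pd.factorize, which assigns codes by sorted value rather than by first appearance. This is a sketch rather than the answerer's code; the function name label_true_runs_sorted is hypothetical, and it assumes the mask contains at least one False (otherwise the first True run would still receive code 0).

```python
import pandas as pd

def label_true_runs_sorted(mask):
    """Hypothetical variant of the modified answer using factorize(sort=True)."""
    s = pd.Series(mask, dtype=bool)
    # Start a new run id every time the value changes
    d = s.astype(int).diff().ne(0).cumsum()
    # Every False position shares id 0
    d.loc[~s] = 0
    # sort=True gives the smallest id (0, the False positions) code 0.
    # Assumption: at least one False is present in the mask.
    codes, _ = pd.factorize(d, sort=True)
    return codes.tolist()

print(label_true_runs_sorted([True, False, True]))
# [1, 0, 2]
```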
Alternative way:
Generate the list and put it into a Series.
l = [False, False, True, False, True, True, False, True, True]
s = pd.Series(l)
Label runs of consecutive equal values (reset_index keeps the original position in an 'index' column)
d = s.astype(int).diff().ne(0).cumsum().reset_index()
Locate the first True in each group
d.loc[s].groupby(0)['index'].first().rename_axis(None)
Factorize the new grouping and put it into a Series
f = pd.factorize(d.loc[s].groupby(0)['index'].first().rename_axis(None))
s2 = pd.Series(f[0]+1,index = f[1])
Use reindex and forward fill all the missing positions. Fill any NaNs with 0. Lastly, replace all positions that were False with zeros.
s2 = s2.reindex(s.index).ffill().fillna(0)
s2.loc[~s] = 0
s2.astype(int).tolist()
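Since the question also mentions numpy, the same idea (mark where each run of True starts, then take a cumulative sum) can be sketched without factorize. This is a different technique from the answers above, not the answerer's method, and the function name label_true_runs is hypothetical.

```python
import numpy as np

def label_true_runs(mask):
    """Return 0 for False and 1, 2, 3, ... for successive runs of True."""
    mask = np.asarray(mask, dtype=bool)
    # prev[i] is mask[i - 1]; the first position has no predecessor, so it stays False
    prev = np.zeros_like(mask)
    prev[1:] = mask[:-1]
    # A run starts where the current value is True and the previous one is not
    starts = mask & ~prev
    # The running count of starts is the run number; False positions are set to 0
    return np.where(mask, np.cumsum(starts), 0).tolist()

print(label_true_runs([False, False, True, False, True, True, False, True, True]))
# [0, 0, 1, 0, 2, 2, 0, 3, 3]
```

This version does not depend on whether the mask starts with True or contains any False at all.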
Answered By - rhug123