Issue
Existing Dataframe :
Id Month Year scheduled completed
A Jan 2021 0 0
A Feb 2021 1 0
A mar 2021 0 0
B June 2021 0 1
B July 2021 0 1
B Aug 2021 0 1
B Sep 2021 0 1
C Nov 2021 1 0
C Dec 2021 1 0
C Jan 2022 1 0
C Feb 2022 1 0
Expected Dataframe :
Id status
A defaulter
B non_defaulter
C defaulter
I am trying to create a status tag for each basis their activity. if for three consecutive Month if completed column remains 0 , that Id is to be tagged as "defaulter" else "non_defaulter"
Solution
Idea is aggregate per consecutive groups of 0
values per Id
and helper Series s
created by Series.cumsum
, count True
s values, then because possible multiple groups per Id
aggregate max
and last set values in numpy.where
:
N = 3
m = df['completed'].ne(0)
#if need check in completed OR scheduled column
#m = df[['completed','scheduled']].ne(0).any(axis=1)
df = ((~m).groupby([df['Id'], m.cumsum().mask(m, -1)])
.sum()
.groupby(level=0)
.max()
.reset_index(name='status')
.assign(status = lambda x: np.where(x['status'].ge(N),
'defaulter','non_defaulter')))
print (df)
Id status
0 A defaulter
1 B non_defaulter
2 C defaulter
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.