Issue
Given a dataframe like the one below, for each date in Enrol Date, how can I count the Close Date values in preceding rows that are earlier than it? Ideally, I would like to add the result as a new column.
Class | Enrol Date | Close Date |
---|---|---|
A | 30/10/2003 | 05/12/2003 |
A | 22/12/2003 | 23/09/2005 |
A | 06/09/2005 | 29/09/2005 |
A | 15/11/2005 | 07/12/2005 |
A | 27/02/2006 | 28/03/2006 |
Desired result:
Class | Enrol Date | Close Date | Prior Dates |
---|---|---|---|
A | 30/10/2003 | 05/12/2003 | 0 |
A | 22/12/2003 | 23/09/2005 | 1 |
A | 06/09/2005 | 29/09/2005 | 1 |
A | 15/11/2005 | 07/12/2005 | 3 |
A | 27/02/2006 | 28/03/2006 | 4 |
Solution
A possible option using np.triu:
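import numpy as np
import pandas as pd

cols = ["Enrol Date", "Close Date"]
df[cols] = df[cols].apply(pd.to_datetime, dayfirst=True)
enrol = df["Enrol Date"].to_numpy()
close = df["Close Date"].to_numpy()[:, None]
# Entry [i, j] is True when close[i] < enrol[j]; triu keeps only rows i <= j
df["Prior Dates"] = np.triu(enrol > close).sum(0)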
Output:
print(df)
Class Enrol Date Close Date Prior Dates
0 A 2003-10-30 2003-12-05 0
1 A 2003-12-22 2005-09-23 1
2 A 2005-09-06 2005-09-29 1
3 A 2005-11-15 2005-12-07 3
4 A 2006-02-27 2006-03-28 4
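The answer above treats the whole frame as one sequence and relies on each Close Date falling after the Enrol Date in the same row, so the diagonal never contributes. If the frame held several classes, or a row could close before it enrols, a stricter variant might look like the sketch below. It is only an illustration under assumptions: the helper name count_prior_closes is hypothetical, the two Class B rows are made-up sample data that do not appear in the question, and counting separately per Class is a guess at the intent rather than something the question states.

```python
import numpy as np
import pandas as pd

# Class A rows come from the question; the Class B rows are invented for illustration.
df = pd.DataFrame({
    "Class": ["A", "A", "A", "A", "A", "B", "B"],
    "Enrol Date": ["30/10/2003", "22/12/2003", "06/09/2005", "15/11/2005",
                   "27/02/2006", "01/01/2004", "01/03/2004"],
    "Close Date": ["05/12/2003", "23/09/2005", "29/09/2005", "07/12/2005",
                   "28/03/2006", "01/02/2004", "01/04/2004"],
})
cols = ["Enrol Date", "Close Date"]
df[cols] = df[cols].apply(pd.to_datetime, dayfirst=True)


def count_prior_closes(group):
    """Hypothetical helper: per row, count strictly preceding rows
    whose Close Date is earlier than this row's Enrol Date."""
    enrol = group["Enrol Date"].to_numpy()
    close = group["Close Date"].to_numpy()[:, None]
    # earlier[i, j] is True when close[i] < enrol[j] (broadcast to n x n)
    earlier = enrol > close
    # k=1 drops the diagonal, so a row never counts its own Close Date
    counts = np.triu(earlier, k=1).sum(axis=0)
    return pd.Series(counts, index=group.index)


# Compute each class separately, then align back on the original index.
parts = [count_prior_closes(g) for _, g in df.groupby("Class")]
df["Prior Dates"] = pd.concat(parts)

print(df)
```

For the Class A rows this gives 0, 1, 1, 3, 4, the same as the desired result. The comparison matrix is n x n per class, so memory grows quadratically with the largest class; for very large groups a sort-based approach would probably be a better fit.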
Answered By - Timeless