Issue
I have a dataframe that is in descending order of time and indexed by Race_ID
and Student_ID
.
Race_ID Date Student_ID Mark
1 1/10/2023 1 5
1 1/10/2023 2 8
1 1/10/2023 3 7
8 1/10/2023 4 4
8 1/1/2023 1 9
8 1/1/2023 2 3
8 1/1/2023 3 5
8 1/1/2023 4 10
2 11/9/2022 1 2
2 11/9/2022 2 4
2 11/9/2022 3 9
3 17/4/2022 5 3
3 17/4/2022 2 4
3 17/4/2022 3 3
3 17/4/2022 4 7
4 1/3/2022 1 4
4 1/3/2022 2 9
5 1/1/2021 1 6
5 1/1/2021 2 1
5 1/1/2021 3 8
I want to create a new column Seasonal_Mark
which is the cumsum
of Mark
by each student, with the caveat that the sum is set to zero on every 1 February. So the desired outcome looks like:
Race_ID Date Student_ID Mark Seasonal_Mark
1 1/10/2023 1 5 5
1 1/10/2023 2 8 8
1 1/10/2023 3 7 7
8 1/10/2023 4 4 4
8 1/1/2023 1 9 15 (4+2+9)
8 1/1/2023 2 3 20 (4+9+4+3)
8 1/1/2023 3 5 17 (9+3+5)
8 1/1/2023 4 10 17 (7+10)
2 11/9/2022 1 2 6 (4+2)
2 11/9/2022 2 4 17 (4+9+4)
2 11/9/2022 3 9 12 (9+3)
3 17/4/2022 5 3 3
3 17/4/2022 2 4 13 (4+9)
3 17/4/2022 3 3 3
3 17/4/2022 4 7 7
4 1/3/2022 1 4 4
4 1/3/2022 2 9 9
5 1/1/2021 1 6 6
5 1/1/2021 2 1 1
5 1/1/2021 3 8 8
Thank you so much in advance.
Solution
Another possible solution, which creates an adjusted year column that corresponds to the year of the date, if the date is February or later (in the year), and to the year minus 1, if the date is earlier than February (in the year). This new column is afterwards used inside groupby
.
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
m = df['Date'].dt.month
y = df['Date'].dt.year
df.assign(
Seasonal_Mark = (df.assign(
adjusted_year = np.where(m >= 2, y, y - 1))
.sort_values(by='Date')
.groupby(['Student_ID', 'adjusted_year'])['Mark'].cumsum()))
Output:
Race_ID Date Student_ID Mark Seasonal_Mark
0 1 2023-10-01 1 5 5
1 1 2023-10-01 2 8 8
2 1 2023-10-01 3 7 7
3 8 2023-10-01 4 4 4
4 8 2023-01-01 1 9 15
5 8 2023-01-01 2 3 20
6 8 2023-01-01 3 5 17
7 8 2023-01-01 4 10 17
8 2 2022-09-11 1 2 6
9 2 2022-09-11 2 4 17
10 2 2022-09-11 3 9 12
11 3 2022-04-17 5 3 3
12 3 2022-04-17 2 4 13
13 3 2022-04-17 3 3 3
14 3 2022-04-17 4 7 7
15 4 2022-03-01 1 4 4
16 4 2022-03-01 2 9 9
17 5 2021-01-01 1 6 6
18 5 2021-01-01 2 1 1
19 5 2021-01-01 3 8 8
Answered By - PaulS
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.