Issue
I have the following large dataset recording the result of a math competition among students in descending order of date: So for example, student 1 comes third in Race 1 while student 3 won Race 2, etc.
Race_ID Date adv C_k
1 1/1/2023 2.5 2.7
1 1/1/2023 1.4 2.6
1 1/1/2023 1.3 1.9
1 1/1/2023 1.1 1.2
2 11/9/2022 1.4 1.1
2 11/9/2022 1.3 1.2
2 11/9/2022 1.0 0.4
3 17/4/2022 0.9 0.2
3 17/4/2022 0.8 0.4
3 17/4/2022 0.7 0.5
3 17/4/2022 0.6 0.2
3 17/4/2022 0.5 0.4
That is grouped by Group_ID
, the values in adv
is sorted in descending value by groups. For each group, I want to define a value
t= min{n| adv_(n+1) <= C_n} and C_t be the corresponding value in the C_k column.
In words, I want to create a new column C_t
for each group, where the value is equal to the first C_k such that C_k is greater or equal to the adv
value in the next row. And if no such value exists, then set C_t = 1 (for example, in Race_ID = 3)
So the desired column looks like this:
Race_ID Date adv C_k C_t
1 1/1/2023 2.5 2.7 1.9
1 1/1/2023 1.4 2.6 1.9
1 1/1/2023 1.3 1.9 1.9
1 1/1/2023 1.1 1.2 1.9
2 11/9/2022 1.4 1.1 1.2
2 11/9/2022 1.3 1.2 1.2
2 11/9/2022 1.0 0.4 1.2
3 17/4/2022 0.9 0.2 1.0
3 17/4/2022 0.8 0.4 1.0
3 17/4/2022 0.7 0.5 1.0
3 17/4/2022 0.6 0.2 1.0
3 17/4/2022 0.5 0.4 1.0
Thanks in advance.
Solution
Compare shifted values per groups by DataFrameGroupBy.shift
with Series.le
, filter by boolean indexing
, remove duplicates with keep last value by DataFrame.drop_duplicates
and mapping by Series.map
:
s = (df[df.groupby('Race_ID')['adv'].shift(-1).le(df['C_k'])]
.drop_duplicates(subset=['Race_ID'], keep='last')
.set_index('Race_ID')['C_k'])
df['C_t'] = df['Race_ID'].map(s).fillna(1)
print (df)
Race_ID Date adv C_k C_t
0 1 1/1/2023 2.5 2.7 1.9
1 1 1/1/2023 1.4 2.6 1.9
2 1 1/1/2023 1.3 1.9 1.9
3 1 1/1/2023 1.1 1.2 1.9
4 2 11/9/2022 1.4 1.1 1.2
5 2 11/9/2022 1.3 1.2 1.2
6 2 11/9/2022 1.0 0.4 1.2
7 3 17/4/2022 0.9 0.2 1.0
8 3 17/4/2022 0.8 0.4 1.0
9 3 17/4/2022 0.7 0.5 1.0
10 3 17/4/2022 0.6 0.2 1.0
11 3 17/4/2022 0.5 0.4 1.0
Or use Series.where
for set NaN
s for not matched values and get last non missing value per group by GroupBy.transform
with last
:
df['C_t'] = (df['C_k'].where(df.groupby('Race_ID')['adv'].shift(-1).le(df['C_k']))
.groupby(df['Race_ID'])
.transform('last')
.fillna(1))
print (df)
Race_ID Date adv C_k C_t
0 1 1/1/2023 2.5 2.7 1.9
1 1 1/1/2023 1.4 2.6 1.9
2 1 1/1/2023 1.3 1.9 1.9
3 1 1/1/2023 1.1 1.2 1.9
4 2 11/9/2022 1.4 1.1 1.2
5 2 11/9/2022 1.3 1.2 1.2
6 2 11/9/2022 1.0 0.4 1.2
7 3 17/4/2022 0.9 0.2 1.0
8 3 17/4/2022 0.8 0.4 1.0
9 3 17/4/2022 0.7 0.5 1.0
10 3 17/4/2022 0.6 0.2 1.0
11 3 17/4/2022 0.5 0.4 1.0
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.