Issue
I have an indexed pandas dataframe with values like so:
data = {'ID':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'column1': [15, 16, 17, 14, 13, 5, 3, 2, 1.9, 1.2, 1, 0.8, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 2, 3, 4, 5, 6],
'column2': [10, 11, 12, 13, 13.5, 14, 14.5, 15, 16, 17, 18, 19, 20, 20, 20, 20, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
}
ID column1 column2
0 1 15.0 10.0
1 1 16.0 11.0
2 1 17.0 12.0
3 1 14.0 13.0
4 1 13.0 13.5
5 1 5.0 14.0
6 1 3.0 14.5
7 1 2.0 15.0
8 1 1.9 16.0
9 1 1.2 17.0
10 1 1.0 18.0
11 1 0.8 19.0
12 1 0.5 20.0
13 1 0.5 20.0
14 1 0.5 20.0
15 1 0.5 20.0
16 1 0.5 20.0
17 1 0.5 19.0
18 1 0.5 18.0
19 1 0.5 17.0
20 1 0.5 16.0
21 1 1.0 15.0
22 1 2.0 14.0
23 1 3.0 13.0
24 1 4.0 12.0
25 1 5.0 11.0
26 1 6.0 10.0
I want to label the rows based on the following conditions:
- In
column1
, from the beginning until just before we first encounter a value<=2
, AND whencolumn2
is>13
, label asPre_Start
- When
0.5 < column1 <= 2
AND< column2 <= 19
label asStart
- When
column1 <= 0.5
ANDcolumn2 >= 19
label asSteady
- When
column1 <= 0.5
AND14 < column2 < 19
label asRamp
- When
column1 > 0.5
ANDcolumn2 < 19
label asEnd
Such that the resulting dataframe would be something like this:
ID column1 column2 Label
0 1 15.0 10.0 Pre_Start
1 1 16.0 11.0 Pre_Start
2 1 17.0 12.0 Pre_Start
3 1 14.0 13.0 Pre_Start
4 1 13.0 13.5 Pre_Start
5 1 5.0 14.0 Pre_Start
6 1 3.0 14.5 Pre_Start
7 1 2.0 15.0 Start
8 1 1.9 16.0 Start
9 1 1.2 17.0 Start
10 1 1.0 18.0 Start
11 1 0.8 19.0 Start
12 1 0.5 20.0 Steady
13 1 0.5 20.0 Steady
14 1 0.5 20.0 Steady
15 1 0.5 20.0 Steady
16 1 0.5 20.0 Steady
17 1 0.5 19.0 Ramp
18 1 0.5 18.0 Ramp
19 1 0.5 17.0 Ramp
20 1 0.5 16.0 Ramp
21 1 1.0 15.0 End
22 1 2.0 14.0 End
23 1 3.0 13.0 End
24 1 4.0 12.0 End
25 1 5.0 11.0 End
26 1 6.0 10.0 End
I am able to successfully apply a grouped filter by defining a function start index using numpy
like
def filter_group(group):
start_index = np.argmax(group['column1'].values <= 2)
return group.iloc[start_index:]
and calling the function using the apply
clause:
filtered_df = df.groupby('ID', group_keys=False).apply(filter_group).reset_index(drop=True)
However, I can't get it to work with multiple conditions. Is there a more elegant way to achieve this?
Solution
You can use np.select
to define labels and then modify them to detect the Pre_Start state:
def label_group(df):
c1 = df['column1']
c2 = df['column2']
conds = [(0.5 <= c1) & (c1 <= 2) & (c2 > 14), # Start
(c1 < 0.5) & (c2 > 19), # Steady
(c1 < 0.5) & (14 <= c2) & (c2 <= 19), # Ramp
(c1 > 0.5) & (c2 < 14)] # End
choices = ['Start', 'Seady', 'Ramp', 'End']
labels = np.select(condlist=conds, choicelist=choices)
labels = pd.Series(labels, index=df.index)
labels[:((c1 <= 2) & (c2 > 13)).argmax()] = 'Pre_Start'
return labels.to_frame()
df['Label'] = df.groupby('ID').apply(label_group).droplevel('ID')
Note: conditions are not clear, maybe you can check them.
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.