Issue
This is a follow-up on a question I posted that a member graciously provided the solution for. However, I had neglected to account for periodicity in my question.
I have a dataframe like so:
data = {'ID':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'column1': [15, 16, 17, 14, 13, 5, 3, 2, 1.9, 1.2, 1, 0.8, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 2, 3, 4, 5, 6],
'column2': [10, 11, 12, 13, 13.5, 14, 14.5, 15, 16, 17, 18, 19, 20, 20, 20, 20, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
}
df=pd.DataFrame(data)
ID column1 column2
0 1 15.0 10.0
1 1 16.0 11.0
2 1 17.0 12.0
3 1 14.0 13.0
4 1 13.0 13.5
5 1 5.0 14.0
6 1 3.0 14.5
7 1 2.0 15.0
8 1 1.9 16.0
9 1 1.2 17.0
10 1 1.0 18.0
11 1 0.8 19.0
12 1 0.5 20.0
13 1 0.5 20.0
14 1 0.5 20.0
15 1 0.5 20.0
16 1 0.5 20.0
17 1 0.5 19.0
18 1 0.5 18.0
19 1 0.5 17.0
20 1 0.5 16.0
21 1 1.0 15.0
22 1 2.0 14.0
23 1 3.0 13.0
24 1 4.0 12.0
25 1 5.0 11.0
26 1 6.0 10.0
As can be seen, column1
decreases and then increases. Column 2
increases and then decreases.
I want to create a label column based on the following conditions:
- In
column1
, from the beginning until just before we first encounter a value<=2
, AND whencolumn2
is>13
, label asPre_Start
- When column 1 is decreasing AND
0.5 < column1 <= 2
AND column 2 is increasing AND< column2 <= 19
label asStart
- When
column1 <= 0.5
ANDcolumn2 >= 19
label asSteady
- When
column1 <= 0.5
AND column 2 is decreasing AND14 < column2 < 19
label asRamp
- When column 1 is increasing AND
column1 > 0.5
AND column 2 is decreasing ANDcolumn2 < 19
label asEnd
Im not sure if there is an ironclad boolean logic that would account for the conditions without looking at whether it is increasing or decreasing.
Using the method member Corralien provided, based on my question without addressing periodicity, I was able to get the resulting dataframe:
def label_group(df):
c1 = df['column1']
c2 = df['column2']
conds = [(0.5 < c1) & (c1 <= 2) & (c2 > 14) & (c2 <= 19), # Start
(c1 <= 0.5) & (c2 >= 19), # Steady
(c1 <= 0.5) & (14 < c2) & (c2 < 19), # Ramp
(c1 > 0.5) & (c2 < 19)] # End
choices = ['Start', 'Steady', 'Ramp', 'End']
labels = np.select(condlist=conds, choicelist=choices)
labels = pd.Series(labels, index=df.index)
labels[:((c1 <= 2) & (c2 > 13)).argmax()] = 'Pre_Start'
return labels.to_frame()
df['Label'] = df.groupby('ID', group_keys=True).apply(label_group).droplevel('ID')
ID column1 column2 Label
0 1 15.0 10.0 Pre_Start
1 1 16.0 11.0 Pre_Start
2 1 17.0 12.0 Pre_Start
3 1 14.0 13.0 Pre_Start
4 1 13.0 13.5 Pre_Start
5 1 5.0 14.0 Pre_Start
6 1 3.0 14.5 Pre_Start
7 1 2.0 15.0 Start
8 1 1.9 16.0 Start
9 1 1.2 17.0 Start
10 1 1.0 18.0 Start
11 1 0.8 19.0 Start
12 1 0.5 20.0 Steady
13 1 0.5 20.0 Steady
14 1 0.5 20.0 Steady
15 1 0.5 20.0 Steady
16 1 0.5 20.0 Steady
17 1 0.5 19.0 Steady
18 1 0.5 18.0 Ramp
19 1 0.5 17.0 Ramp
20 1 0.5 16.0 Ramp
21 1 1.0 15.0 Start
22 1 2.0 14.0 End
23 1 3.0 13.0 End
24 1 4.0 12.0 End
25 1 5.0 11.0 End
26 1 6.0 10.0 End
As can be seen, row 21 satisfies the start
condition, however. It should be labeled End
and state not satisfying any of the conditions labeled other
.
I tried using the shift()
function but to no avail.
Any help would be appreciated.
Solution
According your new conditions, you can use:
def label_group(df):
c1 = df['column1']
c2 = df['column2']
i1 = c1.diff().shift(-1, fill_value=0).ge(0) # is Increasing, column 1
d2 = c2.diff().shift(-1, fill_value=0).le(0) # is Decreasing, column 2
conds = [~i1 & (0.5 < c1) & (c1 <= 2) & ~d2 & (c2 > 14) & (c2 <= 19), # Start
(c1 <= 0.5) & (c2 >= 19), # Steady
(c1 <= 0.5) & d2 & (14 < c2) & (c2 < 19), # Ramp
i1 & (c1 > 0.5) & d2 & (c2 < 19)] # End
choices = ['Start', 'Steady', 'Ramp', 'End']
labels = np.select(condlist=conds, choicelist=choices)
labels = pd.Series(labels, index=df.index)
labels[:((c1 <= 2) & (c2 > 13)).argmax()] = 'Pre_Start'
return labels.to_frame()
df['Label'] = df.groupby('ID', group_keys=True).apply(label_group).droplevel('ID')
Output:
>>> df
ID column1 column2 Label
0 1 15.0 10.0 Pre_Start
1 1 16.0 11.0 Pre_Start
2 1 17.0 12.0 Pre_Start
3 1 14.0 13.0 Pre_Start
4 1 13.0 13.5 Pre_Start
5 1 5.0 14.0 Pre_Start
6 1 3.0 14.5 Pre_Start
7 1 2.0 15.0 Start
8 1 1.9 16.0 Start
9 1 1.2 17.0 Start
10 1 1.0 18.0 Start
11 1 0.8 19.0 Start
12 1 0.5 20.0 Steady
13 1 0.5 20.0 Steady
14 1 0.5 20.0 Steady
15 1 0.5 20.0 Steady
16 1 0.5 20.0 Steady
17 1 0.5 19.0 Steady
18 1 0.5 18.0 Ramp
19 1 0.5 17.0 Ramp
20 1 0.5 16.0 Ramp
21 1 1.0 15.0 End
22 1 2.0 14.0 End
23 1 3.0 13.0 End
24 1 4.0 12.0 End
25 1 5.0 11.0 End
26 1 6.0 10.0 End
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.