Issue
I have a large dataset, and I am interested in the part where the system shuts down and the period during which it stays shut down. However, the data also includes the subsequent startup, which I want to filter out.
The value drops below 0.2, stays there for a while, and then rises above 0.2 again. I want to delete everything from the point where the value, having been below 0.2, climbs back above 0.2.
A simple threshold filter does not work, because I still need the first part, where the value is above 0.2. Looking only at the derivative is not an option either, since the value can also go up and down in the first part; the only difference in the later part is that the value has already been below 0.2.
How can I do this?
import pandas as pd
data = {
"Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20", "2020-06-07 00:21", "2020-06-07 00:22", "2020-06-07 00:23", "2020-06-07 00:24", "2020-06-07 00:25", "2020-06-07 00:26", "2020-06-07 00:27", "2020-06-07 00:28", "2020-06-07 00:29"],
"Value": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0.1, 0.05, 0.15, 0.1, 0.18, 0.25, 1, 4, 8, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0],
}
df = pd.DataFrame(data)
Required output:
data = {
"Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19"],
"Value": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0.1, 0.05, 0.15, 0.1, 0.18],
}
Solution
You can identify the switching points (from above 0.2 to below, and vice versa) by calling diff() on the boolean mask df['Value'] < 0.2, and then count them with cumsum(). Assuming the data starts above 0.2, every row before the shutdown gets a count of 0, the rows below 0.2 get 1, and everything from the startup onward gets 2 or more. To remove the part after the value has been below 0.2, keep only the rows with a count below 2. Because diff() yields NaN for the first row, cast the mask to integers and fill that NaN with 0 so the first row is not dropped.
s = (df['Value'] < 0.2).astype(int).diff().abs().fillna(0).cumsum()
df.loc[s < 2]
Result:
Date and Time Value
0 2020-06-07 00:00 16.20
1 2020-06-07 00:01 15.10
2 2020-06-07 00:02 13.80
3 2020-06-07 00:03 12.00
4 2020-06-07 00:04 11.90
5 2020-06-07 00:05 12.10
6 2020-06-07 00:06 10.80
7 2020-06-07 00:07 9.80
8 2020-06-07 00:08 8.30
9 2020-06-07 00:09 6.20
10 2020-06-07 00:10 4.30
11 2020-06-07 00:11 4.20
12 2020-06-07 00:12 4.20
13 2020-06-07 00:13 3.30
14 2020-06-07 00:14 1.80
15 2020-06-07 00:15 0.10
16 2020-06-07 00:16 0.05
17 2020-06-07 00:17 0.15
18 2020-06-07 00:18 0.10
19 2020-06-07 00:19 0.18
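Since diff() always produces NaN for the first row, a variant that avoids diff() altogether may be easier to reason about. The following is only a sketch of that idea using cummax(); the function name drop_after_first_recovery, its parameters, and the small demo frame are hypothetical and not part of the original answer.

```python
import pandas as pd


def drop_after_first_recovery(df, column="Value", threshold=0.2):
    """Hypothetical helper: keep rows up to the end of the first dip below `threshold`."""
    below = df[column] < threshold
    # True from the first row that is below the threshold, and for every row after it.
    has_been_below = below.cummax()
    # True from the first row that is back at or above the threshold after a dip,
    # and for every later row (even if the value dips again).
    restarted = (has_been_below & ~below).cummax()
    return df.loc[~restarted]


# Small made-up series: shutdown, restart, then a second dip that is also dropped.
demo = pd.DataFrame({"Value": [5.0, 3.0, 0.1, 0.15, 0.3, 4.0, 0.1]})
trimmed = drop_after_first_recovery(demo)
print(trimmed["Value"].tolist())  # [5.0, 3.0, 0.1, 0.15]
```

Unlike the switch-counting approach, this version makes no assumption about the first row, so a series that already starts below 0.2 is still cut at its first rise above the threshold.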
Answered By - Shaido