Thursday, October 28, 2021

[FIXED] Delete rows above certain value once number is reached

pandas, python

Issue

I have a large dataset in which I am interested in the shutdown phase and the period while it is shut down. However, the dataset also contains the startup, which I want to filter out.

The value drops below 0.2, stays there for a while, and then rises above 0.2 again. I want to delete everything from the point where the value, having been below 0.2, rises above 0.2 again.

I have tried a standard filter, but that does not work because I am still interested in the first part, where the value is also above 0.2. Looking only at the derivative is not an option either, since the value can go up and down in the first part as well; the only difference in the latter part is that the value has already been below 0.2.
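To illustrate why a plain threshold filter falls short, here is a minimal sketch. The series `values` is a shortened, made-up stand-in with the same shape as the real data (running, shutting down, shut down, starting up again), not the actual dataset.

```python
import pandas as pd

# Hypothetical, shortened series: run-down, shutdown (< 0.2), then startup.
values = pd.Series([12.0, 4.3, 1.8, 0.1, 0.05, 0.18, 0.25, 4.0, 12.0])

# Keeping only the low values loses the run-down phase (12.0, 4.3, 1.8).
only_low = values[values < 0.2]

# Keeping only the high values loses the shutdown and still keeps the startup.
only_high = values[values >= 0.2]

print(only_low.tolist())
print(only_high.tolist())
```

Neither mask can tell the run-down apart from the startup, because both sit above 0.2; the filter needs to know whether the value has already been below the threshold.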

How can I do this?

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20", "2020-06-07 00:21", "2020-06-07 00:22", "2020-06-07 00:23", "2020-06-07 00:24", "2020-06-07 00:25", "2020-06-07 00:26", "2020-06-07 00:27", "2020-06-07 00:28", "2020-06-07 00:29"],
  "Value": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0.1, 0.05, 0.15, 0.1, 0.18, 0.25, 1, 4, 8, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0],
}

df = pd.DataFrame(data)

Required output:

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19"],
  "Value": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0.1, 0.05, 0.15, 0.1, 0.18],
}

Solution

You can identify the switching points (from above 0.2 to below, and vice versa) with a diff() on the mask df['Value'] < 0.2, and then count them with cumsum. To drop everything after the value has been below 0.2 for any period of time, keep only the rows where that running count is less than 2.

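# diff() is NaN in the first row; fillna(0) keeps that row in the result.
s = (df['Value'] < 0.2).astype(int).diff().abs().fillna(0).cumsum()
df.loc[s < 2]

Result:

       Date and Time    Value
0   2020-06-07 00:00    16.20
1   2020-06-07 00:01    15.10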
2   2020-06-07 00:02    13.80
3   2020-06-07 00:03    12.00
4   2020-06-07 00:04    11.90
5   2020-06-07 00:05    12.10
6   2020-06-07 00:06    10.80
7   2020-06-07 00:07     9.80
8   2020-06-07 00:08     8.30
9   2020-06-07 00:09     6.20
10  2020-06-07 00:10     4.30
11  2020-06-07 00:11     4.20
12  2020-06-07 00:12     4.20
13  2020-06-07 00:13     3.30
14  2020-06-07 00:14     1.80
15  2020-06-07 00:15     0.10
16  2020-06-07 00:16     0.05
17  2020-06-07 00:17     0.15
18  2020-06-07 00:18     0.10
19  2020-06-07 00:19     0.18
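
As an alternative to counting switching points, the same trimming can be sketched with two cumulative maxima: one to remember that the value has been below the threshold, and one to mark everything from the first recovery onwards. This is not part of the original answer; the function name `drop_after_restart` and its parameters are hypothetical, and the `demo` values are a shortened stand-in for the question's data.

```python
import pandas as pd

def drop_after_restart(df, column="Value", threshold=0.2):
    """Keep rows up to the last one before the value recovers."""
    below = df[column] < threshold
    # True from the first below-threshold row onwards.
    seen_below = below.cummax()
    # True from the first row that is back at or above the threshold.
    restarted = (seen_below & ~below).cummax()
    return df.loc[~restarted]

# Shortened illustrative data: run-down, shutdown, then startup.
demo = pd.DataFrame({"Value": [16.2, 4.3, 1.8, 0.1, 0.05, 0.18, 0.25, 1.0, 12.0]})
trimmed = drop_after_restart(demo)
print(trimmed["Value"].tolist())
```

This variant keeps the first row without any special handling. It also behaves differently when the data starts below the threshold: rows are dropped from the first recovery onwards, rather than after a fixed number of switches.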

Answered By - Shaido

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
