Issue
I need to clean up the rows of a DataFrame in order to calculate the time spent inside the building. Sometimes the reader has recorded multiple entries or exits within a short space of time, which is obviously an error. The errors are not always exact duplicates; they may be a few seconds or minutes apart.
What is the most efficient way to clean this up so that I can work out how long was spent inside?
We can assume that entry and exit happen within one day, i.e. no-one spends the night there.
Below is a sample of the dataframe.
Date | Type |
---|---|
2021-11-10 19:31:50 | Exit |
2021-11-10 19:31:50 | Exit |
2021-11-10 18:49:21 | Entry |
2021-11-09 20:14:21 | Exit |
2021-11-09 19:34:05 | Entry |
Edit:
The expected output would have one clear entry time and one clear exit time per visit (let's say for visits lasting more than 10 minutes).
You cannot just delete specific rows by hand; assume we don't know how many rows there are.
Date | Type |
---|---|
2021-11-10 19:31:50 | Exit |
2021-11-10 18:49:21 | Entry |
2021-11-09 20:14:21 | Exit |
2021-11-09 19:34:05 | Entry |
Solution
You could use shift. I created a dummy DataFrame to match yours (just without the datetime type):
import pandas as pd

df = pd.DataFrame({'Date': ['2021-11-10 19:31:50', '2021-11-10 19:31:50', '2021-11-10 18:49:21', '2021-11-09 20:14:21', '2021-11-09 19:34:05'],
                   'Type': ['Exit', 'Exit', 'Entry', 'Exit', 'Entry']})
and then used shift to keep only the rows whose Type differs from the Type of the row above:
df.loc[df['Type'].shift() != df['Type']]
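Output (the original index is kept, so the dropped duplicate is row 1):
Index | Date | Type |
---|---|---|
0 | 2021-11-10 19:31:50 | Exit |
2 | 2021-11-10 18:49:21 | Entry |
3 | 2021-11-09 20:14:21 | Exit |
4 | 2021-11-09 19:34:05 | Entry |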
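To get from the shift filter to the time spent inside, one possible continuation is sketched below. It is not part of the original answer and rests on a few assumptions: the Date strings are parsed with pd.to_datetime, rows are sorted oldest-first, the first reading of each repeated run is the one kept, and every Entry is paired with the row directly after it. The names clean, visits, next_date, next_type and duration are made up for illustration. Note that shift only removes consecutive repeats of the same Type, however far apart they are in time; it does not look at the gap between readings.

```python
import pandas as pd

# Same sample data as above; column names follow the question.
df = pd.DataFrame({
    'Date': ['2021-11-10 19:31:50', '2021-11-10 19:31:50', '2021-11-10 18:49:21',
             '2021-11-09 20:14:21', '2021-11-09 19:34:05'],
    'Type': ['Exit', 'Exit', 'Entry', 'Exit', 'Entry'],
})

# Parse the strings into real timestamps and sort oldest-first.
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date', kind='stable').reset_index(drop=True)

# Keep a row only when its Type differs from the previous row's Type.
# This keeps the first row of every run of repeated Entry/Exit readings.
clean = df.loc[df['Type'].shift() != df['Type']].reset_index(drop=True)

# Pair each Entry with the row that follows it (assumed to be its Exit).
clean['next_date'] = clean['Date'].shift(-1)
clean['next_type'] = clean['Type'].shift(-1)
is_visit = (clean['Type'] == 'Entry') & (clean['next_type'] == 'Exit')
visits = (clean.loc[is_visit, ['Date', 'next_date']]
               .rename(columns={'Date': 'entry', 'next_date': 'exit'})
               .reset_index(drop=True)
               .copy())
visits['duration'] = visits['exit'] - visits['entry']

# Optional: keep only stays longer than 10 minutes (threshold from the question's edit).
visits = visits[visits['duration'] > pd.Timedelta(minutes=10)]
print(visits)
print('Total time inside:', visits['duration'].sum())
```

If the last reading of a repeated run should win instead of the first (for example, the latest Exit), comparing against shift(-1) instead of shift() is one way to flip that choice.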
Answered By - ImSo3K