Issue
I have a simple table in which the datetime column is formatted correctly.
Datetime | Diff |
---|---|
2021-01-01 12:00:00 | 0 |
2021-01-01 12:02:00 | 2 |
2021-01-01 12:04:00 | 2 |
2021-01-01 12:10:00 | 6 |
2021-01-01 12:20:00 | 10 |
2021-01-01 12:22:00 | 2 |
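The Diff column appears to hold the gap to the previous row in whole minutes, with 0 for the first row. Assuming that reading, a minimal sketch of how it could be derived from Datetime (the sample rows are copied from the table; the integer cast assumes whole-minute gaps):

```python
import pandas as pd

# Sample rows from the table above
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-01-01 12:00:00", "2021-01-01 12:02:00", "2021-01-01 12:04:00",
        "2021-01-01 12:10:00", "2021-01-01 12:20:00", "2021-01-01 12:22:00",
    ])
})

# Gap to the previous row in minutes. The first row has no predecessor
# (NaT -> NaN), so it is filled with 0 to match the table.
df["Diff"] = (df["Datetime"].diff().dt.total_seconds() / 60).fillna(0).astype(int)
print(df)
```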
I would like to add a label (batch name) that increments whenever the time difference exceeds a specific threshold/cutoff. With a threshold of Diff > 7, the output I am hoping to achieve is:
Datetime | Diff | Batch |
---|---|---|
2021-01-01 12:00:00 | 0 | A |
2021-01-01 12:02:00 | 2 | A |
2021-01-01 12:04:00 | 2 | A |
2021-01-01 12:10:00 | 6 | A |
2021-01-01 12:20:00 | 10 | B |
2021-01-01 12:22:00 | 2 | B |
The batch labels don't need to be 'A', 'B', 'C'; increasing numbers would probably be easier.
I couldn't find a solution online, but I assume there is a way to split the table at the rows where the difference exceeds the threshold, apply the batch label to each piece, and concatenate the pieces again. However, I can't get it working.
Any insight appreciated :)
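The split, label and concatenate idea does work once the split points are located explicitly. A minimal sketch, assuming Diff is already in minutes; `THRESHOLD`, `starts`, `ends` and `pieces` are illustrative names, not part of the original question. Because it loops in Python, it mainly shows what a vectorized cumulative-sum approach replaces.

```python
import pandas as pd

# Sample rows from the question's table
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-01-01 12:00:00", "2021-01-01 12:02:00", "2021-01-01 12:04:00",
        "2021-01-01 12:10:00", "2021-01-01 12:20:00", "2021-01-01 12:22:00",
    ]),
    "Diff": [0, 2, 2, 6, 10, 2],
})

THRESHOLD = 7  # minutes; a gap strictly greater than this starts a new batch

# Row positions where a batch starts: row 0, plus every later row whose Diff exceeds the threshold
starts = [0] + [i for i, d in enumerate(df["Diff"]) if i > 0 and d > THRESHOLD]
ends = starts[1:] + [len(df)]

pieces = []
for batch, (start, end) in enumerate(zip(starts, ends), start=1):
    piece = df.iloc[start:end].copy()  # one sub-table per batch
    piece["Batch"] = batch             # numeric label: 1, 2, ...
    pieces.append(piece)

result = pd.concat(pieces)  # the original index is kept, so rows stay in order
print(result)
```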
Solution
You can use:
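df['Batch'] = (df['Datetime'].diff().dt.total_seconds()  # gap to previous row in seconds (NaN for the first row)
               .gt(7*60)       # True where the gap exceeds 7 minutes, i.e. a new batch starts
               .cumsum()       # running count of batch starts: 0, 0, ..., 1, ...
               .add(65)        # 65 == ord('A')
               .apply(chr))    # 0 -> 'A', 1 -> 'B', ...
print(df)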
# Output:
Datetime Diff Batch
0 2021-01-01 12:00:00 0 A
1 2021-01-01 12:02:00 2 A
2 2021-01-01 12:04:00 2 A
3 2021-01-01 12:10:00 6 A
4 2021-01-01 12:20:00 10 B
5 2021-01-01 12:22:00 2 B
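The one-liner assumes `df` already exists with a datetime64 `Datetime` column. A self-contained sketch that rebuilds the sample data and also produces the numeric batch ids the question said would be acceptable; the `BatchNum` column and the intermediate names are illustrative additions:

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-01-01 12:00:00", "2021-01-01 12:02:00", "2021-01-01 12:04:00",
        "2021-01-01 12:10:00", "2021-01-01 12:20:00", "2021-01-01 12:22:00",
    ]),
    "Diff": [0, 2, 2, 6, 10, 2],
})

# True on each row that starts a new batch (the first row's NaN gap compares as False)
new_batch = df["Datetime"].diff().dt.total_seconds().gt(7 * 60)

# Running count of batch starts: 0 for the first batch, 1 for the second, ...
batch_id = new_batch.cumsum()

df["BatchNum"] = batch_id + 1              # numeric labels starting at 1
df["Batch"] = batch_id.add(65).apply(chr)  # 65 == ord("A"), so 0 -> "A"

# Since Diff is already in minutes, the same boundaries can come from it directly
batch_id_from_diff = df["Diff"].gt(7).cumsum()
print(df)
```

Whether to test `Datetime` or `Diff` is a matter of which column is trusted; on this sample both give identical batch ids.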
Update
Side question: apply(chr) only goes through A-Z. What method would you use to get AA, AB, ... for more than 26 batches?
Try something like this:
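# Adapted from openpyxl
def chrext(i):
    """Convert a 1-based integer to a spreadsheet-style label: 1 -> 'A', 26 -> 'Z', 27 -> 'AA'."""
    s = ''
    while i > 0:
        i, r = divmod(i, 26)
        # This base-26 system has no zero digit, so a remainder of 0 means 'Z'
        # and one unit is borrowed from the next position
        i, r = (i, r) if r > 0 else (i-1, 26)
        s += chr(r-1+65)  # 1 -> 'A', ..., 26 -> 'Z'
    return s[::-1]  # digits were built least-significant first

# Batch numbers start at 1 here because chrext(0) returns an empty string
df['Batch'] = df['Datetime'].diff().dt.total_seconds().gt(7*60) \
                .cumsum().add(1).apply(chrext)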
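To see where extra letters appear, a quick check of `chrext` at the rollover points (the function is repeated so the snippet runs on its own). It also shows why the answer offsets the batch count by 1: an input of 0 yields an empty label.

```python
# Adapted from openpyxl (same function as in the answer)
def chrext(i):
    s = ''
    while i > 0:
        i, r = divmod(i, 26)
        i, r = (i, r) if r > 0 else (i - 1, 26)
        s += chr(r - 1 + 65)
    return s[::-1]

# Labels around the points where a new letter position is added
for n in (1, 26, 27, 52, 53, 702, 703):
    print(n, chrext(n))
# 1 A, 26 Z, 27 AA, 52 AZ, 53 BA, 702 ZZ, 703 AAA

print(repr(chrext(0)))  # '' (no label for 0)
```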
For demonstration purposes, if you replace add(1) with add(27):
>>> df
Datetime Diff Batch
0 2021-01-01 12:00:00 0 AA
1 2021-01-01 12:02:00 2 AA
2 2021-01-01 12:04:00 2 AA
3 2021-01-01 12:10:00 6 AA
4 2021-01-01 12:20:00 10 AB
5 2021-01-01 12:22:00 2 AB
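Replacing 1 with 27 only shifts the starting label. To exercise the pipeline with more than 26 real batches, a sketch on synthetic data: 30 hypothetical readings spaced 10 minutes apart, so every gap exceeds the 7-minute cutoff and each row forms its own batch.

```python
import pandas as pd

# Adapted from openpyxl (same function as in the answer)
def chrext(i):
    s = ''
    while i > 0:
        i, r = divmod(i, 26)
        i, r = (i, r) if r > 0 else (i - 1, 26)
        s += chr(r - 1 + 65)
    return s[::-1]

# 30 synthetic readings, 10 minutes apart: 30 batches in total
df = pd.DataFrame({"Datetime": pd.date_range("2021-01-01 12:00", periods=30, freq="10min")})

df["Batch"] = (df["Datetime"].diff().dt.total_seconds().gt(7 * 60)
               .cumsum().add(1).apply(chrext))
print(df["Batch"].tolist())
# ['A', 'B', ..., 'Z', 'AA', 'AB', 'AC', 'AD']
```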
Answered By - Corralien