Issue
I have a dataframe:
> df = batch Code. time
> a 100. 2019-08-01 00:59:12.000
> a 120. 2019-08-01 00:59:32.000
> a 130. 2019-08-01 00:59:42.000
> a 120. 2019-08-01 00:59:52.000
> b 100. 2019-08-01 00:44:11.000
> b 140. 2019-08-02 00:14:11.000
> b 150. 2019-08-03 00:47:11.000
> c 150. 2019-09-01 00:44:11.000
> d 100. 2019-08-01 00:10:00.000
> d 100. 2019-08-01 00:10:05.000
> d 130. 2019-08-01 00:10:10.000
> d 130. 2019-08-01 00:10:20.000
I want to get the number of seconds, per group, between the time of the first '100' code to the last '130' code. If for a group there is no code 100 with code 130 after (one of them is missing) - put nan. So the output should be:
df2 = batch duration
a 30
b. nan
c. nan
d. 20
What is the best way to do it?
Solution
Use:
#convert values to datetimes
df['time'] = pd.to_datetime(df['time'])
#get first 100 Code per batch
s1=df[df['Code.'].eq(100)].drop_duplicates('batch').set_index('batch')['time']
#get last 130 Code per batch
s2=df[df['Code.'].eq(130)].drop_duplicates('batch', keep='last').set_index('batch')['time']
#subtract and convert to timedeltas
df = (s2.sub(s1)
.dt.total_seconds()
.reindex(df['batch'].unique())
.reset_index(name='duration'))
print (df)
batch duration
0 a 30.0
1 b NaN
2 c NaN
3 d 20.0
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.