Friday, March 25, 2022

[FIXED] Counting the running app from start and finish datetime column in Dataframe

March 25, 2022 matplotlib, pandas, python No comments

Issue

I have a dataframe like this

df = pd.DataFrame({
'app': [1,2,3,4,5],
'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00', '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00', '2022-03-11 22:27:40', '2022-03-11 22:29:00']
})

df['start_time']=pd.to_datetime(df['start_time'])
df['finish_time']=pd.to_datetime(df['finish_time'])

My main purpose is to create a plot x-axis is time, y-axis is the count of running app

By that way, my idea is to create new column which is equal the running app when the app is started. For example in this case when the app 2 start actually app 1 is still running (it is fine if app 2 is included in the counting process), but I am stuck here (this is the example of the dataframe that I intended to make)

    app start_time  finish_time  running_apps(if current app included)
0   1   2022-03-11 22:26:00 2022-03-11 22:26:40 1
1   2   2022-03-11 22:26:30 2022-03-11 22:27:00 2
2   3   2022-03-11 22:27:00 2022-03-11 22:28:00 2
3   4   2022-03-11 22:27:30 2022-03-11 22:27:40 2
4   5   2022-03-11 22:28:00 2022-03-11 22:29:00 2

if someone else has another idea, it would be appreciated, thank you

Solution

You can use numpy broadcasting with np.tril for lower triangle for test next datetimes, chain bot hmask and count Trues by sum:

df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

a = np.tril(df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = np.tril(df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])

df['count'] = (a & b).sum(axis=1)
print (df)
   app          start_time         finish_time  count
0    1 2022-03-11 22:26:00 2022-03-11 22:26:40      1
1    2 2022-03-11 22:26:30 2022-03-11 22:27:00      2
2    3 2022-03-11 22:27:00 2022-03-11 22:28:00      1
3    4 2022-03-11 22:27:30 2022-03-11 22:27:40      2
4    5 2022-03-11 22:28:00 2022-03-11 22:29:00      1

Or if need compare between all values:

df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

a = (df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = (df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])

df['count'] = (a & b).sum(axis=1)
print (df)
   app          start_time         finish_time  count
0    1 2022-03-11 22:26:00 2022-03-11 22:26:40      2
1    2 2022-03-11 22:26:30 2022-03-11 22:27:00      2
2    3 2022-03-11 22:27:00 2022-03-11 22:28:00      2
3    4 2022-03-11 22:27:30 2022-03-11 22:27:40      2
4    5 2022-03-11 22:28:00 2022-03-11 22:29:00      1

Answered By - jezrael

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Friday, March 25, 2022

[FIXED] Counting the running app from start and finish datetime column in Dataframe

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels