Issue
I have a table, df, which contains the login and logout times of users:
Login time | Logout time |
---|---|
2022-08-01 11:30:00 | 2022-08-01 11:50:00 |
2022-08-01 11:35:00 | 2022-08-01 11:55:00 |
2022-08-01 11:35:00 | 2022-08-01 11:57:00 |
I have another table, df2, which contains the datetimes when jobs were created:
created time |
---|
2022-08-01 11:45:00 |
2022-08-01 11:51:00 |
2022-08-01 11:56:00 |
2022-08-01 11:57:00 |
2022-08-01 12:00:00 |
I am struggling to build the result and would appreciate any help on how to create the resulting dataframe, result_df:
created time | Online users | Offline users |
---|---|---|
2022-08-01 11:45:00 | 3 | 0 |
2022-08-01 11:51:00 | 2 | 1 |
2022-08-01 11:56:00 | 1 | 2 |
2022-08-01 11:57:00 | 0 | 3 |
2022-08-01 12:00:00 | 0 | 3 |
Solution
Here is one way to do it, using pandasql.
It is also possible with a pandas merge, but that requires creating a Cartesian product of the two DataFrames and then keeping only the rows that meet the criteria.
If you are familiar with SQL, pandasql makes this simpler to solve.
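# https://pypi.org/project/pandasql/
from pandasql import sqldf
# Helper that runs a SQL query against the DataFrames in the global namespace
pysqldf = lambda q: sqldf(q, globals())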
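# Query to select rows where the created time falls between login and logout.
# BETWEEN is inclusive at both ends, so a job created exactly at a logout time still counts that user as online.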
qry = """
select *
from df2
left join df
on df2.created_time between df.login_time and df.logout_time
"""
result=pysqldf(qry)
result  # display the joined rows
# do a groupby to take the count of logged in users
df3=result.groupby(['created_time'])['Login_time'].agg(online_user='count').reset_index()
# logged out is the total number of users minus the logged in users
cnt=df['Login_time'].count()
df3['offline_user'] = cnt - df3['online_user']
df3
created_time online_user offline_user
0 2022-08-01 11:45:00 3 0
1 2022-08-01 11:51:00 2 1
2 2022-08-01 11:56:00 1 2
3 2022-08-01 11:57:00 1 2
4 2022-08-01 12:00:00 0 3
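For comparison, the Cartesian-product merge mentioned in the answer can be sketched in plain pandas. This is a minimal sketch, not the answerer's code: the helper name `count_users` and its `include_logout` flag are hypothetical, the underscore column names are assumed to match those used in the query, and `how="cross"` assumes pandas 1.2 or later. The flag also makes the boundary choice explicit: with the inclusive comparison the 11:57:00 row counts one user online (as in the output above), while the exclusive comparison gives 0 online and 3 offline, as in the question's expected result.

```python
import pandas as pd

# Sample data copied from the question; underscore column names are an assumption.
df = pd.DataFrame({
    "Login_time": pd.to_datetime([
        "2022-08-01 11:30:00", "2022-08-01 11:35:00", "2022-08-01 11:35:00",
    ]),
    "Logout_time": pd.to_datetime([
        "2022-08-01 11:50:00", "2022-08-01 11:55:00", "2022-08-01 11:57:00",
    ]),
})
df2 = pd.DataFrame({
    "created_time": pd.to_datetime([
        "2022-08-01 11:45:00", "2022-08-01 11:51:00", "2022-08-01 11:56:00",
        "2022-08-01 11:57:00", "2022-08-01 12:00:00",
    ]),
})


def count_users(jobs, sessions, include_logout=True):
    """Hypothetical helper: count online/offline users at each job creation time."""
    # Pair every job with every session (the Cartesian product).
    pairs = jobs.merge(sessions, how="cross")
    started = pairs["Login_time"] <= pairs["created_time"]
    if include_logout:
        # Same boundary behaviour as SQL BETWEEN: logout instant counts as online.
        not_ended = pairs["created_time"] <= pairs["Logout_time"]
    else:
        # A user is treated as offline from the logout instant onwards.
        not_ended = pairs["created_time"] < pairs["Logout_time"]
    pairs["online"] = started & not_ended
    # Summing booleans per creation time gives the number of online users.
    out = (
        pairs.groupby("created_time")["online"]
        .sum()
        .astype(int)
        .rename("online_user")
        .reset_index()
    )
    out["offline_user"] = len(sessions) - out["online_user"]
    return out


inclusive_counts = count_users(df2, df)
exclusive_counts = count_users(df2, df, include_logout=False)
print(exclusive_counts)
```

The cross merge builds len(df2) × len(df) rows, so it is only practical when both tables are reasonably small.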
Answered By - Naveed