Issue
My apologies, I feel that this is something rather basic, but I cannot break the problem down far enough to see what I am missing.
The Background:
Several users interact with a server. The dictionary contains usernames as keys; each key maps to a list of timestamps (the exact times when that user interacted with the server). Example of a timestamp: "25-Jun-2012 01:44"
What I am trying to accomplish:
I want to create a scatter plot with usernames on the y-axis (categorical values) and the corresponding timestamps (%H:%M) on the x-axis. Note that I am interested only in the time of the interaction, not the date; my x-axis should be in 24-hr format, i.e., from 00:00 to 24:00.
Minimal Working example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# * Source dictionary:
source_data = {"user_1": ['24-Nov-2023 10:30', '24-Nov-2023 10:15', '24-Nov-2023 09:55', '24-Nov-2023 22:10', '24-Nov-2023 15:55', '24-Nov-2023 11:15', '24-Nov-2023 09:30', '24-Nov-2023 22:25', '24-Nov-2023 17:20', '24-Nov-2023 23:55'], "user_2": ['24-Nov-2023 11:30', '24-Nov-2023 11:15', '24-Nov-2023 08:55', '24-Nov-2023 23:10', '24-Nov-2023 15:45', '24-Nov-2023 12:15', '24-Nov-2023 07:30', '24-Nov-2023 23:25', '24-Nov-2023 17:30', '24-Nov-2023 21:55']}
# Convert the dictionary into the data frame; convert strings into time:
source_df = pd.DataFrame.from_dict(source_data)
source_df = source_df.apply(pd.to_datetime)
# print(source_df)
fig, ax = plt.subplots(figsize=(6,9))
# This is the part I do not understand:
ax.scatter(x=?, y=?, marker='.', color='b')
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 1)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M hr'))
fig.tight_layout()
plt.show()
Questions:
- pandas converts the columns to datetime (%Y-%m-%d %H:%M:%S), but I am only interested in %H:%M. Can I strip the date from these datetime objects somehow?
- How do I plot all timestamps in every column, using the column names as categorical values?
Solution
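The expected output is easier to achieve if you first reshape the source DataFrame from wide to long format by melting it. melt() produces two columns: variable (the original column name, i.e. the username) and value (the timestamp). Because every timestamp in the example falls on the same date, plotting the full datetimes on the x-axis already shows the time of day.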
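# Continues from the question's code (same imports and source_df).
# Wide -> long: one row per (username, timestamp) pair.
source_df_melted = source_df.melt()
fig, ax = plt.subplots(figsize=(10,3))
# x: timestamps, y: usernames (matplotlib treats the strings as categories)
ax.scatter(x=source_df_melted["value"], y=source_df_melted["variable"], marker='.', color='b')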
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 3)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.tight_layout()
plt.show()
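The plot above relies on all timestamps sharing one date. If the data spans several days, one possible approach is to move every timestamp onto a single anchor date before plotting, so that only the time of day determines the x position. The following is a minimal sketch under that assumption; the helper names (to_common_day, plot_time_of_day), the anchor date "2000-01-01", the column names passed to melt(), and the small example_df are all made up for illustration and do not come from the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def to_common_day(timestamps, anchor="2000-01-01"):
    """Move every timestamp onto one anchor date, keeping its time of day."""
    anchor_day = pd.Timestamp(anchor).normalize()
    # (timestamp - its own midnight) is the time elapsed since 00:00
    return anchor_day + (timestamps - timestamps.dt.normalize())


def plot_time_of_day(df, anchor="2000-01-01"):
    """Scatter usernames (y) against time of day (x) for a wide DataFrame."""
    melted = df.melt(var_name="user", value_name="timestamp")
    melted["time_of_day"] = to_common_day(melted["timestamp"], anchor)

    fig, ax = plt.subplots(figsize=(10, 3))
    ax.scatter(x=melted["time_of_day"], y=melted["user"], marker=".", color="b")

    # Fix the x-axis to exactly one day, starting at midnight of the anchor date
    start = pd.Timestamp(anchor).normalize()
    ax.set_xlim(start, start + pd.Timedelta(days=1))
    ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 3)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
    fig.tight_layout()
    return fig, ax, melted


# Hypothetical data spanning several dates (not from the original question)
example_df = pd.DataFrame({
    "user_1": ["24-Nov-2023 10:30", "25-Nov-2023 22:10"],
    "user_2": ["24-Nov-2023 07:30", "26-Nov-2023 23:25"],
}).apply(pd.to_datetime, format="%d-%b-%Y %H:%M")

fig, ax, melted = plot_time_of_day(example_df)
```

Call plt.show() afterwards to display the figure. The anchor date itself never appears on the axis, since the formatter prints only %H:%M.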
To answer your other question: pandas.to_datetime() produces datetime values that display as %Y-%m-%d %H:%M:%S by default, but you can extract individual components through the .dt accessor, with attributes like dt.hour, dt.day, etc.
If you need only the time of day (hours, minutes, and seconds), you'd want to run:
source_df_melted['TIME'] = source_df_melted.value.dt.time
This stores a datetime.time object for each datetime record, which displays as HH:MM:SS.
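For the %H:%M form asked about in the question, a few options on the .dt accessor can be compared side by side. This is only a small sketch on a made-up two-element Series (stamps is a hypothetical name, not part of the code above): dt.time yields datetime.time objects, dt.strftime yields plain strings, and combining dt.hour with dt.minute yields a number that is convenient for arithmetic.

```python
import pandas as pd

# Hypothetical sample, using the same string format as the question's data
stamps = pd.Series(
    pd.to_datetime(
        ["24-Nov-2023 10:30", "24-Nov-2023 22:10"],
        format="%d-%b-%Y %H:%M",
    )
)

as_time = stamps.dt.time                # datetime.time objects (HH:MM:SS)
as_text = stamps.dt.strftime("%H:%M")   # strings such as "10:30"
minutes = stamps.dt.hour * 60 + stamps.dt.minute  # minutes since midnight

print(as_text.tolist())
print(minutes.tolist())
```

Note that the strftime result is text, so it is suited to labels or display rather than to a time-scaled plot axis.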
Answered By - Niqua