Issue
I'm trying to analyze a log file using Pandas. I want to plot three lines, one for each of the levels "ERROR", "INFO", and "WARN", showing the count per second, with x = date (in seconds) and y = count.
After importing my log file, my data frame looks like this:
df_logs
I floor the date per second:
df_logs['date'] = df_logs['date'].dt.floor('S')
Then I group by message level:
ds_grouped = df_logs.groupby(['date','level'])['level'].count()
From here, I'm completely stuck:
type(ds_grouped)
> pandas.core.series.Series
I guess the correct seaborn plot is:
sns.lineplot(x='date',
             y='count',
             hue='level',
             data=ds_grouped)
How can I plot the grouped data?
Solution
If I understand correctly, here is one way to create the plot. First, create some test data:
# create test data
import numpy as np
import pandas as pd
n = 10_000
np.random.seed(123)
timestamps = pd.date_range(start='2020-08-27 09:00:00',
                           periods=60*60*4, freq='1s')
level = ['info', 'info', 'info', 'warn', 'warn', 'error']
df = pd.DataFrame(
    {'timestamp': np.random.choice(timestamps, n),
     'level': np.random.choice(level, n)})
print(df.head())
            timestamp  level
0 2020-08-27 09:59:42   info
1 2020-08-27 12:14:06   warn
2 2020-08-27 09:22:26   info
3 2020-08-27 12:24:12  error
4 2020-08-27 10:26:58   info
Second, count the events in 5-minute intervals. You can change the frequency in pd.Grouper below:
t = (df.assign(counter=1)
       .set_index('timestamp')
       .groupby([pd.Grouper(freq='5min'), 'level']).sum()
       .squeeze()
       .unstack())
print(t.head())
level                error  info  warn
timestamp
2020-08-27 09:00:00     35   123    66
2020-08-27 09:05:00     32    91    73
2020-08-27 09:10:00     41   113    64
2020-08-27 09:15:00     32   110    66
2020-08-27 09:20:00     35   107    61
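The question asked for counts per second rather than per 5 minutes. Here is a minimal sketch of the same idea at that resolution, assuming a frame with `timestamp` and `level` columns like the test data. It swaps the helper `counter` column for `size()`, and passes `fill_value=0` so that a second with no events of some level shows 0 instead of NaN. The tiny inline dataset is made up for illustration.

```python
import pandas as pd

# Hypothetical miniature log, for illustration only
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2020-08-27 09:00:00.100',
        '2020-08-27 09:00:00.700',
        '2020-08-27 09:00:01.200',
        '2020-08-27 09:00:01.900',
        '2020-08-27 09:00:01.950',
    ]),
    'level': ['info', 'error', 'info', 'info', 'warn'],
})

# Bucket rows into 1-second bins, count rows per (bin, level),
# then pivot the levels into columns
per_second = (df.set_index('timestamp')
                .groupby([pd.Grouper(freq='1s'), 'level'])
                .size()
                .unstack(fill_value=0))
print(per_second)
```

With real logs, many seconds may have no events at all; whether to show those as gaps or as zeros is a choice that depends on what the plot should convey.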
Third, create the plot with t.plot(). Each level column becomes one line, with the timestamps on the x-axis.
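For completeness, here is a sketch of the plotting step, plus the long-form reshaping that the `sns.lineplot` call in the question would need. It assumes matplotlib is installed. The small `t` below is a hypothetical stand-in for the wide table (its values are copied from the first three rows of the printed output), and the name `long_df` is only illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical stand-in for the wide table `t`
t = pd.DataFrame(
    {'error': [35, 32, 41], 'info': [123, 91, 113], 'warn': [66, 73, 64]},
    index=pd.date_range('2020-08-27 09:00:00', periods=3,
                        freq='5min', name='timestamp'))
t.columns.name = 'level'

# Wide format: DataFrame.plot draws one line per column
ax = t.plot()
ax.set_xlabel('time')
ax.set_ylabel('count')

# Long format: one row per (timestamp, level) pair, with columns
# 'timestamp', 'level', 'count'; this is the shape that
# lineplot(x='timestamp', y='count', hue='level', data=long_df) works with
long_df = t.stack().rename('count').reset_index()
print(long_df.head())
```

In an interactive session, `matplotlib.pyplot.show()` (or a notebook cell) would display the figure; the wide format is enough for `t.plot()`, while the long format is only needed for seaborn-style `hue` grouping.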
Answered By - jsmart