Issue
I have performed outlier detection on some entrance sensor data for a shopping mall. I want to create one plot for each entrance and highlight the observations that are outliers (marked by True in the outlier column of the dataframe).
Here is a small snippet of the data for two entrances and a time span of six days:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({"date": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   "mall": ["Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1"],
                   "entrance": ["West", "West", "West", "West", "West", "West", "East", "East", "East", "East", "East", "East"],
                   "in": [132, 140, 163, 142, 133, 150, 240, 250, 233, 234, 2000, 222],
                   "outlier": [False, False, False, False, False, False, False, False, False, False, True, False]})
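Because outlier is a boolean column, it can be used directly as a row mask to pull out only the flagged observations. A minimal sketch on the snippet above (the data is rebuilt in compact form so the block runs on its own):

```python
import pandas as pd

# same six-day snippet as above, written more compactly
df = pd.DataFrame({"date": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   "mall": ["Mall1"] * 12,
                   "entrance": ["West"] * 6 + ["East"] * 6,
                   "in": [132, 140, 163, 142, 133, 150, 240, 250, 233, 234, 2000, 222],
                   "outlier": [False] * 10 + [True, False]})

# a boolean column works as a mask on its own; no "== True" comparison is needed
outliers = df[df["outlier"]]
print(outliers[["date", "entrance", "in"]])
```

Only the East entrance on date 5 (in = 2000) is returned.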
To create several plots (the full data has twenty entrances), I came across lmplot in seaborn.
sns.set_theme(style="darkgrid")
for i, group in df.groupby('entrance'):
    sns.lmplot(x="date", y="in", data=group, fit_reg=False, hue="entrance")
    # pseudo code
    # for the rows that have an outlier (outlier = True), create a red dot for that observation
plt.show()
There are two things I would like to accomplish here:
- A line plot instead of a scatter plot. I have not managed to use sns.lineplot to create a separate plot for each entrance; lmplot seems better suited to that.
- For each entrance plot, I would like to show which observations are outliers, preferably as red dots. I have sketched this as pseudocode in my plotting attempt.
Solution
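seaborn.lmplot is a figure-level function that returns a FacetGrid, which I think is more difficult to use in this case. Instead, draw each entrance with the axes-level sns.lineplot and overlay the outliers with sns.scatterplot.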
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# group by the column name, not a one-element list: with ['entrance'],
# pandas >= 2.0 yields tuple keys such as ('East',)
for i, group in df.groupby('entrance'):
    # plot all the values as a lineplot
    sns.lineplot(x="date", y="in", data=group)
    # select the rows where outlier is True and plot them as red dots
    data_t = group[group['outlier']]
    sns.scatterplot(x="date", y="in", data=data_t, color='red')
    # add a title using the value from the groupby
    plt.title(f'Entrance: {i}')
    # show the plot here, not outside the loop
    plt.show()
Alternate option
- This option lets you set the number of columns and rows of a single figure, with one subplot per entrance.
import math
# specify the number of columns to plot
ncols = 2
# determine the number of rows, rounding up when the entrances don't fill the last row
nrows = math.ceil(df.entrance.nunique() / ncols)
fig, axes = plt.subplots(ncols=ncols, nrows=nrows, figsize=(16, 16))
# flatten the grid of axes into a 1-D array, which is easier to index with idx
axes = axes.ravel()
for idx, (i, group) in enumerate(df.groupby('entrance')):
    # plot all the values as a lineplot
    sns.lineplot(x="date", y="in", data=group, ax=axes[idx])
    # select the rows where outlier is True and plot them as red dots
    data_t = group[group['outlier']]
    sns.scatterplot(x="date", y="in", data=data_t, color='red', ax=axes[idx])
    axes[idx].set_title(f'Entrance: {i}')
plt.tight_layout()
plt.show()
Answered By - Trenton McKinney