Issue
Using pandas boxplot(by="name_of_week_day"), I'd like to control the order in which the weekday names are displayed on the plot. They are displayed in the wrong order, and I want them displayed in the order of the week, beginning with Monday and ending with Sunday.
Here is a simple reproducible example:
import pandas as pd

# create the DataFrame
df = (
    # a two-column df: date and sales
    pd.DataFrame({'date': pd.date_range(start='1/5/2022', freq='D', periods=15),
                  'sales': [6, 8, 9, 5, 4, 8, 8, 3, 5, 9, 8, 3, 4, 7, 7]})
    # add a column holding the name of the day of the week
    .assign(name_of_day=lambda df: df.date.dt.day_name())
)
The code above produces a DataFrame. Here are its first three rows:
        date  sales name_of_day
0 2022-01-05      6   Wednesday
1 2022-01-06      8    Thursday
2 2022-01-07      9      Friday
Now plot the boxplot with:
df.boxplot(by="name_of_day");
It returns the plot:
I would like the plot to show the names of the days of the week in the right order.
How can I do this with pandas.boxplot() or with pandas.plot.box()?
Note: yes, we could do it with seaborn, but my question is about pandas boxplot() or pandas plot.box().
Solution
Use a CategoricalDtype to reorder the days:
from calendar import day_name
days = pd.CategoricalDtype(list(day_name), ordered=True)
df.astype({'name_of_day': days}).boxplot(by="name_of_day")
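To see why this helps without drawing anything, here is a minimal sketch that compares the order of the group keys before and after the conversion. It assumes that boxplot(by=...) lays out its boxes in the same order that groupby sorts its keys, which matches the behaviour described in the question but is not verified here against the pandas source. The variable names (alphabetical, weekly, ordered) are only illustrative, and observed=True is passed explicitly so the result does not depend on the pandas version's default.

```python
from calendar import day_name

import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "date": pd.date_range(start="1/5/2022", freq="D", periods=15),
    "sales": [6, 8, 9, 5, 4, 8, 8, 3, 5, 9, 8, 3, 4, 7, 7],
})
df["name_of_day"] = df["date"].dt.day_name()

# Plain strings: the group keys come out sorted alphabetically.
alphabetical = list(df.groupby("name_of_day")["sales"].median().index)

# Ordered categorical: the group keys follow the category order instead.
days = pd.CategoricalDtype(list(day_name), ordered=True)
ordered = df.astype({"name_of_day": days})
weekly = list(
    ordered.groupby("name_of_day", observed=True)["sales"].median().index
)

print(alphabetical)
print(weekly)
```

One caveat: calendar.day_name follows the current locale, while dt.day_name() returns English names by default. If the locale has been changed, it is probably safer to spell out the seven names in a literal list.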
Or in your original pipeline:
df = (
    # a two-column df: date and sales
    pd.DataFrame({'date': pd.date_range(start='1/5/2022', freq='D', periods=15),
                  'sales': [6, 8, 9, 5, 4, 8, 8, 3, 5, 9, 8, 3, 4, 7, 7]})
    # add a column holding the name of the day of the week, as an ordered categorical
    .assign(name_of_day=lambda df: pd.Categorical(df.date.dt.day_name(),
                                                  categories=list(day_name),
                                                  ordered=True)
            )
)
Output:
Answered By - mozway