Issue
I'm having a difficult time creating a bar plot from a DataFrame grouped by year and month. With the following code I'm trying to plot the data in the figure I created, but instead a second figure is returned. I also tried to move the legend to the right and change its values to the corresponding month names.
I started to get a feel for the DataFrames obtained with the groupby command, but not getting what I expected led me to ask here.
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.read_csv('fcc-forum-pageviews.csv', index_col='date')
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]
fig, ax = plt.subplots(figsize=(10,10))
bar_plot = line_plot.groupby([line_plot.index.year, line_plot.index.month]).mean().unstack()
bar_plot.plot(kind='bar')
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
plt.show()
This is the format of the data that I am analyzing.
date,value
2016-05-09,1201
2016-05-10,2329
2016-05-11,1716
2016-05-12,10539
2016-05-13,6933
Solution
- Add a sorted categorical 'month' column with pd.Categorical.
- Transform the dataframe to a wide format with pd.pivot_table, where aggfunc='mean' is the default.
  - Wide format is typically best for plotting grouped bars.
- pandas.DataFrame.plot returns matplotlib.axes.Axes, so there's no need to use fig, ax = plt.subplots(figsize=(10,10)).
- The pandas .dt accessor is used to extract various components of 'date', which must be a datetime dtype.
  - If 'date' is not a datetime dtype, then transform it with df.date = pd.to_datetime(df.date).
- Tested with python 3.8.11, pandas 1.3.1, and matplotlib 3.4.2.
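As a minimal sketch of the points above about the datetime dtype, the .dt accessor, and the return value of DataFrame.plot: the four-row sample is made up in the shape of the question's CSV, and the Agg backend is set only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption so the sketch runs headless
import pandas as pd
from matplotlib.axes import Axes

# made-up sample shaped like the question's CSV; 'date' starts out as plain strings
df = pd.DataFrame({'date': ['2016-05-09', '2016-05-10', '2016-06-11', '2017-05-12'],
                   'value': [1201, 2329, 1716, 10539]})

# convert to a datetime dtype so the .dt accessor becomes available
df['date'] = pd.to_datetime(df['date'])

years = df['date'].dt.year                   # year component of each date
month_names = df['date'].dt.strftime('%B')   # full month name of each date

# DataFrame.plot creates its own figure and returns the Axes it drew on
ax = df.plot(kind='bar', x='date', y='value')
is_axes = isinstance(ax, Axes)
```

Because the returned Axes is available as ax, labels and the legend can be adjusted on it directly, without a separate plt.subplots call.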
Imports and Test Data
import pandas as pd
from calendar import month_name # conveniently supplies a list of sorted month names or you can type them out manually
import numpy as np # for test data
# test data and dataframe
np.random.seed(365)
rows = 365 * 3
data = {'date': pd.date_range('2021-01-01', freq='D', periods=rows), 'value': np.random.randint(100, 1001, size=(rows))}
df = pd.DataFrame(data)
# select data within specified quantiles
df = df[df.value.gt(df.value.quantile(0.025)) & df.value.lt(df.value.quantile(0.975))]
# display(df.head())
date value
0 2021-01-01 694
1 2021-01-02 792
2 2021-01-03 901
3 2021-01-04 959
4 2021-01-05 528
Transform and Plot
- If 'date' has been set to the index, as stated in the comments, use the following:
  df['months'] = pd.Categorical(df.index.strftime('%B'), categories=months, ordered=True)
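For the index case, a self-contained sketch might look like the following. The test data here is hypothetical (two years of random values on a DatetimeIndex named 'date'), and it assumes the year for the pivot is also taken from the index, since there is no 'date' column to use.

```python
import numpy as np
import pandas as pd
from calendar import month_name

# hypothetical data with 'date' as a DatetimeIndex rather than a column
rng = np.random.default_rng(365)
idx = pd.date_range('2021-01-01', periods=365 * 2, freq='D', name='date')
df = pd.DataFrame({'value': rng.integers(100, 1001, size=len(idx))}, index=idx)

months = month_name[1:]  # 'January' ... 'December'; element 0 is an empty string

# a DatetimeIndex exposes strftime and year directly, without the .dt accessor
df['months'] = pd.Categorical(df.index.strftime('%B'), categories=months, ordered=True)

# pivot to wide format: one row per year, one column per month, mean by default
dfp = pd.pivot_table(data=df, index=df.index.year, columns='months', values='value')
```

The resulting dfp has the same wide shape as in the column-based version below, so the plotting step is unchanged.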
# create the month column
months = month_name[1:]
df['months'] = pd.Categorical(df.date.dt.strftime('%B'), categories=months, ordered=True)
# pivot the dataframe into the correct shape
dfp = pd.pivot_table(data=df, index=df.date.dt.year, columns='months', values='value')
# display(dfp.head())
months January February March April May June July August September October November December
date
2021 637.9 595.7 569.8 508.3 589.4 557.7 508.2 545.7 560.3 526.2 577.1 546.8
2022 567.9 521.5 625.5 469.8 582.6 627.3 630.4 474.0 544.1 609.6 526.6 572.1
2023 521.1 548.5 484.0 528.2 473.3 547.7 525.3 522.4 424.7 561.3 513.9 602.3
# plot
ax = dfp.plot(kind='bar', figsize=(12, 4), ylabel='Mean Page Views', xlabel='Year', rot=0)
_ = ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
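For comparison, the groupby and unstack approach from the question can likely be made to work too. This is an alternative sketch rather than the method used in this answer: it passes ax=ax so pandas draws on the existing Axes, and it renames the month-number columns so the legend shows month names. The random test series, the 'year' and 'month' level names, and the legend title are assumptions made for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from calendar import month_name

# hypothetical series of daily values on a DatetimeIndex
rng = np.random.default_rng(0)
idx = pd.date_range('2021-01-01', periods=365 * 2, freq='D', name='date')
s = pd.Series(rng.integers(100, 1001, size=len(idx)), index=idx, name='value')

# group by year and month, then move the month level into the columns
keys = [s.index.year.rename('year'), s.index.month.rename('month')]
wide = s.groupby(keys).mean().unstack()

# replace month numbers 1..12 with month names for the legend
wide.columns = [month_name[m] for m in wide.columns]

fig, ax = plt.subplots(figsize=(12, 4))
returned_ax = wide.plot(kind='bar', ax=ax, rot=0)  # ax=ax reuses the Axes created above
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
legend = ax.legend(title='Months', bbox_to_anchor=(1, 1.02), loc='upper left')
```

Without ax=ax, pandas opens a new figure for the bars, which would explain the second image described in the question.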
Answered By - Trenton McKinney