Issue
I'm working with the popular Titanic dataset from Kaggle, and I would like to create a bar chart showing the number of survivors vs. deceased by gender. On the x-axis, I want gender (male/female). I want the survivors and deceased stacked and color-coded.
Here is my current code, which produces four bars for each combination of male/survived, male/deceased, female/survived, female/deceased:
import pandas as pd
import seaborn as sns # for the data
df = sns.load_dataset('titanic').loc[:, ['sex', 'survived']]
df.groupby('sex').survived.value_counts().plot(kind='bar', color=['C0', 'C1'], stacked=True)
Current output
Solution
- The simplest way is to reshape the DataFrame with pandas.DataFrame.pivot_table, and then plot with pandas.DataFrame.plot, specifying kind='bar' and stacked=True.
  - The important thing to remember is to shape the data into the correct format for the plot API.
- Use .pivot_table if values need to be aggregated, otherwise use .pivot.
- Using pandas v1.2.4 and matplotlib v3.3.4 (pandas uses matplotlib as its default plotting backend, so it must be installed but does not need to be imported explicitly).
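To illustrate the .pivot vs. .pivot_table distinction, here is a minimal sketch. The two small DataFrames and the column names count and passenger_id are hypothetical and are not part of the Titanic data; they only show when each method applies.

```python
import pandas as pd

# Hypothetical, already-aggregated counts: one row per (sex, survived) pair.
agg = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male'],
    'survived': [0, 1, 0, 1],
    'count': [2, 5, 6, 1],
})
# Nothing needs aggregating here, so .pivot is enough.
wide = agg.pivot(index='sex', columns='survived', values='count')

# Hypothetical raw records: the same (sex, survived) pair occurs more than once.
raw = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male', 'male'],
    'survived': [1, 1, 0, 0, 1],
    'passenger_id': [1, 2, 3, 4, 5],
})

# .pivot cannot reshape duplicated pairs and raises a ValueError.
try:
    raw.pivot(index='sex', columns='survived', values='passenger_id')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# .pivot_table aggregates the duplicates (here by counting rows).
counted = raw.pivot_table(index='sex', columns='survived',
                          values='passenger_id', aggfunc='count', fill_value=0)
print(wide)
print(counted)
```

Either result has one row per x-axis category and one column per stacked segment, which is the shape DataFrame.plot expects.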
import seaborn as sns # used for the titanic data
import pandas as pd
# load the two necessary columns
df = sns.load_dataset('titanic').loc[:, ['sex', 'survived']]
# create a pivot table
dfp = df.pivot_table(index='sex', columns=['survived'], aggfunc=len)
# display(dfp)
survived    0    1
sex
female     81  233
male      468  109
# plot the dataframe
dfp.plot(kind='bar', stacked=True, ylabel='Counts', xlabel='Gender',
title='Survival Status Count by Gender', rot=0)
- I do not recommend stacked bars, because they make it more difficult to differentiate and compare the values for each category. Grouped bars (stacked=False) are easier to read:
dfp.plot(kind='bar', stacked=False, ylabel='Counts', xlabel='Gender',
title='Survival Status Count by Gender', rot=0)
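As a possible alternative to pivot_table, the same wide count table can be built with pandas.crosstab, or by unstacking the groupby/value_counts result from the question. The question's code drew four bars because value_counts returns a long Series indexed by (sex, survived) pairs, and each pair becomes its own bar. This is a sketch only: the small DataFrame below is hypothetical stand-in data, not the Titanic dataset.

```python
import pandas as pd

# Hypothetical stand-in for the 'sex' and 'survived' columns.
df = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male', 'male'],
    'survived': [1, 1, 0, 0, 1],
})

# Option 1: crosstab counts the co-occurrences directly.
ct = pd.crosstab(df['sex'], df['survived'])

# Option 2: the question's long Series, with 'survived' moved into the columns.
long_counts = df.groupby('sex')['survived'].value_counts()
us = long_counts.unstack(fill_value=0)

print(ct)
print(us)
```

Both ct and us have one row per gender and one column per survival status, so either can be passed to .plot(kind='bar', stacked=True) in the same way as dfp.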
Answered By - Trenton McKinney