Monday, December 13, 2021

[FIXED] How to make a stacked barplot for percentage of three classes per year?

December 13, 2021 dataframe, matplotlib, pandas, python, stacked-chart

Issue

I need to make a stacked barplot using this dataset (head):

import pandas as pd

data = {'model': ['A1', 'A6', 'A1', 'A4', 'A3'],
        'year': [2017, 2016, 2016, 2017, 2019],
        'price': [12500, 16500, 11000, 16800, 17300],
        'transmission': ['Manual', 'Automatic', 'Manual', 'Automatic', 'Manual'],
        'mileage': [15735, 36203, 29946, 25952, 1998],
        'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
        'tax': [150, 20, 30, 145, 145],
        'mpg': [55.4, 64.2, 55.4, 67.3, 49.6],
        'engineSize': [1.4, 2.0, 1.4, 2.0, 1.0]}

df = pd.DataFrame(data)

  model  year  price transmission  mileage fuelType  tax   mpg  engineSize
0    A1  2017  12500       Manual    15735   Petrol  150  55.4         1.4
1    A6  2016  16500    Automatic    36203   Diesel   20  64.2         2.0
2    A1  2016  11000       Manual    29946   Petrol   30  55.4         1.4
3    A4  2017  16800    Automatic    25952   Diesel  145  67.3         2.0
4    A3  2019  17300       Manual     1998   Petrol  145  49.6         1.0

I would like the years (1997-2021) on the x-axis and percentages from 0 to 100 on the y-axis. Finally, I would like the yearly proportions of the three fuelTypes (Petrol, Diesel and Hybrid) to be shown.

I've already done the following calculation to get the percentage of each fuelType per year, and now I need to put it on a graph:

fuel_percentage = round((my_data_frame.groupby(['year'])['fuelType'].value_counts()/my_data_frame.groupby('year')['fuelType'].count())*100, 2)

print(fuel_percentage)

Which gives me the following result:

year  fuelType
1997  Petrol      100.00
1998  Petrol      100.00
2002  Petrol      100.00
2003  Diesel       66.67
      Petrol       33.33
2004  Petrol       80.00
      Diesel       20.00
2005  Petrol       71.43
      Diesel       28.57
2006  Petrol       66.67
      Diesel       33.33
2007  Petrol       56.25
      Diesel       43.75
2008  Diesel       66.67
      Petrol       33.33
etc...

My main worry is that since the object is not a dataframe, I won't be able to use it to make a plot.
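For reference, the result of that calculation is a pandas Series indexed by a two-level (year, fuelType) MultiIndex, not a DataFrame. A minimal sketch that checks this, assuming `df` holds only the year and fuelType columns of the five sample rows above and stands in for `my_data_frame`:

```python
import pandas as pd

# Only the columns the calculation needs, taken from the sample rows above
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# The question's calculation, with df in place of my_data_frame
fuel_percentage = round(
    (df.groupby(['year'])['fuelType'].value_counts()
     / df.groupby('year')['fuelType'].count()) * 100, 2)

print(type(fuel_percentage).__name__)     # Series
print(list(fuel_percentage.index.names))  # ['year', 'fuelType']
```

Because it is a Series with a MultiIndex, `.unstack()` can reshape it into a DataFrame that the plotting API accepts.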

Here is an example of the kind of plot I would like (replace players with fuelTypes and y-axis with percentages):

Thanks for the help!

... edit ...

Solution

Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3

`.groupby` & `.unstack`

pandas.DataFrame.groupby followed by .value_counts returns a long Series with a (year, fuelType) MultiIndex, which must be unstacked into a wide dataframe to work easily with the plotting API.
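To see the difference between the two shapes, here is a minimal sketch using only the year and fuelType columns of the sample rows; raw counts are used instead of percentages to keep the output small, and the variable names are illustrative:

```python
import pandas as pd

# Sample rows from the question (only the columns used here)
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# Long form (illustrative name): one row per (year, fuelType) pair in a MultiIndex
long_form = df.groupby('year')['fuelType'].value_counts()
print(long_form)

# Wide form (illustrative name): one row per year, one column per fuelType;
# fill_value=0 records absent combinations (e.g. 2019 Diesel) as 0
wide_form = long_form.unstack(level=1, fill_value=0)
print(wide_form)
```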

import pandas as pd

# I'm not a fan of this option because it requires doing .groupby twice
# calculate percent with groupby
dfc = (df.groupby(['year'])['fuelType'].value_counts() / df.groupby('year')['fuelType'].count()).mul(100).round(1)

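# unstack the long Series into a wide dataframe; fill_value=0 records fuel types absent in a year as 0 instead of NaN
dfc = dfc.unstack(level=1, fill_value=0)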

`.groupby` with `.value_counts` and `.unstack`

dfc = df.groupby(['year'])['fuelType'].value_counts(normalize=True).mul(100).round(1).unstack(level=1, fill_value=0)
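value_counts(normalize=True) divides each year's counts by that year's total, which is what the two-groupby version computes by hand. A minimal sketch comparing the two on the sample rows (year and fuelType only), with fill_value=0 in both unstack calls so absent combinations compare as 0 rather than NaN; two_pass and one_pass are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# Illustrative name: two groupby calls, divided, then unstacked
two_pass = ((df.groupby(['year'])['fuelType'].value_counts()
             / df.groupby('year')['fuelType'].count())
            .mul(100).round(1).unstack(level=1, fill_value=0))

# Illustrative name: a single groupby with normalize=True, then unstacked
one_pass = (df.groupby(['year'])['fuelType'].value_counts(normalize=True)
            .mul(100).round(1).unstack(level=1, fill_value=0))

print(one_pass)
```

The single-pass form avoids the second groupby, which is the objection raised in the comment on the first option.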

`.crosstab`

Alternatively, use pandas.crosstab to create the wide dataframe directly; normalize='index' divides each row by its total.

# get the normalized value counts by index
dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)
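A minimal sketch checking that crosstab and the groupby route produce the same table on the sample rows (year and fuelType only); via_groupby is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# Wide table directly: one row per year, one column per fuelType
dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)

# Illustrative name: same numbers via groupby; fill_value=0 matches crosstab's 0 for absent pairs
via_groupby = (df.groupby('year')['fuelType'].value_counts(normalize=True)
               .mul(100).round(1).unstack(level=1, fill_value=0))

print(dfc)
# Each row sums to about 100; rounding to one decimal can leave a small remainder
print(dfc.sum(axis=1))
```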

Plot

Plot the dataframe with pandas.DataFrame.plot, using kind='bar' with stacked=True, or kind='area', which stacks by default.

# display(dfc)
fuelType  Diesel  Petrol
year                    
2016        50.0    50.0
2017        50.0    50.0
2019         0.0   100.0

# plot bar
ax = dfc.plot(kind='bar', ylabel='Percent(%)', stacked=True, rot=0, figsize=(10, 4))
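A self-contained version of the bar plot, as a sketch: it rebuilds dfc from the sample rows with crosstab, uses the non-interactive Agg backend so it runs without a display, and adds ax.set_ylim(0, 100) for the fixed 0-100 axis the question asks for. The legend placement is an illustrative choice, not part of the original answer.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})
dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)

# One bar per year, with the fuel types stacked on top of each other
ax = dfc.plot(kind='bar', ylabel='Percent(%)', stacked=True, rot=0, figsize=(10, 4))
ax.set_ylim(0, 100)  # pin the y-axis to the 0-100 percent range
# Illustrative choice: put the legend outside the axes so it does not cover the bars
ax.legend(title='fuelType', bbox_to_anchor=(1.0, 1.0), loc='upper left')
plt.tight_layout()
```

In an interactive session, drop the matplotlib.use line and call plt.show(), or save the figure with ax.figure.savefig(...).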

Remove xticks=dfc.index to let the plotting API choose the x-axis tick positions itself; with it, there is exactly one tick per year in the index.

# plot area
ax = dfc.plot(kind='area', ylabel='Percent(%)', rot=0, figsize=(10, 4), xticks=dfc.index)
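A sketch of the area version under the same assumptions (sample rows, Agg backend). Because the year index is numeric, the area plot uses the years themselves as x positions, and xticks=dfc.index places one tick at each year present:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; no display needed
import pandas as pd

df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})
dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)

# kind='area' stacks the columns by default
ax = dfc.plot(kind='area', ylabel='Percent(%)', rot=0, figsize=(10, 4), xticks=dfc.index)

# One tick per year in the index
print(list(ax.get_xticks()))
```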

Answered By - Trenton McKinney

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
