Issue
I need to make a stacked bar plot from this dataset (head shown):
import pandas as pd

data = {'model': ['A1', 'A6', 'A1', 'A4', 'A3'],
        'year': [2017, 2016, 2016, 2017, 2019],
        'price': [12500, 16500, 11000, 16800, 17300],
        'transmission': ['Manual', 'Automatic', 'Manual', 'Automatic', 'Manual'],
        'mileage': [15735, 36203, 29946, 25952, 1998],
        'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
        'tax': [150, 20, 30, 145, 145],
        'mpg': [55.4, 64.2, 55.4, 67.3, 49.6],
        'engineSize': [1.4, 2.0, 1.4, 2.0, 1.0]}
df = pd.DataFrame(data)
  model  year  price transmission  mileage fuelType  tax   mpg  engineSize
0    A1  2017  12500       Manual    15735   Petrol  150  55.4         1.4
1    A6  2016  16500    Automatic    36203   Diesel   20  64.2         2.0
2    A1  2016  11000       Manual    29946   Petrol   30  55.4         1.4
3    A4  2017  16800    Automatic    25952   Diesel  145  67.3         2.0
4    A3  2019  17300       Manual     1998   Petrol  145  49.6         1.0
I would like the years (1997-2021) on the x-axis and numbers ranging from 0 to 100 on the y-axis, representing percentages. Finally, I would like the three fuelTypes (Petrol, Diesel and Hybrid) to be shown as yearly proportions.
I've already calculated the percentage of each fuelType per year, and now I need to put it on a graph:
fuel_percentage = round((my_data_frame.groupby(['year'])['fuelType'].value_counts()/my_data_frame.groupby('year')['fuelType'].count())*100, 2)
print(fuel_percentage)
Which gives me the following result:
year  fuelType
1997  Petrol      100.00
1998  Petrol      100.00
2002  Petrol      100.00
2003  Diesel       66.67
      Petrol       33.33
2004  Petrol       80.00
      Diesel       20.00
2005  Petrol       71.43
      Diesel       28.57
2006  Petrol       66.67
      Diesel       33.33
2007  Petrol       56.25
      Diesel       43.75
2008  Diesel       66.67
      Petrol       33.33
etc...
My main worry is that, since the object is not a dataframe, I won't be able to use it to make a plot.
Here is an example of the kind of plot I would like (replace players with fuelTypes and y-axis with percentages):
Thanks for the help!
Solution
- Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3
.groupby & .unstack
- pandas.DataFrame.groupby with .value_counts returns a long-form Series indexed by year and fuelType; it must be unstacked to a wide dataframe (one column per fuelType) to work easily with the plotting API
import pandas as pd
# I'm not a fan of this option because it requires doing .groupby twice
# calculate percent with groupby
dfc = (df.groupby(['year'])['fuelType'].value_counts() / df.groupby('year')['fuelType'].count()).mul(100).round(1)
# unstack the long dataframe
dfc = dfc.unstack(level=1)
.groupby with .value_counts and .unstack
dfc = df.groupby(['year'])['fuelType'].value_counts(normalize=True).mul(100).round(1).unstack(level=1)
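The question worried that the grouped result is not a dataframe. As a minimal sketch, using only the year and fuelType columns from the sample rows, the following shows the long-to-wide step in isolation. The names long_pct and wide_pct are illustrative, and passing fill_value=0 to .unstack is an optional extra, not part of the original answer: without it, a year with no rows for a fuel type (2019 and Diesel here) comes out as NaN.

```python
import pandas as pd

# Sample rows from the question (only the two columns needed here)
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# Long form: a Series indexed by (year, fuelType)
long_pct = df.groupby('year')['fuelType'].value_counts(normalize=True).mul(100).round(1)

# Wide form: one row per year, one column per fuelType.
# fill_value=0 (an optional addition) fills combinations that never occur,
# such as 2019/Diesel, with 0 instead of NaN.
wide_pct = long_pct.unstack(level=1, fill_value=0)

print(type(long_pct).__name__, '->', type(wide_pct).__name__)
print(wide_pct)
```

The wide result is an ordinary DataFrame, so it can go straight to .plot.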
.crosstab
- Alternatively, use pandas.crosstab to create a wide dataframe directly
# get the normalized value counts by index
dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)
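To make it clearer what normalize='index' does, here is a small sketch that compares it with the same percentages computed by hand from the raw counts. It uses only the year and fuelType columns of the sample data; the names counts, pct and manual are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# Raw counts per year (rows) and fuel type (columns)
counts = pd.crosstab(df['year'], df['fuelType'])

# normalize='index' divides each row by its own total
pct = pd.crosstab(df['year'], df['fuelType'], normalize='index').mul(100).round(1)

# The same percentages computed by hand from the counts
manual = counts.div(counts.sum(axis=1), axis=0).mul(100).round(1)

# Largest absolute difference between the two results
print((pct - manual).abs().max().max())
```

Because each row is divided by its own total, every year adds up to 100, which is what a stacked percentage plot needs.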
Plot
- Plot the dataframe with pandas.DataFrame.plot with kind='bar' and stacked=True, or with kind='area'.
# display(dfc)
fuelType  Diesel  Petrol
year
2016        50.0    50.0
2017        50.0    50.0
2019         0.0   100.0
# plot bar
ax = dfc.plot(kind='bar', ylabel='Percent(%)', stacked=True, rot=0, figsize=(10, 4))
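The question also asked for every year from 1997 to 2021 on the x-axis, a 0 to 100 y-axis, and three fuel types. One possible way to get that, sketched here as an assumption rather than as part of the original answer, is to reindex the wide dataframe so that missing years and a missing Hybrid column appear as zeros before plotting. The names years, fuel_types and full are illustrative, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import pandas as pd

df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})
dfc = pd.crosstab(df['year'], df['fuelType'], normalize='index').mul(100).round(1)

# Assumed target axes, taken from the question: years 1997-2021, three fuel types
years = list(range(1997, 2022))
fuel_types = ['Petrol', 'Diesel', 'Hybrid']

# Years or fuel types absent from the data become rows/columns of zeros
full = dfc.reindex(index=years, columns=fuel_types, fill_value=0)

ax = full.plot(kind='bar', stacked=True, rot=45, figsize=(10, 4))
ax.set_xlabel('year')
ax.set_ylabel('Percent(%)')
ax.set_ylim(0, 100)
ax.legend(title='fuelType', bbox_to_anchor=(1.01, 1), loc='upper left')
```

Years with no data show up as empty slots. To write the figure to disk, call ax.figure.savefig with a file name of your choice.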
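- The area plot below passes xticks=dfc.index so that only the years present in the data are labeled. Remove it to let the plotting API choose the x-axis ticks, which gives more values on the x-axis.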
# plot area
ax = dfc.plot(kind='area', ylabel='Percent(%)', rot=0, figsize=(10, 4), xticks=dfc.index)
Answered By - Trenton McKinney