Saturday, December 23, 2023

[FIXED] subplots from a multiindex pandas dataframe grouped by level

December 23, 2023 matplotlib, multi-index, pandas, python, subplot No comments

Issue

How I do multiple plot from a multi-indexed pandas DataFrame based on one of the levels of the multiindex?

I have results from a model with different technologies usage in different scenarios, the results could look something like this:

import numpy as np
import pandas as pd
df=pd.DataFrame(abs(np.random.randn(12,4)),columns=[2011,2012,2013,2014])
df['scenario']=['s1','s1','s1','s2','s2','s3','s3','s3','s3','s4','s4','s4']
df['technology'=['t1','t2','t5','t2','t6','t1','t3','t4','t5','t1','t3','t4']
dfg=df.groupby(['scenario','technology']).sum().transpose()

dfg would have the technologies employed each year for each scenario. I would like to have a subplot for each scenario sharing the legend.

If I simply use the argument subplots=True, then it plots all the possible combinations (12 subplots)

dfg.plot(kind='bar',stacked=True,subplots=True)

Based on this response I got closer to what I was looking for.

f,a=plt.subplots(2,2)

fig1=dfg['s1'].plot(kind='bar',ax=a[0,0])

fig2=dfg['s2'].plot(kind='bar',ax=a[0,1])

fig2=dfg['s3'].plot(kind='bar',ax=a[1,0])

fig2=dfg['s3'].plot(kind='bar',ax=a[1,1])

plt.tight_layout()

but the result is not ideal, each subplot has a different legend...and that makes it quite difficult to read. There must be an easier way to do subplots from a multiindexed dataframes... Thanks!

EDIT1: Ted Petrou proposed a nice solution using seaborn factorplot but I have two issues. I already have a style defined and I'd rather not use the seaborn style (one solution could be change the parameters of seaborn). The other problem is that I wanted to use a stacked bar plot, which require considerable extra tweaks. Any chance I can do something similar with Matplotlib?

Solution

In my opinion it's easier to do a data analysis when you 'tidy' up your data - making each column represent one variable. Here, you have all 4 years represented in different columns. Pandas has one function and one method to make long(tidy) data from wide(messy) data. You can use df.stack or pd.melt(df) to tidy your data. Then you can take advantage of the excellent seaborn library which expects tidy data to easily plot most anything you want.

Tidy the data

df1 = pd.melt(df, id_vars=['scenario', 'technology'], var_name='year')
print(df1.head())

  scenario technology  year     value
0       s1         t1  2011  0.406830
1       s1         t2  2011  0.495418
2       s1         t5  2011  0.116925
3       s2         t2  2011  0.904891
4       s2         t6  2011  0.525101

Use Seaborn

import seaborn as sns
sns.factorplot(x='year', y='value', hue='technology', 
               col='scenario', data=df1, kind='bar', col_wrap=2,
              sharey=False)

Answered By - Ted Petrou

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, December 23, 2023

[FIXED] subplots from a multiindex pandas dataframe grouped by level

Issue

Solution

Tidy the data

Use Seaborn

0 comments:

Post a Comment

Popular Posts

Labels