Issue
I have the following pandas DataFrame and would like to create n plots laid out horizontally, where n is the number of unique labels (l1, l2, ...) in the a1 column (in the example below there will be two plots, one for l1 and one for l2). Each plot should show a4 on the x-axis against a3 on the y-axis. For example, ax[0] will contain the graph for l1, with three lines linking the points [(1,15),(2,20)], [(1,17),(2,19)] and [(1,23),(2,15)] for the data below.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
     'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
     'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
     'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df = pd.DataFrame(d)
df
a1  a2  a3  a4
l1  a   15  1
l1  a   20  2
l1  b   17  1
l1  b   19  2
l1  c   23  1
l1  c   15  2
l2  d   22  1
l2  d   21  2
l2  e   23  1
l2  e   23  2
l2  f   24  1
l2  f   27  2
I currently have the following:
def graph(dataframe):
    x = dataframe["a4"]
    y = dataframe["a3"]
    ax[0].plot(x,y) #how do I plot and set the title for each group in their respective subplot without the use of for-loop?

fig, ax = plt.subplots(1,len(pd.unique(df["a1"])),sharey='row',figsize=(15,2))
df.groupby(["a1"]).apply(graph)
However, my attempt above plots all of a3 against a4 on the first subplot only (because I wrote ax[0].plot()). I can always use a for loop to accomplish the desired task, but for a large number of unique groups in a1 it will be computationally expensive. Is there a way to turn the line ax[0].plot(x,y) into a one-liner that accomplishes the desired task without a for loop? Any input is appreciated.
Solution
I do not see any way of avoiding a for loop when plotting this data with pandas. My initial thought was to reshape the dataframe to make subplots=True work, like this:
dfp = df.pivot(columns='a1').swaplevel(axis=1).sort_index(axis=1)
dfp
But I do not see how to select level 1 of the columns MultiIndex to make something like dfp.plot(x='a4', y='a3', subplots=True) work.
Removing level 0 and then running the plotting function with
dfp.droplevel(axis=1, level=0).plot(x='a4', y='a3', subplots=True)
raises ValueError: x must be a label or position. And even if this worked, there would still be the issue of linking the correct points together.
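For reference, here is a minimal sketch of the loop-based version in pandas and matplotlib. It is not a loop-free solution, only an illustration of what the loop could look like. It assumes each (a2, a4) pair occurs once per a1 group, so that pivot works, and the names groups and axes are my own. The loop runs once per unique a1 label, so its cost is likely small next to the drawing itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; drop this line in a notebook
import pandas as pd
from matplotlib import pyplot as plt

d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
     'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
     'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
     'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df = pd.DataFrame(d)

groups = df.groupby("a1")
# squeeze=False keeps axes two-dimensional even if there is only one group
fig, axes = plt.subplots(1, groups.ngroups, sharey="row",
                         figsize=(15, 2), squeeze=False)

for ax, (label, group) in zip(axes[0], groups):
    # Reshape so that a4 is the index (x), each a2 value is a column
    # (one line per column) and the cells hold a3 (y).
    wide = group.pivot(index="a4", columns="a2", values="a3")
    wide.plot(ax=ax, title=label)
```

Each subplot then holds one line per a2 value, which takes care of linking the correct points together.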
The seaborn package was created to conveniently plot this kind of dataset. If you are open to using it, here is an example with relplot:
import pandas as pd # v 1.1.3
import seaborn as sns # v 0.11.0
d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
     'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
     'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
     'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df = pd.DataFrame(d)
sns.relplot(data=df, x='a4', y='a3', col='a1', hue='a2', kind='line', height=4)
You can customize the colors with the palette argument and adjust the grid layout with col_wrap.
Answered By - Patrick FitzGerald