Issue
Say I have a dataframe structured like so:
Name x y
Joe 0,1,5 0,3,8
Sue 0,2,8 1,9,5
...
Harold 0,5,6 0,7,2
I'd like to plot the values in the x and y axis on a line plot based on row. In reality, there are many x and y values, but there is always one x value for every y value in these columns. The name of the plot would be the value in "name".
I've tried to do this by first converting x and y to lists in their own separate columns like so:
df['xval'] = df.['x'].str.split(',')
df['yval'] = df.['y'].str.split(',')
And then passing them to seaborn:
ax = sns.lineplot(x=df['xval'], y=df['yval'], data=df)
However, this does not work because 1) I recieve an error, which I presume is due to attempting to pass a list from a dataframe, claiming:
TypeError: unhashable type: 'list'
And 2) I cannot specify the value for df['name'] for the specific line plot. What's the best way to go about solving this problem?
Solution
Data and imports:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame({
'name': ['joe', 'sue', 'mike'],
'x': ['0,1,5', '0,2,8', '0,4'],
'y': ['0,3,8', '1,9,5', '1,6']
})
We should convert df
into a useable format for plotting. This makes all plotting eaiser. We can take advantage of the fact that x
and y
have a 1-to-1 relationship. Notice I've added a third name with a 2 xy value as opposed to 3 to show this method will work for varied amounts of x and y per name as long as each row has equal numbers of x and y values.
Creating the plot_df
:
# Grab Name Column to Start Plot DF with
plot_df = df.loc[:, ['name']]
# Split X column
plot_df['x'] = df['x'].str.split(',')
# Explode X into Rows
plot_df = plot_df.explode('x').reset_index(drop=True)
# Split and Series Explode y in one step
# This works IF AND ONLY IF a 1-to-1 relationship for x and y
plot_df['y'] = df['y'].str.split(',').explode().reset_index(drop=True)
# These need to be numeric to plot correctly
plot_df.loc[:, ['x', 'y']] = plot_df.loc[:, ['x', 'y']].astype(int)
plot_df
:
name x y
0 joe 0 0
1 joe 1 3
2 joe 5 8
3 sue 0 1
4 sue 2 9
5 sue 8 5
6 mike 0 1
7 mike 4 6
References to the methods used in creating plot_df
:
DataFrame.loc
to subset the dataframeSeries.str.split
to split the comma separated values into a listDataFrame.explode
to upscale the DataFrame based on the iterable inx
DataFrame.reset_index
to make index unique again after explodingSeries.explode
to upscale the lists in the Seriesy
.Series.reset_index
to make index unique again after explodingDataFrame.astype
since the values are initially strings just splitting and exploding is not enough. Will need to convert to a numeric type for them to plot correctly
Plotting (Option 1)
# Plot with hue set to name.
sns.lineplot(data=plot_df, x='x', y='y', hue='name')
plt.show()
References for plotting separate lines:
sns.lineplot
to plot. Note thehue
argument to create separate lines based onname
.pyplot.show
to display.
Plotting (Option 2.a) Subplots:
sns.relplot(data=plot_df, x='x', y='y', col='name', kind='line')
plt.tight_layout()
plt.show()
Plotting (Option 2.b) Subplots:
# Use Grouper From plot_df
grouper = plot_df.groupby('name')
# Create Subplots based on the number of groups (ngroups)
fig, axes = plt.subplots(nrows=grouper.ngroups)
# Iterate over axes and groups
for ax, (grp_name, grp) in zip(axes, grouper):
# Plot from each grp DataFrame on ax from axes
sns.lineplot(data=grp, x='x', y='y', ax=ax, label=grp_name)
plt.show()
References for plotting subplots:
2.a
relplot
therow
orcol
parameter can be used to create subplots in a similar way to howhue
creates multiple lines. This will return aseaborn.FacetGrid
so post processing will be different thanlineplot
which returns matplotlib.axes.Axes
2.b
groupby
to create iterable that can be used to plot subplots.pyplot.subplots
to create subplots to plot on.groupby.ngroup
to count number of groups.zip
to iterate over axes and groups simultaneously.sns.lineplot
to plot. Notelabel
is needed to have legends.grp_name
contains the current key that is common in the currentgrp
DataFrame.pyplot.show
to display.
Plotting option 3 (separate plots):
# Plot from each grp DataFrame in it's own plot
for grp_name, grp in plot_df.groupby('name'):
fig, ax = plt.subplots()
sns.lineplot(data=grp, x='x', y='y', ax=ax)
ax.set_title(grp_name)
fig.show()
References for plotting separate plots:
groupby
to create iterable that can be used to plot each name separately.pyplot.subplots
to create separate plot to plot on.sns.lineplot
to plot. Notelabel
is needed to have legends.grp_name
contains the current key that is common in the currentgrp
DataFrame.pyplot.show
to display.
Answered By - Henry Ecker
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.