Issue
I’m trying to create a line plot using seaborn and I’m struggling to define the “y”. ;)
I’m following the process set out here: https://seaborn.pydata.org/generated/seaborn.lineplot.html
Where I’m failing is in creating the plot with mean and shaded 95% CI because I can’t define ‘y’.
The example has taken its y (“passengers”) from a previous shape of the same Dataframe, where this was the column header (and then the data has been reformatted with month and year as the columns/index).
My data is already in a Dataframe with the required structure (columns are dates and rows are the outputs of N simulations). I want to plot the mean and CI of the simulation outputs over time.
So I feel like this should be really easy, but I can’t find any info about how to label the values! (I guess I could reshape the data into a single column and give it a label but that seems very inefficient!)
All values in the df should have the same label (‘Approvals’) similar to how ‘passengers’ works in the link.
Thank you!!
Solution
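You might need to convert your dataframe to "long form", where every row is a single observation. This helps most Seaborn functions to reach their full potential, and the value_name argument of melt() is where you choose the label for the values (for example 'Approvals').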
Here is some example code with data that initially is organized as one column per date.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data as suggested by the original post
df = pd.DataFrame({}, index=[f'sim_{i}' for i in range(1, 11)])
dates = pd.date_range('20211201', periods=20, freq='D')
for d in dates:
    df[d] = np.random.normal(.1, 1, len(df)).cumsum()
df.index.name = 'simulation' # give the index an explicit name, this will be the column name after df.reset_index()
# convert the dataframe to long form
df_long = df.reset_index().melt(id_vars='simulation', var_name='date', value_name='value')
df_long['date'] = pd.to_datetime(df_long['date']) # make the column a real datetime column
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(12, 4), sharex=True)
# seaborn >= 0.12 uses errorbar=('ci', 95); on older versions pass ci=95 instead
sns.lineplot(data=df_long, x='date', y='value', errorbar=('ci', 95), ax=ax1)
sns.lineplot(data=df_long, x='date', y='value', hue='simulation', ax=ax2)
ax2.legend_.remove()
plt.tight_layout()
plt.show()
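To get the 'Approvals' label asked about in the question, the same reshape can name the value column directly. The following is only a minimal sketch: the small random frame `wide` is a hypothetical stand-in for the real simulation output, and the names `wide` and `long_df` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 5 simulations (rows) x 4 dates (columns)
rng = np.random.default_rng(0)
dates = pd.date_range('2021-12-01', periods=4, freq='D')
wide = pd.DataFrame(rng.normal(size=(5, 4)), columns=dates)
wide.index.name = 'simulation'

# melt() stacks all date columns into a single column named 'Approvals'
long_df = wide.reset_index().melt(
    id_vars='simulation', var_name='date', value_name='Approvals'
)
long_df['date'] = pd.to_datetime(long_df['date'])  # make sure the dates are real datetimes

print(long_df.head())
```

The resulting frame can then be passed on as sns.lineplot(data=long_df, x='date', y='Approvals'), and the y axis picks up the 'Approvals' label from the column name.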
PS: This is what the original dataframe df looks like:
       2021-12-01  2021-12-02  ...  2021-12-19  2021-12-20
sim_1   -0.173437    0.488611  ...    0.304839   -0.324995
sim_2   -0.283472    2.692735  ...   -0.526787   -0.451747
...
And the "long form":
  simulation       date     value
0      sim_1 2021-12-01 -0.173437
1      sim_2 2021-12-01 -0.283472
2      sim_3 2021-12-01 -0.657405
...
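If reshaping feels like overkill, the mean and an approximate 95% band can also be computed straight from the wide frame. This is a rough sketch, not what seaborn does internally: seaborn bootstraps its confidence interval, while the hypothetical helper `mean_and_ci` below uses a normal approximation (mean ± 1.96 standard errors), so the band will differ slightly. The test frame `wide` is again made up for illustration.

```python
import numpy as np
import pandas as pd

def mean_and_ci(wide, z=1.96):
    """Per-date mean and normal-approximation CI across simulations (rows)."""
    mean = wide.mean(axis=0)
    # standard error of the mean for each date column
    sem = wide.std(axis=0, ddof=1) / np.sqrt(len(wide))
    return pd.DataFrame({'mean': mean, 'lower': mean - z * sem, 'upper': mean + z * sem})

# Hypothetical stand-in data: 10 simulations x 4 dates
rng = np.random.default_rng(1)
dates = pd.date_range('2021-12-01', periods=4, freq='D')
wide = pd.DataFrame(rng.normal(0.1, 1, size=(10, 4)), columns=dates)

summary = mean_and_ci(wide)
print(summary)
```

With matplotlib, summary could then be drawn using ax.plot(summary.index, summary['mean']) and ax.fill_between(summary.index, summary['lower'], summary['upper'], alpha=0.3).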
Answered By - JohanC