Issue
I have a dataset that changes mean and variance after time, but still is in the same range of y values. I want visualize the change of the data. Here is the plot with the features I'm looking for:
Unfortunately I don't know how to create such a beauty. I tried the seaborn.catplot()
function but I'm missing the option to visualize the mean and the connections of the individual datapoints.
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
dist_1 = np.random.beta(2.,2.,250)
dist_2 = np.random.beta(3.,4.,250)
data = {"dist 1": dist_1, "dist 2": dist_2}
df = pd.DataFrame(data=data)
sns.catplot(data=df, s=3.)
plt.show()
The result would be something like this:
Any ideas to add the missing features?
Solution
To get a categorical plot you need to create a data frame that contains a categorical column. I made slight modifications to your code with respect to the column names - import numpy as np import seaborn as sns import pandas as pd import matplotlib.pyplot as plt
dist_1 = np.random.beta(2.,2.,250)
dist_2 = np.random.beta(3.,4.,250)
data = {"dist1": dist_1, "dist2": dist_2}
df = pd.DataFrame(data=data)
df['id'] = df.index
I added a column for the index of each row so that I can use it to convert this wide dataframe into a long one.
long_df = pd.wide_to_long(df, 'dist', 'id', 'cat').reset_index()
long_df.head()
# id cat dist
#0 0 1 0.558567
#1 1 1 0.265868
#2 2 1 0.076470
#3 3 1 0.891855
#4 4 1 0.215208
Now I want to use the cat column on the x-axis of the catplot, dist on the y-axis and create a point plot for each row in the original dataframe.
sns.catplot(x="cat", y="dist", hue='id', kind="point", data=long_df, legend=False)
plt.show()
This gives you the following plot -
Bonus: You can also do the same using a line plot and add the markers for mean.
mean_df = long_df.groupby('cat')['dist'].mean().reset_index()
mean_df
# cat dist
#0 1 0.535771
#1 2 0.399939
fig = plt.figure()
ax1 = fig.add_subplot(111)
sns.lineplot(x='cat', y='dist', hue='id', data=long_df, legend=False, ax=ax1)
ax1.plot(mean_df['cat'], mean_df['dist'], 'x')
plt.show()
Which gives you -
The 'X' on each category represents the mean. Hope that helps!
Answered By - Ic3fr0g
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.