Issue
I am trying to create distplot of a dataframe grouped by a column
data_plot = creditcard_df.copy()
amount = data_plot['Amount']
data_plot.drop(labels=['Amount'], axis=1, inplace = True)
data_plot.insert(0, 'Amount', amount)
# Plot the distributions of the features
columns = data_plot.iloc[:,0:30].columns
plt.figure(figsize=(12,30*4))
grids = gridspec.GridSpec(30, 1)
for grid, index in enumerate(data_plot[columns]):
ax = plt.subplot(grids[grid])
sns.distplot(data_plot[index][data_plot.Class == 1], hist=False, kde_kws={"shade": True}, bins=20)
sns.distplot(data_plot[index][data_plot.Class == 0], hist=False, kde_kws={"shade": True}, bins=20)
ax.set_xlabel("")
ax.set_title("Distribution of Column: " + str(index))
plt.show()
I tried to use a log scale for the y axis, change the gridspec, and the figsize; but all of those only made a mess of the distributions. Is there a way to make the plots uniform?
Solution
seaborn.distplot
is deprecated. Useseaborn.kdeplot
, which is an axes-level plot. Otherwise useseaborn.displot
for a figure-level plot.- Tested in
python 3.8.11
,pandas 1.3.2
,matplotlib 3.4.2
,seaborn 0.11.2
Imports and Test Data
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(365)
rows = 10000
data = {'a': np.random.normal(5, 5, rows),
'b': np.random.normal(20, 5, rows),
'c': np.random.normal(35, 5, rows),
'd': np.random.normal(500, 50, rows),
'e': np.random.normal(6500, 500, rows),
'class': np.random.choice([0, 1], size=(rows), p=[0.25, 0.75])}
df = pd.DataFrame(data)
# display(df.head(3))
a b c d e class
0 5.839606 20.807027 34.798230 509.328065 6003.228497 0
1 7.617526 21.691519 40.519995 445.724478 7204.039621 0
2 9.086878 27.193222 32.776264 498.254687 6810.960924 1
Plot with seaborn.kdeplot
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15, 7), sharex=False, sharey=False)
axes = axes.ravel() # array to 1D
cols = df.columns[:-1] # create a list of dataframe columns to use
for col, ax in zip(cols, axes):
data = df[[col, 'class']] # select the data
sns.kdeplot(data=data, x=col, hue='class', shade=True, ax=ax)
ax.set(title=f'Distribution of Column: {col}', xlabel=None)
fig.delaxes(axes[5]) # delete the empty subplot
fig.tight_layout()
plt.show()
Plot with seaborn.displot
# convert the dataframe from wide to long
dfm = df.melt(id_vars='class', var_name='Distribution')
# display(dfm.head(3))
class Distribution value
0 0 a 5.839606
1 0 a 7.617526
2 1 a 9.086878
# plot
sns.displot(kind='kde', data=dfm, col='Distribution', col_wrap=3, x='value', hue='class', fill=True, facet_kws={'sharey': False, 'sharex': False})
Answered By - Trenton McKinney
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.