Issue
I was using this code to plot all data in my df:
num_cols = ['is_canceled', 'lead_time', 'arrival_date_year', 'arrival_date_week_number',
            'arrival_date_day_of_month', 'stays_in_weekend_nights', 'adults', 'children',
            'babies', 'is_repeated_guest', 'previous_cancellations',
            'previous_bookings_not_canceled', 'booking_changes', 'agent',
            'total_of_special_requests']
for col in num_cols:
    sns.boxplot(y=df['is_canceled'].astype('category'), x=col, data=df)
    plt.show()
But a few of the plots look like this. How can I fix it?
Solution
The boxplots seem to show that the large majority of values are zero, and the rest are drawn as outliers. For example, previous_cancellations is usually zero, and only a few rows have some specific non-zero value. All outliers with the same value are drawn on top of each other. Note that the "box" of a boxplot goes between the 25th and the 75th percentile, with a division at the median. So when more than 75% of the values are zero, the box collapses to a single line at zero.
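To make this concrete, here is a minimal sketch with made-up data (the 95 zeros and the handful of non-zero values are an assumption for illustration, not taken from the question's dataset). It computes the quartiles that define the box and counts how many points fall outside the default 1.5 * IQR whisker range:

```python
import pandas as pd

# Hypothetical zero-inflated column: 95 zeros plus a few repeated non-zero values.
values = pd.Series([0] * 95 + [1, 1, 1, 2, 3], name="previous_cancellations")

# The box spans Q1..Q3 with a line at the median.
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# With the default whis=1.5, points beyond 1.5 * IQR from the box are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"points drawn as outliers: {len(outliers)}")
print(f"distinct outlier positions: {outliers.nunique()}")
```

Since Q1, the median and Q3 are all zero here, the box has no width and every non-zero value becomes an outlier marker. Equal values share the same position, so several rows can hide behind a single marker.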
One idea is to use a different type of plot, e.g. a violin plot. Here is a comparison of both plot types using the titanic dataset:
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('titanic')
# all numeric columns except the first one ('survived'), which goes on the y-axis
m_cols = df.select_dtypes('number').columns.to_list()[1:]
# one row per column: boxplot on the left, violinplot on the right
fig, axs = plt.subplots(nrows=len(m_cols), ncols=2, figsize=(15, 7))
for col, ax_row in zip(m_cols, axs):
    sns.boxplot(y=df['survived'].astype('category'), x=col, data=df, ax=ax_row[0], palette='rocket')
    sns.violinplot(y=df['survived'].astype('category'), x=col, data=df, ax=ax_row[1], palette='rocket')
sns.despine()
plt.tight_layout()
plt.show()
Answered By - JohanC