Issue
I am plotting two separate box plots with pandas using this code:
plt.figure()
df['mean_train_score_error'] = [1] - df['mean_train_score']
df.boxplot(column=['mean_train_score_error'], by='modelo',
medianprops = medianprops,
autorange=True,showfliers=False, patch_artist=True,
vert=True, showmeans=True,meanline=True)
plt.ylabel('Error: 1-F1 Score')
plt.title('Error de entrenamiento')
plt.suptitle('')
df['mean_test_score_error'] = [1] - df['mean_test_score']
df.boxplot(column=['mean_test_score_error'], by='modelo',
medianprops = medianprops,
autorange=True,showfliers=False, patch_artist=True,
vert=True, showmeans=True,meanline=True)
plt.ylabel('Error: 1-F1 Score')
plt.title('Error de validación')
plt.suptitle('')
And I am getting the following two plots:
The question is whether it is possible to draw all 6 box plots on the same plot, using a different color for the three box plots that come from each of the original plots.
Solution
- The easiest way to do this is to transform the data from wide to long format, and then plot with seaborn, using the hue parameter.
- pandas.wide_to_long
  - There must be a unique id, hence adding the id column.
  - The columns being transformed must have similar stubnames, which is why I moved error to the front of the column name.
  - The error column names will be in one column and the values in a separate column.
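To make the stubname requirement concrete, here is a minimal sketch on a hypothetical two-row frame. The column names error_train and error_test and the variable names wide and long_df are illustrative assumptions, not part of the data used below.

```python
import pandas as pd

# Hypothetical toy frame: two wide columns that share the stub "error"
wide = pd.DataFrame({
    "id": [0, 1],
    "error_train": [0.10, 0.20],
    "error_test": [0.30, 0.40],
})

# stubnames="error" with sep="_" splits "error_train" into the stub and the
# suffix "train"; the suffix regex must allow non-digits, because the
# default pattern only matches numeric suffixes
long_df = pd.wide_to_long(
    wide, stubnames="error", i="id", j="split", sep="_", suffix=r"\D+"
).reset_index()

print(long_df)
```

Each original row now appears once per suffix, with the suffix in the split column and the value in the error column.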
Imports and Test Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# setup data and dataframe
np.random.seed(365)
data = {'mod_lg': np.random.normal(0.3, .1, size=(30,)),
'mod_rf': np.random.normal(0.05, .01, size=(30,)),
'mod_bg': np.random.normal(0.02, 0.002, size=(30,)),
'mean_train_score': np.random.normal(0.95, 0.3, size=(30,)),
'mean_test_score': np.random.normal(0.86, 0.5, size=(30,))}
df = pd.DataFrame(data)
df['error_mean_test_score'] = [1] - df['mean_test_score']
df['error_mean_train_score'] = [1] - df['mean_train_score']
df["id"] = df.index
df = pd.wide_to_long(df, stubnames='mod', i='id', j='mode', sep='_', suffix=r'\D+').reset_index()
df["id"] = df.index
# display dataframe: this is probably what your dataframe looks like to generate your current plots
   id  mode  mean_train_score  error_mean_test_score  mean_test_score  error_mean_train_score       mod
0   0    lg          0.663855              -0.343961         1.343961                0.336145  0.316792
1   1    lg          0.990114               0.472847         0.527153                0.009886  0.352351
2   2    lg          1.179775               0.324748         0.675252               -0.179775  0.381738
3   3    lg          0.693155               0.519526         0.480474                0.306845  0.470385
4   4    lg          1.191048              -0.128033         1.128033               -0.191048  0.085305
Transform and plot
- The error_score_name column contains the suffix from error_mean_test_score & error_mean_train_score
- The error_score_value column contains the values.
# convert df error columns to long format
dfl = pd.wide_to_long(df, stubnames='error', i='id', j='score', sep='_', suffix=r'\D+').reset_index(level=1)
dfl.rename(columns={'score': 'error_score_name', 'error': 'error_score_value'}, inplace=True)
# display dfl
    error_score_name  mean_train_score       mod  mean_test_score  mode  error_score_value
id
0    mean_test_score          0.663855  0.316792         1.343961    lg          -0.343961
1    mean_test_score          0.990114  0.352351         0.527153    lg           0.472847
2    mean_test_score          1.179775  0.381738         0.675252    lg           0.324748
3    mean_test_score          0.693155  0.470385         0.480474    lg           0.519526
4    mean_test_score          1.191048  0.085305         1.128033    lg          -0.128033
# plot dfl
sns.boxplot(x='mode', y='error_score_value', data=dfl, hue='error_score_name')
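If the frame already has a single modelo column alongside mean_train_score and mean_test_score, as the code in the question suggests, DataFrame.melt may be a shorter route to the same long format than wide_to_long. The following is a minimal sketch under that assumption. The sample values and the names long_df, score_name, score and error are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: one row per fit, with the
# model name in 'modelo' and the two score columns from the question
df = pd.DataFrame({
    "modelo": ["lg", "lg", "rf", "rf", "bg", "bg"],
    "mean_train_score": [0.70, 0.72, 0.95, 0.96, 0.98, 0.97],
    "mean_test_score": [0.65, 0.68, 0.90, 0.88, 0.93, 0.92],
})

# Stack the two score columns into one, keeping 'modelo' as the identifier
long_df = df.melt(
    id_vars="modelo",
    value_vars=["mean_train_score", "mean_test_score"],
    var_name="score_name",
    value_name="score",
)

# Error as defined in the question: 1 - F1 score
long_df["error"] = 1 - long_df["score"]

print(long_df.head())
```

Passing long_df to sns.boxplot with x='modelo', y='error' and hue='score_name' should then draw the train and test boxes for each model side by side, in the same way as the call above.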
Answered By - Trenton McKinney