Issue
The question "Have gradient colours in sns.pairplot for one column of dataframe so that I can see which datapoints are connected to each other" received very good answers on how to recognize which data points in one subplot correspond to which points in the other subplots.
To make this question self-contained, I restate my requirement here (it is an extension of the linked question):
I would like to see the interdependence of my data.
For that I want to have a gradual color gradient for one column of my DataFrame (so
that low numerical values of that column are e.g. yellow and high values are blue).
For a second column of my data, I would like to have increasing marker sizes with
increasing values of this column.
These colors and marker sizes should be visible in all off-diagonal subplots of my
plot, based on the data points of a and b.
The solution for the color gradient is given in the linked question. Here are both solutions that currently exist:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

f, axes = plt.subplots(1, 1)
np.random.seed(1)
a = np.arange(0, 10, 0.1)

def myFunc(x):
    myReturn = +10 + 10*x - x**2 + 1*np.random.random(x.shape[0])
    return myReturn

b = myFunc(a)
c = a * np.sin(a)
df = pd.DataFrame({'a': a, 'b': b, 'c': c})

if False:
    sns.pairplot(
        df,
        corner=True,
        diag_kws=dict(color=".6"),
        plot_kws=dict(
            hue=df.index,
            palette="blend:gold,dodgerblue",
        ),
    )
else:
    from matplotlib.colors import LinearSegmentedColormap
    cmap = LinearSegmentedColormap.from_list('blue-yellow', ['gold', 'lightblue', 'darkblue'])  # plt.get_cmap('viridis_r')
    g = sns.pairplot(df, corner=True)
    for ax in g.axes.flat:
        if ax is not None and ax not in g.diag_axes:
            for collection in ax.collections:
                collection.set_cmap(cmap)
                collection.set_array(df['a'])
plt.show()
A (basic) solution for the increasing marker sizes, using plain matplotlib, would be:
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2 # 0 to 15 point radii
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
My question is:
I could work on a manual solution to iterate over all columns of my DataFrame and build the sub plots by myself. Is there any more convenient (and probably more robust) way to do this?
Solution
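You can set both the hue and the marker sizes of the off-diagonal plots by adding the corresponding parameters to the plot_kws dictionary. With the default scatter kind, pairplot forwards plot_kws to seaborn's scatterplot, so its hue, palette and size parameters apply: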
sns.pairplot(
    df,
    corner=True,
    diag_kws=dict(color=".6"),
    plot_kws=dict(
        hue=df['a'],
        palette="blend:gold,dodgerblue",
        size=df['b'],
    ),
)
Answered By - m13op22