Thursday, January 18, 2024

[FIXED] Scatter plots with Matplotlib in which points are colored based on distance among them

Tags: matplotlib, python, scatter-plot

Issue

I'd like to plot several scatter graphs with matplotlib in which the points are automatically colored based on the distances among them. The idea is that the routine calculates the average distance between a point and all the others, does this for every point, and then compares the average distances. The point with the minimum average distance should get the brightest color of a user-chosen color scale (a colormap, for example the Reds sequential colormap: https://matplotlib.org/stable/users/explain/colors/colormaps.html), while the point with the maximum average distance should get the most shaded color of the scale. Is there already a function in matplotlib that does this, or is it necessary to write a specific function for the calculation? Can you also suggest a different, maybe more efficient, calculation approach?

The code I want to start from is the following:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([1,3,4,6,7,9,10,11,13,14,15])
y = np.array([1,3,4,6,7,9,10,11,13,14,15])
plt.scatter(x, y)

x = np.array([6,7,8,9,10,11,12,13,14,15])
y = np.array([1,4,7,10,13,16,19,22,25,28])
plt.scatter(x, y)

plt.show()

The two series of points (blue and orange) should each be colored using the colormap chosen by the user, but I can't work out how to assign each point a colormap color based on the distances between points, so that points that are spatially denser/closer appear in a bright color and points that are sparser/farther apart appear in a shaded color.

Details after @user3128's answer:

What I mean in the comment is the following. If I have the code:

import matplotlib.pyplot as plt
import numpy as np
 
x = np.array([1,3,4,6,7,9,10,11,13,14,15])
y = np.array([1,3,4,6,7,9,10,11,13,14,15])
xy_set1 = np.column_stack([x, y])
plt.scatter(x, y, s=100, marker='s', color='darkslategray', label='set1')

x = np.linspace(1.0,15.0,num=10000)
y = 2.5*x-8
xy_set2 = np.column_stack([x, y])
plt.scatter(x, y, s=100, marker='o', color='darkslategray', label='set2')

#Calculate distances
# To install sklearn: pip install scikit-learn
from sklearn.metrics import pairwise_distances
distances_matrix = pairwise_distances(xy_set1, xy_set2)
average_distances_set1 = distances_matrix.mean(axis=1)
average_distances_set2 = distances_matrix.mean(axis=0)

#Plot points and colour by distance (use a reversed colour map)
s1 = plt.scatter(xy_set1[:, 0], xy_set1[:, 1], c=average_distances_set1, s=20, marker='s', cmap='Reds_r')
plt.colorbar(label='distance | set 1')

s2 = plt.scatter(xy_set2[:, 0], xy_set2[:, 1], c=average_distances_set2, s=20, marker='o', cmap='Reds_r')
plt.colorbar(label='distance | set 2')

plt.legend()
plt.gcf().set_size_inches(5, 3)
plt.xlabel('x')
plt.ylabel('y')
plt.show()

This produces a plot. My goal is for the very dense set of points (set2) to appear in a much brighter color than set1.

In fact, if the function that defines set2 (line 10 of the code above, 'y = 2.5*x-8') is changed to, for example, 'y = 1.5*x+4', the points in set1 near x=12 are too dark. They should be light compared to the points in set2, which are very dense. This would signify that only along the portion of the graph covered by set2 does the relationship between X and Y follow a more visible and "strong" trend (dark color in the colormap), and thus maybe one with a greater frequency of occurrence.

This is my goal: suppose I have a lot of points that represent a relationship between parameter X and parameter Y of a particular natural/physical phenomenon. Where there are very dense 2D areas with a lot of points, the color should be bright (or dark in the chosen colormap), while where an area is sparse (few points, or points far apart) the color should be shaded (or light in the chosen colormap). In other words, many points with little distance among them mean that the physical phenomenon I'm plotting (parameter Y with respect to parameter X) could follow a particular mathematical law. That law is made visible by brightly colored points distributed in a specific 2D shape in the scatter plot (for example, they could follow a power law). Poorly representative points, which are less frequent and could reflect an odd, weak relationship between Y and X in that 2D region, should be colored with a shaded color. Basically, I would like to use color scales to visualize on a 2D graph a particular trend (if any) between two parameters of a certain natural phenomenon.

Solution

Computing the mean distance is trivial; no external library other than NumPy is needed.

CAVEAT: if you want to operate on millions of points, as you state in the comments, your distance matrix will have more than 10¹² entries and you'll need a computer with a few terabytes of memory (about 8 TB at 8 bytes per float64 entry), or you'll have to devise a slower but less memory-hungry algorithm.

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(20231228)
npt = 300
x, y = np.random.rand(npt), np.random.rand(npt)

# Mean distance: x - x[:, None] broadcasts to an (npt, npt) array of pairwise
# differences. The zero self-distance is part of the sum, so divide by
# (npt - 1) instead if you want the mean over the other points only.
m_d = np.sum(np.sqrt((x-x[:,None])**2+(y-y[:,None])**2), axis=0)/npt

plt.scatter(x, y, c=m_d, cmap='Reds_r', ec='k', lw=0.3)
plt.colorbar()
plt.show()
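
Regarding the caveat above, here is a minimal sketch of one possible "slower but less memory-hungry" variant. It is an assumption about what such an algorithm could look like, not something taken from the original answer. The function name mean_distance_chunked and the chunk parameter are made up for illustration. It builds the distance matrix one block of rows at a time, so only a (chunk, n) block is in memory at once; the number of distance evaluations is still n².

```python
import numpy as np

def mean_distance_chunked(x, y, chunk=1024):
    """Mean Euclidean distance from each point to all points, in row blocks.

    Hypothetical helper: same result as the one-line broadcast formula
    (self-distance included, divided by n), with bounded memory use.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    out = np.empty(n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # (stop - start, n) block of pairwise differences
        dx = x[start:stop, None] - x[None, :]
        dy = y[start:stop, None] - y[None, :]
        out[start:stop] = np.hypot(dx, dy).sum(axis=1) / n
    return out

rng = np.random.default_rng(20231228)
x, y = rng.random(300), rng.random(300)
m_d = mean_distance_chunked(x, y, chunk=64)
```

The resulting m_d can be passed to plt.scatter as c=m_d exactly as in the code above. A larger chunk means fewer loop iterations at the cost of a bigger temporary array.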

Just to show how the broadcasting works:

In [24]: x = np.arange(6)

In [25]: x-x[:,None]
Out[25]: 
array([[ 0,  1,  2,  3,  4,  5],
       [-1,  0,  1,  2,  3,  4],
       [-2, -1,  0,  1,  2,  3],
       [-3, -2, -1,  0,  1,  2],
       [-4, -3, -2, -1,  0,  1],
       [-5, -4, -3, -2, -1,  0]])

In [26]: (x-x[:,None])**2
Out[26]: 
array([[ 0,  1,  4,  9, 16, 25],
       [ 1,  0,  1,  4,  9, 16],
       [ 4,  1,  0,  1,  4,  9],
       [ 9,  4,  1,  0,  1,  4],
       [16,  9,  4,  1,  0,  1],
       [25, 16,  9,  4,  1,  0]])
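
To carry the session above through to the final mean distance, here is a small worked sketch with three points chosen so the distances come out as whole numbers (a 3-4-5 triangle). The points and the variable names are illustrative and not from the original answer.

```python
import numpy as np

# Three points: (0, 0), (3, 0) and (0, 4)
x = np.array([0.0, 3.0, 0.0])
y = np.array([0.0, 0.0, 4.0])

dx = x - x[:, None]          # pairwise x differences, shape (3, 3)
dy = y - y[:, None]          # pairwise y differences, shape (3, 3)
d = np.sqrt(dx**2 + dy**2)   # pairwise distances
# d is [[0, 3, 4],
#       [3, 0, 5],
#       [4, 5, 0]]

# As in the answer: the zero self-distance is counted, divide by n
mean_with_self = d.sum(axis=0) / x.size            # [7/3, 8/3, 3]

# Mean over the other points only: divide by n - 1
mean_without_self = d.sum(axis=0) / (x.size - 1)   # [3.5, 4.0, 4.5]
```

Both versions rank the points in the same order, so the colors assigned by the colormap are the same; only the numbers on the colorbar differ.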

Answered By - gboffi
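
The mean distance to all other points tends to be smallest near the centre of the cloud, which is not quite the same as the local density described in the question. As a different approach, and not part of the answer above, here is a rough sketch that colors each point by how many points share its cell in a 2D histogram. The function name bin_counts, the bins value and the test data are assumptions made for illustration.

```python
import numpy as np

def bin_counts(x, y, bins=20):
    """For each point, the number of points in the same 2D histogram bin.

    Hypothetical helper: a crude local-density estimate whose memory use
    depends on the number of bins, not on the number of point pairs.
    """
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Bin index of every point; values on the last edge go in the last bin
    ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, counts.shape[0] - 1)
    iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, counts.shape[1] - 1)
    return counts[ix, iy]

# Illustrative data: a tight cluster of 500 points plus 50 scattered points
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(5.0, 0.2, 500), rng.uniform(0.0, 10.0, 50)])
y = np.concatenate([rng.normal(5.0, 0.2, 500), rng.uniform(0.0, 10.0, 50)])
density = bin_counts(x, y, bins=20)
```

Passing density as c=density to plt.scatter with a non-reversed colormap such as cmap='Reds' should make crowded regions dark and sparse ones light. The result depends on the bin size, so bins needs tuning for the data. A kernel density estimate (for example scipy.stats.gaussian_kde) is a common smoother alternative, but it is more expensive for large point sets.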

This answer, collected from Stack Overflow and tested by the PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
