Issue
My goal is to create a colormap that is based on month. I have two data sets of monthly data (same length, etc.). I want to draw a scatter plot of one data set against the other, with the points colored by month. Hopefully this makes more sense as I walk through the example:
These are the two data sets I am plotting against each other as a scatter plot:
data1 = np.random.rand(360)
data2 = np.random.rand(360)
I then use this function (split_months) to turn data1 and data2 into a 2-dimensional array of size 12, 30. This is just like regrouping by month, where 12 represents all the months, and 30 is all the years of that particular month:
def split_months(monthly_data):
    month_split = []
    for month in range(12):
        month_split.append(monthly_data[month::12])
    month_split = np.array(month_split)
    return month_split
split_data1 = split_months(data1)
split_data2 = split_months(data2)
print(split_data1.shape, split_data2.shape)
(12, 30) (12, 30)
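As a quick check of what this regrouping does, the sketch below (an illustration added here, not part of the original code) compares the loop-based function with a plain NumPy reshape. It assumes the series starts in January and covers 30 complete years; the name by_reshape is only used for this comparison.

```python
import numpy as np

def split_months(monthly_data):
    """Group a monthly series into rows, one row per calendar month."""
    return np.array([monthly_data[month::12] for month in range(12)])

data1 = np.random.rand(360)

# Assumption: the series starts in January and has 30 full years.
# Reshaping to (years, months) and transposing gives (months, years).
by_reshape = data1.reshape(30, 12).T

split_data1 = split_months(data1)
print(split_data1.shape)                        # (12, 30)
print(np.array_equal(split_data1, by_reshape))  # True
```

Both forms put the value for month m of year y at position [m, y], so either can be used to build the (12, 30) array.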
I then reshape the split month data into a 1D array, with the first month and all its years followed by the second month and all its years, and so on. The result is a 1-dimensional array ordered by month and then by year (as seen in the example below):
split_months_reshape_data1 = split_data1.reshape(12*30) ## reshaping so organized by month now (jan - dec for all years)
split_months_reshape_data2 = split_data2.reshape(12*30)
print(split_data1[0])
print(split_months_reshape_data1[:30])
[0.70049451 0.24326443 0.29633189 0.35540148 0.68205274 0.15130453
0.34046832 0.54975106 0.4502673 0.39086571 0.5610824 0.88443547
0.85777702 0.39887896 0.82240821 0.31162978 0.23496537 0.68776803
0.84677736 0.04060598 0.7735167 0.23317739 0.49447141 0.53932027
0.62494628 0.19676697 0.41435389 0.22843223 0.22817976 0.09133836]
[0.70049451 0.24326443 0.29633189 0.35540148 0.68205274 0.15130453
0.34046832 0.54975106 0.4502673 0.39086571 0.5610824 0.88443547
0.85777702 0.39887896 0.82240821 0.31162978 0.23496537 0.68776803
0.84677736 0.04060598 0.7735167 0.23317739 0.49447141 0.53932027
0.62494628 0.19676697 0.41435389 0.22843223 0.22817976 0.09133836]
## the two printed arrays are the same: split_data1[0] shows all of the values for the first month, and split_months_reshape_data1[:30] shows the first 30 values of the reshaped array, which are the same as `split_data1[0]`
Now the question is, is there a way to use each of the 12 arrays in split_data1 to create a colormap (January - December), using the specific values in each array?
For example, for January, use the values from split_data1[0] to make one color for the colormap. Then for February, use the values from split_data1[1] to make another color for the colormap.
This is the idea I was going for, but the colorbar is not correct:
plt.scatter(split_months_reshape_data1,split_months_reshape_data2, c = split_data1)
plt.colorbar()
plt.show()
Please let me know if my question needs clarification. It is a bit specific, but the main goal is to obtain a colormap based on the reshaped data arrays (split_data1 and split_data2).
Solution
Selecting colors from a colormap is quite simple, as shown in the matplotlib colormap tutorial. There are two types of colormap objects (LinearSegmentedColormap and ListedColormap) and they do not have exactly the same methods to select the colors. Here is how to select colors from the viridis colormap (ListedColormap), using the pyplot interface:
# Select colormap with a certain number of colors
cmap = plt.get_cmap('viridis', 12) # plt.cm.get_cmap is no longer available in recent matplotlib versions
# Generate list of colors in these 3 equivalent ways for a ListedColormap
colors = cmap.colors # this method is not applicable to LinearSegmentedColormaps
colors = cmap(range(12))
colors = cmap(np.linspace(0, 1, 12))
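To see that the three selections above really give the same colors, here is a small self-contained check. It is a sketch added for illustration: the variable names colors_a, colors_b and colors_c are not from the original answer, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.get_cmap('viridis', 12)  # viridis resampled to 12 colors

colors_a = np.asarray(cmap.colors)      # stored color list (ListedColormap only)
colors_b = cmap(np.arange(12))          # integer input: direct index lookup
colors_c = cmap(np.linspace(0, 1, 12))  # float input in [0, 1]: scaled lookup

print(colors_b.shape)  # (12, 4): one RGBA row per month
print(np.allclose(colors_a, colors_b), np.allclose(colors_b, colors_c))
```

Each row is an RGBA color, so colors_b[0] can be passed directly to the color argument of a plotting call for January, colors_b[1] for February, and so on.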
Creating the colorbar is the trickier part. The dataset you are plotting consists of 3 variables:
- month (categorical): plotted as hue
- data1 (numerical): plotted as the x variable
- data2 (numerical): plotted as the y variable
As you have seen in your example, the variable passed to c (i.e. split_data1, the x variable) is mapped to the colorbar created with plt.colorbar(). While it is possible to pass values corresponding to the months to c to create the colorbar (see alternative solution shown below after figure), I find the code easier to understand if instead the colors for the months are preselected and then passed to color. The colorbar can be then created separately from the plot, as shown in the second example of the Customized Colorbars Tutorial.
Here is an example where the data reshaping part is simplified by using several numpy functions and where the scatter plot is created using zip to loop through the sub-arrays and the related months and colors. The names of the months are generated with the datetime module to save a bit of typing.
from datetime import datetime as dt
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.4
# Create sample dataset
rng = np.random.default_rng(seed=1) # random number generator
data1 = rng.random(360)
data2 = rng.random(360)
# Reshape data
split_data1 = np.stack(np.split(data1, 30)).transpose()
split_data2 = np.stack(np.split(data2, 30)).transpose()
# Generate lists of months and colors
months = [dt.strptime(str(m), '%m').strftime('%B') for m in range(1, 13)]
cmap = plt.get_cmap('viridis') # no need to preselect number of colors in this case
colors = cmap(np.linspace(0, 1, len(months)))
# Draw scatter plot by looping over zipped sub-arrays, colors and months
for x, y, c, month in zip(split_data1, split_data2, colors, months):
    plt.scatter(x, y, color=c, label=month)
# Add colorbar
bounds = np.arange(len(months)+1)
norm = plt.matplotlib.colors.BoundaryNorm(bounds, cmap.N)
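# Pass ax explicitly, since this mappable is not attached to any Axes,
# and place one tick at the center of each of the 12 month bins
cbar = plt.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), ax=plt.gca(),
                    ticks=bounds[:-1]+0.5)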
cbar.set_ticklabels(months)
# Optional extra formatting
cbar.ax.tick_params(length=0, pad=7)
cbar.ax.invert_yaxis()
plt.show()
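Because each plt.scatter call above is given label=month, a legend is a possible alternative to the colorbar when there are only a few categories. The following is a minimal sketch of that option, not part of the original answer; the legend title and placement values are assumptions chosen for illustration, and the Agg backend is selected only so that the snippet runs without a display.

```python
from datetime import datetime as dt
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Same kind of sample data as above, grouped as (12 months, 30 years)
rng = np.random.default_rng(seed=1)
split_data1 = rng.random(360).reshape(30, 12).T
split_data2 = rng.random(360).reshape(30, 12).T

months = [dt.strptime(str(m), '%m').strftime('%B') for m in range(1, 13)]
colors = plt.get_cmap('viridis')(np.linspace(0, 1, len(months)))

fig, ax = plt.subplots()
for x, y, c, month in zip(split_data1, split_data2, colors, months):
    ax.scatter(x, y, color=c, label=month)

# One legend entry per labelled scatter call, placed outside the axes
legend = ax.legend(title='Month', bbox_to_anchor=(1.02, 1), loc='upper left')
fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) as needed
```

A legend keeps the month names next to discrete color patches, whereas the colorbar emphasizes the ordering from January to December; which one reads better depends on the figure.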
For the sake of completeness, here is an alternative solution that uses the c argument in plt.scatter (instead of color) to generate the colorbar directly from the plot:
# Prepare data...
# months and cmap are the same as before
months = [dt.strptime(str(m), '%m').strftime('%B') for m in range(1, 13)]
cmap = plt.get_cmap('viridis')
# Create objects needed to map the months to colors and create a colorbar
bounds = np.arange(13)
norm = plt.matplotlib.colors.BoundaryNorm(bounds, cmap.N)
# Draw scatter plot, notice how there is no need for colors
for x, y, month, bound in zip(split_data1, split_data2, months, bounds):
    plt.scatter(x, y, c=np.repeat(bound, len(x)), norm=norm, cmap=cmap, label=month)
cbar = plt.colorbar()
# Format colorbar
cbar.set_ticks(bounds[:-1]+0.5) # set the 12 tick positions first, one per month bin
cbar.set_ticklabels(months) # then label them
cbar.ax.tick_params(length=0, pad=7)
cbar.ax.invert_yaxis()
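A more compact variant of the same idea, sketched here as an illustration rather than as part of the original answer: if the flat arrays are ordered January to December within each year, a month index computed with the modulo operator can be passed to c in a single scatter call, with no reshaping. The name month_index and the assumption that index 0 is a January are introduced only for this example.

```python
from datetime import datetime as dt
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
import numpy as np

rng = np.random.default_rng(seed=1)
data1 = rng.random(360)
data2 = rng.random(360)

# Assumption: index 0 is a January, so position modulo 12 gives the month (0-11)
month_index = np.arange(data1.size) % 12

months = [dt.strptime(str(m), '%m').strftime('%B') for m in range(1, 13)]
cmap = plt.get_cmap('viridis')
bounds = np.arange(13)               # 13 edges delimit 12 month bins
norm = BoundaryNorm(bounds, cmap.N)

fig, ax = plt.subplots()
sc = ax.scatter(data1, data2, c=month_index, cmap=cmap, norm=norm)

cbar = fig.colorbar(sc, ax=ax)
cbar.set_ticks(bounds[:-1] + 0.5)    # tick at the center of each bin
cbar.set_ticklabels(months)
cbar.ax.invert_yaxis()               # January at the top
fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) as needed
```

The trade-off is that a single call produces one artist for all points, so per-month labels for a legend are not available this way.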
Answered By - Patrick FitzGerald