Issue
I have a matrix of size 500 X 28000 that contains a lot of zeros. Let us consider a small working example, the matrix A:
A = [[0, 0, 0, 1, 0],
     [1, 0, 0, 2, 3],
     [5, 3, 0, 0, 0],
     [5, 0, 1, 0, 3],
     [6, 0, 0, 9, 0]]
I would like to plot a heatmap of this matrix, but since it contains so many zeros, the heatmap is mostly white space, as seen in the figure below.
How can I ignore the zeros in the matrix when plotting the heatmap?
Here is the minimal working example that I tried:
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
im = plt.matshow(A, cmap='hot', norm=LogNorm(vmin=0.01, vmax=64), aspect='auto')
plt.colorbar(im)
plt.show()
which produces:
As you can see, the white spaces appear because of the zeros.
My original matrix, of size 500 X 280000, contains a lot of zeros, which makes my heatmap almost entirely white!
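A likely reason the zeros come out white: with norm=LogNorm(...), values <= 0 cannot be placed on a log scale, so matplotlib masks them, and masked cells are drawn in the colormap's "bad" color, which is fully transparent by default, letting the white axes background show through. The sketch below, assuming a reasonably recent matplotlib and the 5 x 5 matrix A from the question, counts the masked entries and then gives them a visible color with Colormap.set_bad. The gray color and the output file name are arbitrary, hypothetical choices.

```python
import copy

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# The 5 x 5 example matrix from the question
A = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 2, 3],
              [5, 3, 0, 0, 0],
              [5, 0, 1, 0, 3],
              [6, 0, 0, 9, 0]], dtype=float)

norm = LogNorm(vmin=0.01, vmax=64)
# log(0) is undefined, so LogNorm hands the zero entries back as masked values
n_masked = np.ma.count_masked(norm(A))
print(n_masked)  # 14, one per zero in A

# Copy the colormap so the shared "hot" instance is left untouched,
# then give masked ("bad") cells a visible color instead of transparency.
cmap = copy.copy(plt.get_cmap("hot"))
cmap.set_bad("gray")

im = plt.matshow(A, cmap=cmap, norm=norm, aspect="auto")
plt.colorbar(im)
plt.savefig("heatmap_masked_zeros.png")  # hypothetical output file name
```

This only changes how the zero cells look. For a matrix as wide as the one described above, each screen pixel covers many columns, so isolated non-zero cells may still be hard to see in an image-based heatmap.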
Solution
This answer follows the same approach as the 'Edit 2' section of Luis' answer; in fact, it is a simplified version of it. I am posting it only to correct some misleading statements I made in my comments. I saw a warning that extended discussions should not take place in the comment area, so I am using an answer instead.
Anyway, first let me post my code. Please note that instead of your sample matrix A, I used a larger matrix generated randomly inside the script.
#!/usr/bin/env python3
#
# This script was written by norio 2016-8-5.
import random
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.markeredgewidth'] = 1.0
mpl.rcParams['axes.formatter.limits'] = (-4,4)
#mpl.rcParams['axes.formatter.limits'] = (-2,2)
mpl.rcParams['axes.labelsize'] = 'large'
mpl.rcParams['xtick.labelsize'] = 'large'
mpl.rcParams['ytick.labelsize'] = 'large'
mpl.rcParams['xtick.direction'] = 'out'
mpl.rcParams['ytick.direction'] = 'out'
############################################
#numrow=500
#numcol=280000
numrow=50
numcol=28000
# .. for testing
numelm=numrow*numcol
eps=1.0e-9
#
#numnz=int(1.0e-7*numelm)
numnz=int(1.0e-5*numelm)
# .. for testing
vmin=1.0e-6
vmax=1.0
outfigname='stackoverflow38790536.png'
############################################
### data matrix
# I am generating a data matrix here artificially.
print('generating pseudo-data..')
random.seed('20160805')
matA = np.zeros((numrow, numcol))
for je in range(numnz):
    # randrange returns an integer, which array indexing requires
    jr = random.randrange(numrow)
    jc = random.randrange(numcol)
    matA[jr, jc] = random.uniform(vmin, vmax)
### Actual processing of the given data starts here
print('processing..')
# Collect the row index, column index and absolute value of every
# element whose magnitude exceeds eps (i.e. the non-zero elements).
idxrow = []
idxcol = []
val = []
for ii in range(numrow):
    for jj in range(numcol):
        if np.abs(matA[ii, jj]) > eps:
            idxrow.append(ii)
            idxcol.append(jj)
            val.append(np.abs(matA[ii, jj]))
print('len(idxrow)=', len(idxrow))
print('len(idxcol)=', len(idxcol))
print('len(val)=', len(val))
############################################
# canvas setting for line plots
############################################
f_size = (8,5)
a1_left = 0.15
a1_bottom = 0.15
a1_width = 0.65
a1_height = 0.80
#
hspace=0.02
#
ac_left = a1_left+a1_width+hspace
ac_bottom = a1_bottom
ac_width = 0.03
ac_height = a1_height
############################################
# plot
############################################
print('plotting..')
fig1 = plt.figure(figsize=f_size)
# main axes with a white background
ax1 = plt.axes([a1_left, a1_bottom, a1_width, a1_height], facecolor='w')
# one marker per non-zero element, colored by its value
pc1 = plt.scatter(idxcol, idxrow, s=20, c=val, cmap='gist_heat_r')
# cf.
# http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter
plt.xlabel('Column Index', fontsize=18)
plt.ylabel('Row Index', fontsize=18)
ax1.set_xlim([0, numcol-1])
ax1.set_ylim([0, numrow-1])
# narrow axes to the right of ax1 that will hold the colorbar
axc = plt.axes([ac_left, ac_bottom, ac_width, ac_height], facecolor='w')
fig1.colorbar(pc1, cax=axc, ticks=np.arange(0.0, 1.5, 0.1))
plt.savefig(outfigname)
plt.close()
This script outputs a figure, 'stackoverflow38790536.png', which looks like the following.
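As you can see in my code, I used scatter instead of plot. I realized that the plot command is not well suited to this task: scatter can give every marker its own color through its c argument (here, c=val), whereas plot draws all points of a single call in one color.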
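For comparison, the same idea applied to the 5 x 5 matrix A from the question, keeping the question's LogNorm, might look like the sketch below. It assumes a reasonably recent matplotlib; the square marker, the marker size, and the output file name are arbitrary, hypothetical choices. Each non-zero cell becomes one marker whose color comes from c=vals, so zero cells are simply not drawn.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive display
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

A = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 2, 3],
              [5, 3, 0, 0, 0],
              [5, 0, 1, 0, 3],
              [6, 0, 0, 9, 0]], dtype=float)

# Coordinates and values of the non-zero cells only
rows, cols = np.nonzero(A)
vals = A[rows, cols]

fig, ax = plt.subplots()
# One square marker per non-zero cell; c= gives each marker its own color,
# and LogNorm maps the values to colors on a log scale as in the question.
pc = ax.scatter(cols, rows, c=vals, s=200, marker="s",
                cmap="hot", norm=LogNorm(vmin=0.01, vmax=64))
ax.set_xlim(-0.5, A.shape[1] - 0.5)
ax.set_ylim(A.shape[0] - 0.5, -0.5)  # row 0 at the top, as in matshow
fig.colorbar(pc, ax=ax)
fig.savefig("scatter_nonzero.png")  # hypothetical output file name
```

The y-limits are reversed here so that row 0 is at the top, as with matshow; the script above keeps row 0 at the bottom instead.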
The other statement I need to correct is that row_index does not need to have as many as 140,000,000 (=500*280000) elements. It only needs to contain the row indices of the non-zero elements. More precisely, the lists idxrow, idxcol, and val, which are passed to the scatter command in the code above, each have a length equal to the number of non-zero elements.
Please note that both of these points are handled correctly in Luis' answer.
Answered By - norio