Issue
I calculated the percentage of NaN values in each column of a dataframe and then plotted it. I want each variable to have a unique color. The code I used works well, but every 9th variable gets the same color as the 1st, and the cycle repeats. See the pic:
The code:
import numpy as np
import matplotlib.pyplot as plt

per = df.isna().mean().round(4) * 100
f, ax = plt.subplots(figsize=(25, 12), dpi = 200)
i = 0
for key, value in zip(per.keys(), per.values):
    if (value > 0):
        ax.bar(key, value, label=key)
        ax.text(i, value + 0.5, str(np.round(value, 2)), ha='center')
        i = i + 1
ax.set_xticklabels([])
ax.set_xticks([])
plt.title('NaN Value percentage in the dataset')
plt.ylim(0,115)
plt.ylabel('Percentage')
plt.xlabel('Columns')
plt.legend(loc='upper left')
plt.show()
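The repetition comes from the Axes color cycle: with no color argument, every ax.bar call takes the next color from the axes.prop_cycle setting and starts over once that list is exhausted. The length of the list depends on the active style (10 colors in Matplotlib's default style), so the period you see can differ. A small check, assuming the default style and the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Colors that ax.bar / ax.plot step through when no explicit color is passed.
cycle_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
print(len(cycle_colors))  # 10 with Matplotlib's default style

fig, ax = plt.subplots()
# One ax.bar call per bar, as in the question: each call takes the next cycle color.
patches = [ax.bar(k, 1.0)[0] for k in range(12)]

# Bar k gets cycle color k % len(cycle_colors), so this bar repeats bar 0's color.
print(patches[0].get_facecolor() == patches[len(cycle_colors)].get_facecolor())
```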
I tried the following code, but it used only the first color:
from itertools import cycle, islice

my_colors = list(islice(cycle(['b', 'r', 'g', 'y', 'c', 'm',
                               'tan', 'grey', 'pink', 'chocolate', 'gold']), None, len(df)))
f, ax = plt.subplots(figsize=(25, 12), dpi = 200)
i = 0
for key, value in zip(per.keys(), per.values):
    if (value > 0):
        ax.bar(key, value, label=key, color = my_colors)
        ax.text(i, value + 0.5, str(np.round(value, 2)), ha='center')
        i = i + 1
ax.set_xticklabels([])
ax.set_xticks([])
plt.title('NaN Value percentage in the dataset')
plt.ylim(0,115)
plt.ylabel('Percentage')
plt.xlabel('Columns')
plt.legend(loc='upper left')
plt.show()
Any help is appreciated.
See the data here.
Solution
I think there are two problems with your second snippet:
my_colors = list(islice(cycle(['b', 'r', 'g', 'y', 'c', 'm',
                               'tan', 'grey', 'pink', 'chocolate', 'gold']), None, len(df)))
Here len(df) gets you the number of rows, but you want one color per column, i.e. a list as long as per.keys(). So use len(per.keys()). Next, you need to use your counter i to index into your list of colors.
ax.bar(key, value, label=key, color = my_colors)
Each of these ax.bar calls draws a single bar, so passing the whole list only ever applies its first color. Here, I think you need to use my_colors[i].
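Putting both fixes together, the second snippet might look like the sketch below. The dataframe is a made-up stand-in (the question's data isn't reproduced here): every column gets at least one NaN so that 12 bars are drawn, and the Agg backend is selected so it runs without a display. Note that the palette still has only 11 names, so the 12th bar reuses 'b'; for truly unique colors the palette needs at least as many entries as there are bars, which is what the colormap suggestion addresses.

```python
from itertools import cycle, islice

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 rows x 12 columns, row 0 is all NaN.
rng = np.random.default_rng(0)
data = rng.uniform(0, 10, size=(5, 12))
data[0, :] = np.nan
df = pd.DataFrame(data, columns=list("abcdefghijkl"))

per = df.isna().mean().round(4) * 100

# One color per column of df (len(per)), not per row (len(df)).
palette = ['b', 'r', 'g', 'y', 'c', 'm', 'tan', 'grey', 'pink', 'chocolate', 'gold']
my_colors = list(islice(cycle(palette), None, len(per)))

f, ax = plt.subplots(figsize=(12, 6))
i = 0
for key, value in zip(per.keys(), per.values):
    if value > 0:
        # Each ax.bar call draws one bar, so pass a single color: my_colors[i].
        ax.bar(key, value, label=key, color=my_colors[i])
        ax.text(i, value + 0.5, str(np.round(value, 2)), ha='center')
        i += 1
ax.set_xticks([])
ax.set_ylim(0, 115)
ax.legend(loc='upper left')
```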
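Incidentally, sampling one of matplotlib's colormaps with get_cmap is a quick way to get a list of unique colors from a palette. Note that matplotlib.cm.get_cmap was deprecated in Matplotlib 3.7 and removed in 3.9; plt.get_cmap(name, lut) takes the same arguments and works in both older and newer versions. Try something like this: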
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import string

# build df with some random NaNs
data = np.random.uniform(low=0, high=10, size=(5,20))
mask = np.random.choice([1, 0], data.shape, p=[.4, .6]).astype(bool)
data[mask] = np.nan
df = pd.DataFrame(data, columns=list(string.ascii_lowercase)[:20])

per = df.isna().mean().round(4) * 100
length = len(per.keys())
# 'plasma' resampled to `length` entries: one color per column
cmap = plt.get_cmap('plasma', length)
# shuffled indices into the colormap, so the colors are not in gradient order
lst = [*range(length)]
random.shuffle(lst)

f, ax = plt.subplots(figsize=(25, 12), dpi = 200)
i = 0
for key, value in zip(per.keys(), per.values):
    if (value > 0):
        # cmap(int) returns an RGBA tuple; [:3] keeps the RGB part
        ax.bar(key, value, label=key, color = cmap(lst[i])[:3])
        ax.text(i, value + 0.5, str(np.round(value, 2)), ha='center')
        i = i + 1
ax.set_xticklabels([])
ax.set_xticks([])
plt.title('NaN Value percentage in the dataset')
plt.ylim(0,115)
plt.ylabel('Percentage')
plt.xlabel('Columns')
plt.legend(loc='upper left')
plt.show()
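If one call per bar isn't required, the loop can also be replaced by a single ax.bar call: bar accepts a list of colors (one per bar), and Axes.bar_label (Matplotlib 3.4 or later) can write the percentages. The sketch below swaps plasma for the qualitative tab20 colormap, which has 20 distinct entries (colors would repeat beyond 20 columns); the data is made up, and the Agg backend is used so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 5 rows x 15 columns, every column missing at least one value.
rng = np.random.default_rng(1)
data = rng.uniform(0, 10, size=(5, 15))
data[0, :] = np.nan
data[rng.random((5, 15)) < 0.3] = np.nan
df = pd.DataFrame(data, columns=[f"v{k}" for k in range(15)])

per = df.isna().mean().round(4) * 100
per = per[per > 0]  # keep only columns with NaNs, like the `if value > 0` check

# tab20 is a qualitative colormap with 20 entries; cmap(k) for an integer k picks entry k.
cmap = plt.get_cmap("tab20")
colors = [cmap(k % cmap.N) for k in range(len(per))]

fig, ax = plt.subplots(figsize=(12, 6))
# One call draws all bars; a list passed as `color` is applied one color per bar.
bars = ax.bar(per.index, per.values, color=colors)
ax.bar_label(bars, fmt="%.2f", padding=2)  # requires Matplotlib >= 3.4
ax.set_xticks([])
ax.set_ylim(0, 115)
ax.set_ylabel("Percentage")
ax.set_xlabel("Columns")
ax.set_title("NaN Value percentage in the dataset")
# The BarContainer holds one Rectangle per bar, so it can serve as legend handles.
ax.legend(list(bars), list(per.index), loc="upper left")
```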
Output:
Or non-random (comment out random.shuffle(lst)):
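Commenting out the shuffle doesn't change which colors are used, only their order: plt.get_cmap('plasma', length) is resampled to exactly length entries, and random.shuffle just permutes the indices, so without it adjacent bars get adjacent shades of the gradient. A quick check, using length = 20 as in the answer and assuming the Agg backend:

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

length = 20
cmap = plt.get_cmap("plasma", length)  # plasma resampled to `length` entries

lst = [*range(length)]
ordered = [cmap(k)[:3] for k in lst]   # gradient order: neighbours get similar shades
random.shuffle(lst)
shuffled = [cmap(k)[:3] for k in lst]  # same colors, random order

print(len(set(ordered)))               # number of distinct RGB triples
print(sorted(ordered) == sorted(shuffled))  # True: shuffling only reorders
```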
Answered By - ouroboros1