Issue
I'm trying to do two things with a Seaborn bar chart, both using values that are in the dataframe but not in the graph:
- I'm looking to display the values of one field in a dataframe while graphing another. For example, below, I'm graphing 'tip', but I would like to place the value of 'total_bill' centered above each of the bars (i.e. 325.88 above Friday, 1778.40 above Saturday, etc.).
- Is there a way to scale the colors of the bars, with the lowest value of 'total_bill' having the lightest color (in this case Friday) and the highest value of 'total_bill' having the darkest? Obviously, I'd stick with one color (i.e. blue) when I do the scaling.
While I see that others think that this is a duplicate of another problem (or two), I am missing the part about how to use a value that is not in the graph as the basis for the label or the shading. How do I say, "use total_bill as the basis"? I'm sorry, but I just can't figure it out based on those answers.
Starting with the following code,
import pandas as pd
import seaborn as sns
%matplotlib inline
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv", sep=',')
groupedvalues=df.groupby('day').sum().reset_index()
g=sns.barplot(x='day',y='tip',data=groupedvalues)
I get the following result:
Interim Solution:
for index, row in groupedvalues.iterrows():
    g.text(row.name, row.tip, round(row.total_bill, 2), color='black', ha="center")
On the shading, using the example below, I tried the following:
import pandas as pd
import seaborn as sns
%matplotlib inline
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv", sep=',')
groupedvalues=df.groupby('day').sum().reset_index()
pal = sns.color_palette("Greens_d", len(data))
rank = groupedvalues.argsort().argsort()
g=sns.barplot(x='day',y='tip',data=groupedvalues)
for index, row in groupedvalues.iterrows():
    g.text(row.name, row.tip, round(row.total_bill, 2), color='black', ha="center")
But that gave me the following error:
AttributeError: 'DataFrame' object has no attribute 'argsort'
So I tried a modification:
import pandas as pd
import seaborn as sns
%matplotlib inline
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv", sep=',')
groupedvalues=df.groupby('day').sum().reset_index()
pal = sns.color_palette("Greens_d", len(data))
rank=groupedvalues['total_bill'].rank(ascending=True)
g=sns.barplot(x='day',y='tip',data=groupedvalues,palette=np.array(pal[::-1])[rank])
and that leaves me with
IndexError: index 4 is out of bounds for axis 0 with size 4
Solution
Let's stick to the solution from "Changing color scale in seaborn bar plot". You want to use argsort to determine the order of the colors to use for colorizing the bars. In the linked question, argsort is applied to a Series object, which works fine, while here you have a DataFrame. So you need to select one column of that DataFrame to apply argsort on.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df = sns.load_dataset("tips")
# numeric_only=True skips the categorical columns, which newer pandas versions refuse to sum
groupedvalues = df.groupby('day').sum(numeric_only=True).reset_index()

pal = sns.color_palette("Greens_d", len(groupedvalues))
# argsort().argsort() turns the total_bill column into zero-based ranks
rank = groupedvalues["total_bill"].argsort().argsort()
g = sns.barplot(x='day', y='tip', data=groupedvalues, palette=np.array(pal[::-1])[rank])

# write each day's total_bill above its bar
for index, row in groupedvalues.iterrows():
    g.text(row.name, row.tip, round(row.total_bill, 2), color='black', ha="center")

plt.show()
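The reason argsort is applied twice may be easier to see on a tiny array. The following is a minimal sketch, not part of the original answer; the three totals and the shade names are hypothetical placeholders, chosen so that a single argsort and the actual rank differ.

```python
import numpy as np

# Hypothetical totals, chosen so that a single argsort differs from the rank.
totals = np.array([30.0, 10.0, 20.0])

# Indices that would sort the array (smallest value first).
order = totals.argsort()

# Zero-based rank of each element within the array.
rank = totals.argsort().argsort()

# Placeholder shade names ordered from light to dark, one per bar.
shades = np.array(["light", "medium", "dark"])

# Indexing by rank gives each bar the shade that matches its own value.
bar_shades = shades[rank]

print(order.tolist())       # [1, 2, 0]
print(rank.tolist())        # [2, 0, 1]
print(bar_shades.tolist())  # ['dark', 'light', 'medium']
```

Here the largest total (30.0, first position) gets the darkest shade only when the palette is indexed by rank; indexing by order would assign shades by sort position instead.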
The second attempt works fine as well; the only issue is that the rank as returned by rank() starts at 1 instead of zero. So one has to subtract 1 from the array. Also, for indexing we need integer values, so we need to cast it to int.
rank = groupedvalues['total_bill'].rank(ascending=True).values
rank = (rank - 1).astype(int)  # np.int was removed in NumPy 1.24; the builtin int works
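As a quick check that the rank() route and the double-argsort route agree, here is a small sketch on hypothetical totals (not the tips data). The method="first" argument is an assumption added for this sketch: with the default tie handling, tied values would receive fractional ranks, which would not survive the cast to int cleanly.

```python
import pandas as pd

# Hypothetical totals, not taken from the tips data.
totals = pd.Series([30.0, 10.0, 20.0])

# rank() is 1-based and returns floats, so shift by one and cast before indexing.
# method="first" (an assumption here) breaks ties by position, keeping ranks whole.
rank_idx = (totals.rank(method="first") - 1).astype(int).to_numpy()

# The double-argsort approach should give the same zero-based ranks.
argsort_idx = totals.to_numpy().argsort().argsort()

print(rank_idx.tolist())     # [2, 0, 1]
print(argsort_idx.tolist())  # [2, 0, 1]
```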
- From matplotlib 3.4.0, there is .bar_label, which has a labels parameter for custom labels.
  - Other answers using .bar_label didn't customize the labels with labels=.
  - See this answer from May 16, 2021, for a thorough explanation of .bar_label with links to documentation and examples.
- The day column downloads as a category Dtype, which keeps the days of the week in order. This also ensures the plot order of the bars on the x-axis and the values in tb.
  - .bar_label adds labels from left to right, so the values in tb are in the same order as the bars.
  - If working with a column that isn't categorical, pd.Categorical can be used on the column to set the order.
- In sns.barplot, estimator=sum is specified to sum tip. The default is mean.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

df = sns.load_dataset("tips")

# sum total_bill by day
tb = df.groupby('day').total_bill.sum()

# get the colors in blues as requested
pal = sns.color_palette("Blues_r", len(tb))

# rank the total_bill sums (a double argsort gives each day's zero-based rank)
rank = tb.argsort().argsort().to_numpy()

# plot
fig, ax = plt.subplots(figsize=(8, 6))
sns.barplot(x='day', y='tip', data=df, palette=np.array(pal[::-1])[rank], estimator=sum, ci=False, ax=ax)

# 1. add labels using bar_label with custom labels from tb, rounded to 2 decimals
ax.bar_label(ax.containers[0], labels=tb.round(2), padding=3)

# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)

plt.show()
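For the non-categorical case mentioned in the notes above, a minimal sketch of setting the order with pd.Categorical might look like the following. The small dataframe is hypothetical and stands in for data where day arrives as plain strings; the point is only that the grouped sums then come out in the declared order, which is the order .bar_label needs.

```python
import pandas as pd

# Hypothetical data with 'day' stored as plain strings, in no particular order.
df = pd.DataFrame({
    "day": ["Sun", "Thur", "Sat", "Fri", "Sat"],
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Declare the desired order explicitly.
day_order = ["Thur", "Fri", "Sat", "Sun"]
df["day"] = pd.Categorical(df["day"], categories=day_order, ordered=True)

# groupby sorts by category order, so the sums line up with the bar order.
tb = df.groupby("day", observed=False)["total_bill"].sum()

print(tb.index.tolist())  # ['Thur', 'Fri', 'Sat', 'Sun']
print(tb.tolist())        # [20.0, 40.0, 80.0, 10.0]
```

Without the pd.Categorical step, the string column would be grouped alphabetically (Fri, Sat, Sun, Thur), and labels taken from tb might no longer match the bars if the plot uses a different order.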
Answered By - ImportanceOfBeingErnest