Issue
Usually when I plot a distribution I like to add auxiliary lines to show extra information, such as the mean:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(15, 5))
h = r1['TAXA_ATUAL_UP'].mean()
plt.axvline(h, color='k', linestyle='dashed', linewidth=2)
print(h) # 692.6621026418171
plt.annotate('{0:.2f}'.format(h), xy=(h+100, 0.02), fontsize=12)
sns.distplot(r1['TAXA_ATUAL_UP'].dropna())
sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_UP'].dropna(), hist=False, label='Y = 1')
sns.distplot(r1[r1['REMOTO'] == 0]['TAXA_ATUAL_UP'].dropna(), hist=False, label='Y = 0')
Recently, using the same code to plot other data, I got a weird result. What I notice is that the h value is much bigger, and the plot comes out drastically shrunk. Here is the code:
plt.figure(figsize=(15, 5))
h = r1['TAXA_ATUAL_DOWN'].mean()
plt.axvline(h, color='k', linestyle='dashed', linewidth=2)
print(h) # 8777.987291627895
plt.annotate('{0:.2f}'.format(h), xy=(h, 0.02), fontsize=12)
sns.distplot(r1['TAXA_ATUAL_DOWN'].dropna())
sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna(), hist=False, label='Y = 1')
I wonder what causes this, and how I should get the annotation to work properly or fix whatever I'm doing wrong.
Solution
Try replacing
plt.annotate('{0:.2f}'.format(h), xy=(h, 0.02), fontsize=12)
with
plt.annotate('{0:.2f}'.format(h), xy=(h+100, 0.00012), fontsize=12)
I believe what is happening is that you are trying to annotate at the same xy coordinates as in your old plot, but the axis scales are drastically different. So when you annotate at xy=(h, 0.02), 0.02 is significantly above the maximum of your y axis, and your figure is being rescaled accordingly.
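The reason the y values are so small is that, with a KDE drawn, sns.distplot shows densities: the area under the curve (and under the normalized histogram bars) is 1, so the wider the x range, the lower the heights. A minimal sketch with NumPy only, assuming a synthetic exponential sample whose mean is close to the printed 8777.99 (a hypothetical stand-in, not the asker's data):

```python
import numpy as np

# Hypothetical synthetic stand-in for r1['TAXA_ATUAL_DOWN'].dropna():
# an exponential sample with a mean of roughly 8778.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=8778.0, size=10_000)

# density=True scales the bar heights so that their total area is 1,
# the same normalization distplot applies when a KDE is drawn.
heights, edges = np.histogram(sample, bins=50, density=True)

# The total area is 1 (up to floating-point error)...
area = np.sum(heights * np.diff(edges))
# ...so when x spans tens of thousands of units, every height is tiny.
y_max = heights.max()

print(f"area = {area:.3f}, tallest bar = {y_max:.2e}")
print(f"y = 0.02 is about {0.02 / y_max:.0f} times the tallest bar")
```

On data like this the tallest bar is on the order of 1e-4, which is why a y coordinate of 0.02 lands far above everything else on the axes.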
Looking at your new plot, it would make more sense to put your text somewhere like xy=(h+100, 0.00012), or thereabouts. If that works, you can fine-tune the location to where you want it (or, more programmatically, put your y coordinate at something like 0.75 * maximum_y_value, where maximum_y_value is the highest point on your y axis).
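The programmatic placement can be sketched as follows. This is a minimal sketch assuming matplotlib only: ax.hist(..., density=True) on synthetic data stands in for sns.distplot, and annotate_mean is a hypothetical helper, not part of any library. It either reads the y limits with Axes.get_ylim(), or passes xycoords=('data', 'axes fraction') so that x stays in data units while y is a fraction of the axes height (a matplotlib option the answer itself does not mention).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def annotate_mean(ax, h, frac=0.75, use_axes_fraction=False):
    """Hypothetical helper: label the value h at `frac` of the y axis height."""
    text = '{0:.2f}'.format(h)
    if use_axes_fraction:
        # x in data units, y as a fraction of the axes (0 = bottom, 1 = top),
        # so the label position does not depend on the data's y scale.
        return ax.annotate(text, xy=(h, frac),
                           xycoords=('data', 'axes fraction'), fontsize=12)
    # Otherwise convert the fraction into data units from the current limits.
    # With a density plot the bottom limit is 0, so this is frac * y_top.
    y_bottom, y_top = ax.get_ylim()
    return ax.annotate(text, xy=(h, y_bottom + frac * (y_top - y_bottom)),
                       fontsize=12)


# Synthetic stand-in for r1['TAXA_ATUAL_DOWN'].dropna()
rng = np.random.default_rng(0)
values = rng.exponential(scale=8778.0, size=10_000)
h = values.mean()

fig, ax = plt.subplots(figsize=(15, 5))
ax.hist(values, bins=50, density=True)  # stand-in for sns.distplot(...)
ax.axvline(h, color='k', linestyle='dashed', linewidth=2)

# Call this only after all distributions are drawn: each plotting call
# can change the autoscaled y limits that get_ylim() reports.
label = annotate_mean(ax, h)
fig.canvas.draw()
```

The axes-fraction variant never reads the data limits, so it can be called at any point; the get_ylim() variant should come after the last sns.distplot call.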
A hacky but effective way to do this would be:
y_max = max([p.get_height() for p in sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna()).patches])
plt.annotate('{0:.2f}'.format(h), xy=(h, 0.75*y_max), fontsize=12)
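What this does is draw the histogram that sns.distplot plots by default (which you have disabled with hist=False for the Y = 1 and Y = 0 curves) and take the height of the tallest bar among the patches on the axes. Note that, as a side effect, this call also adds that histogram and another KDE curve to your figure.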
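A variation on the same idea, sketched under assumptions: since sns.distplot(r1['TAXA_ATUAL_DOWN'].dropna()) already draws a histogram on the axes, the tallest bar can be read from ax.patches without a second plotting call, or computed with numpy.histogram without drawing anything. Here ax.hist(..., density=True) on synthetic normal data stands in for seaborn, and the bin count of 30 is an arbitrary choice; the computed heights only match the drawn ones when the bins match.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the plotted column
rng = np.random.default_rng(1)
values = rng.normal(loc=8778.0, scale=3000.0, size=5_000)
h = values.mean()

fig, ax = plt.subplots(figsize=(15, 5))
ax.hist(values, bins=30, density=True)  # stand-in for the histogram distplot draws

# (a) Read the tallest bar already on the axes; nothing new is drawn.
# The loop variable is p, so the mean stored in h is left alone.
y_max_drawn = max(p.get_height() for p in ax.patches)

# (b) Compute the same heights without plotting, using the same bins.
heights, _ = np.histogram(values, bins=30, density=True)
y_max_computed = heights.max()

ax.annotate('{0:.2f}'.format(h), xy=(h, 0.75 * y_max_drawn), fontsize=12)
fig.canvas.draw()
```

Option (a) only works if a histogram is on the axes; with every distplot call set to hist=False, option (b) or reading the y limits is the safer choice.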
Answered By - sacuL