Issue
I am trying to create labels for the median, outliers and quartiles of a 1-dimensional boxplot that has only x-coordinate values. I'd like to label the query, URL, and CTR for the median, quartiles and outliers. Here is what the data frame looks like:
URL | Clicks | CTR | Query |
---|---|---|---|
website.com/1 | 20 | 0.06 | query1 |
website.com/2 | 4 | 0.10 | query2 |
My code for the above plot:
df_ = df[df.Clicks > 4 ]
sns.boxplot(x=df_['CTR'])
plt.xlabel("CTR")
plt.show()
What I have so far are the values and outlier limit:
median = df_['CTR'].median()
ctr_q1 = df_['CTR'].quantile(0.25)
ctr_q3 = df_['CTR'].quantile(0.75)
outlier_lim = ctr_q3 + 1.5 * (ctr_q3 - ctr_q1)
My problem is that when adding text, I'm not sure what to pass to plt.text() as the y value, since the plot has no y data, in the following code:
for i in df_["CTR"]:
    if i > outlier_lim:
        plt.text(x = i, y=?????, s = "here")
If I try putting an arbitrary value like 0 or 1 for y, I get something like this:
>>> for i in df_["CTR"]:
...     if i > outlier_lim:
...         plt.text(x = i, y = 0, s = "here")
...
Text(0.6923076923076923, 0, 'here')
Text(0.47619047619047616, 0, 'here')
Text(0.5333333333333333, 0, 'here')
Text(0.4583333333333333, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5384615384615384, 0, 'here')
Text(0.5833333333333334, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5, 0, 'here')
Text(0.55, 0, 'here')
Text(0.6153846153846154, 0, 'here')
>>> plt.xlabel("CTR")
Text(0.5, 0, 'CTR')
>>> plt.show()
Most of the related posts I've seen use either seaborn or matplotlib functions that require a y parameter. Does anyone have a solution for when y doesn't exist?
Thanks!
Solution
The y-position of the central line is at y=0. The box goes from y=-0.4 to y=0.4, but note that the y-axis is reversed (negative values at the top). The y-values do exist, but are hidden automatically in order not to distract.
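One way to confirm the hidden y-axis for yourself is to ask the axes for its limits. The snippet below is a minimal sketch: `inspect_y_axis` is a hypothetical helper name, and the CTR values are synthetic stand-ins rather than the data from the question.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt


def inspect_y_axis(values):
    """Draw a horizontal boxplot and report how its hidden y-axis is set up."""
    fig, ax = plt.subplots()
    sns.boxplot(x=values, ax=ax)
    # get_ylim() returns (bottom, top) in data coordinates
    bottom, top = ax.get_ylim()
    return ax, bottom, top, ax.yaxis_inverted()


# Synthetic stand-in for the CTR column (an assumption, not the asker's data)
rng = np.random.default_rng(0)
ctr = pd.Series(rng.geometric(0.5, size=80) / 100, name="CTR")

ax, bottom, top, inverted = inspect_y_axis(ctr)
print(f"ylim: bottom={bottom}, top={top}, inverted={inverted}")
```

If the axis is reversed as described, `bottom` is larger than `top`, so a label placed at a small positive y ends up just below the central line.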
Here is some example code (note that seaborn automatically sets the xlabel to the name of the column):
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator, ScalarFormatter
import seaborn as sns
import numpy as np
import pandas as pd
np.random.seed(2021)
df_ = pd.DataFrame({'CTR': np.random.geometric(0.5, size=80) / 100})
ax = sns.boxplot(x=df_['CTR'])
# show the ytick positions, as a reference
ax.yaxis.set_major_locator(MultipleLocator(0.1))
ax.yaxis.set_major_formatter(ScalarFormatter())
median = df_['CTR'].median()
ctr_q1 = df_.quantile(0.25)['CTR']
ctr_q3 = df_.quantile(0.75)['CTR']
outlier_lim = ctr_q3 + 1.5 * (ctr_q3 - ctr_q1)
for i in df_["CTR"]:
    if i > outlier_lim:
        ax.text(x=i, y=0.01, s="here", ha='center', va='top')
plt.show()
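Building on the same y-positions, the sketch below labels the quartiles and median above the box and prints the query, URL and CTR under each outlier. It is only one possible approach: `label_boxplot` is a hypothetical helper, the data frame is synthetic but uses the column names from the question, and the offsets -0.42 and 0.03 are arbitrary choices based on the box spanning y=-0.4 to y=0.4.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt


def label_boxplot(df, column="CTR"):
    """Horizontal boxplot with text labels for quartiles, median and outliers."""
    values = df[column]
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_lim, high_lim = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    fig, ax = plt.subplots()
    sns.boxplot(x=values, ax=ax)

    # The y-axis is reversed, so a negative y just past -0.4 sits above the box.
    for name, stat in [("Q1", q1), ("median", median), ("Q3", q3)]:
        ax.text(stat, -0.42, f"{name}\n{stat:.2f}", ha="center", va="bottom")

    # Outliers are drawn on the central line y=0; hang each label just below it.
    outliers = df[(values < low_lim) | (values > high_lim)]
    for _, row in outliers.iterrows():
        label = f"{row['Query']}\n{row['URL']}\n{row[column]:.2f}"
        ax.text(row[column], 0.03, label, ha="center", va="top", fontsize=7)
    return ax, outliers


# Synthetic data with the question's column names (values are made up)
rng = np.random.default_rng(2021)
n = 80
df_ = pd.DataFrame({
    "URL": [f"website.com/{i}" for i in range(1, n + 1)],
    "Clicks": rng.integers(5, 50, size=n),
    "CTR": rng.geometric(0.5, size=n) / 100,
    "Query": [f"query{i}" for i in range(1, n + 1)],
})

ax, outliers = label_boxplot(df_)
```

Call `plt.show()` afterwards to display the figure. Outliers that share the same CTR will have overlapping labels, so with real data you may want to stagger the y offsets or label only a few rows.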
Answered By - JohanC