Issue
I have this dataset:
mydf = pd.DataFrame({'Feature':['Pysch','Physio'],'log_or':[0.3126,0.2022],
'se':[0.0712,0.0568], 'conf_low':[0.1729,0.0907], 'conf_high':[0.4522, 0.3136]})
mydf = mydf.sort_values(by='log_or')
mydf
  Feature  log_or      se  conf_low  conf_high
1  Physio  0.2022  0.0568    0.0907     0.3136
0   Pysch  0.3126  0.0712    0.1729     0.4522
I want to create an error bar plot using my calculated confidence intervals in conf_low and conf_high.
I tried this first, but the error bars don't match my calculated confidence intervals:
plt.errorbar(mydf['log_or'], mydf['Feature'],
xerr=mydf['se'], marker='s', mfc='Tomato')
plt.show()
You can see that, for example, for the Physio variable the error bar runs from approximately 0.14 to 0.26 in the image, but my tabulated confidence interval goes from 0.091 to 0.314.
So I tried to set up my custom intervals, with this:
lowr = mydf['conf_low'].to_numpy()
uppr = mydf['conf_high'].to_numpy()
intervals = [lowr, uppr]
plt.errorbar(mydf['log_or'], mydf['Feature'], xerr=intervals, marker='s', mfc='Tomato')
plt.show()
Now the interval for Physio runs from approximately 0.1 to 0.5, which is also wrong. What am I doing wrong? How can I use my custom intervals in this plot?
Solution
I think you are misunderstanding what the values passed to xerr are meant to represent. Have a look at the plt.errorbar documentation (under xerr, yerr).
In your first attempt, xerr=mydf['se'] is interpreted as follows:
shape(N,): Symmetric +/-values for each data point.
In your second attempt, xerr=intervals is interpreted as follows:
shape(2, N): Separate - and + values for each bar. First row contains the lower errors, the second row contains the upper errors.
So the values you pass here are interpreted as the length of the error bar on each side of the data point. However, the values in mydf.conf_low and mydf.conf_high are not lengths; they are absolute x-values marking where each interval starts and ends. As you mention for Physio:
my tabulated confidence interval goes from 0.091 to 0.314.
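To make the difference concrete, here is a small sketch of the arithmetic for the Physio row only. It does not call Matplotlib; it just reproduces, under the documented meaning of xerr, where each bar would start and end. The variable names (attempt1, attempt2, fixed) are mine, chosen for illustration.

```python
# Values for the Physio row, copied from the table in the question.
center = 0.2022     # log_or: the plotted point
se = 0.0568         # standard error
conf_low = 0.0907   # lower confidence bound (an x-position, not a length)
conf_high = 0.3136  # upper confidence bound (an x-position, not a length)

# Attempt 1: xerr=se is a symmetric length, so the bar spans center -/+ se.
attempt1 = (center - se, center + se)

# Attempt 2: xerr=[conf_low, conf_high] treats the bounds as lengths.
attempt2 = (center - conf_low, center + conf_high)

# Fix: convert the bounds into distances from the point before passing them.
lower_len = center - conf_low
upper_len = conf_high - center
fixed = (center - lower_len, center + upper_len)

for name, (lo, hi) in [("attempt 1", attempt1),
                       ("attempt 2", attempt2),
                       ("fixed", fixed)]:
    print(f"{name}: {lo:.4f} to {hi:.4f}")
# attempt 1: 0.1454 to 0.2590
# attempt 2: 0.1115 to 0.5158
# fixed: 0.0907 to 0.3136
```

The first two lines of output match the bars described in the question (roughly 0.14 to 0.26, then roughly 0.1 to 0.5), and only the last one recovers the tabulated interval.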
The solution then is to calculate the length on both sides of each point and pass those values to xerr, like so:
import pandas as pd
import matplotlib.pyplot as plt

mydf = pd.DataFrame({'Feature':['Pysch','Physio'],'log_or':[0.3126,0.2022],
                     'se':[0.0712,0.0568], 'conf_low':[0.1729,0.0907], 'conf_high':[0.4522, 0.3136]})
mydf = mydf.sort_values(by='log_or')

# xerr expects distances from each point, not absolute bounds
lower_err = mydf.log_or - mydf.conf_low
upper_err = mydf.conf_high - mydf.log_or

plt.errorbar(mydf['log_or'], mydf['Feature'],
             xerr=(lower_err, upper_err), marker='s', mfc='Tomato')
plt.show()
Result: the error bars now run from conf_low to conf_high for each feature.
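If you need this conversion in more than one plot, it could be wrapped in a small function. The sketch below is only one way to do it: bounds_to_xerr is a hypothetical helper name of my own, not part of Matplotlib or pandas. It builds the (2, N) array described in the documentation and also checks that every bound lies on the correct side of its point, since a negative length would mean the inputs are inconsistent (and, as far as I know, recent Matplotlib versions reject negative error values anyway).

```python
import numpy as np

def bounds_to_xerr(center, low, high):
    """Hypothetical helper: turn absolute interval bounds into a (2, N) xerr array.

    Row 0 holds the lower lengths (center - low),
    row 1 holds the upper lengths (high - center).
    """
    center = np.asarray(center, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    xerr = np.vstack([center - low, high - center])
    # A negative length means a bound is on the wrong side of its point.
    if (xerr < 0).any():
        raise ValueError("each point must satisfy low <= center <= high")
    return xerr

# Values from the question, in sorted order (Physio first, then Pysch).
xerr = bounds_to_xerr([0.2022, 0.3126], [0.0907, 0.1729], [0.3136, 0.4522])
print(xerr.shape)  # (2, 2)
```

With the DataFrame from the question, the call would look like xerr=bounds_to_xerr(mydf['log_or'], mydf['conf_low'], mydf['conf_high']) inside plt.errorbar, since pandas Series are accepted by np.asarray.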
Answered By - ouroboros1