Issue
I'm learning employee turnover forecasting and got the result below from predict_proba. Looking at the first row, I would interpret it as this employee having an 83% chance of leaving the company. Do I understand this correctly?
array([[0.17, 0.83],
[0.43, 0.57],
[0.29, 0.71],
[0.94, 0.06],
[0.98, 0.02],
[0.84, 0.16],
[0.64, 0.36],
[1. , 0. ],
[0.85, 0.15],
[0.99, 0.01],
[0.09, 0.91],
[0.89, 0.11],
[0.21, 0.79],
[0.15, 0.85],
[0.78, 0.22],
[0.18, 0.82],
[0.84, 0.16],
[0.45, 0.55],
[0.96, 0.04],
[0.95, 0.05],
[0.91, 0.09],
[0.9 , 0.1 ],
[1. , 0. ],
[0.91, 0.09],
[0.74, 0.26],
...
[0.94, 0.06],
[0.99, 0.01],
[0.22, 0.78],
[0.89, 0.11],
[0.98, 0.02]])
Solution
A model score measures how certain the model is about the outcome. However, it is not necessarily the same as a probability: a score of 0.83 does not by itself mean that 83% of the people with that score will leave. Logistic regression scores are probabilities by design, but for a random forest the behaviour is implementation-defined. If you want to integrate your scores into business metrics directly, you'll need to calibrate your model first (using e.g. sklearn.calibration.CalibratedClassifierCV or isotonic regression).
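As a rough illustration of the calibration step, here is a minimal sketch. It assumes a random forest classifier and uses synthetic data from make_classification as a stand-in for the real turnover dataset, with label 1 assumed to mean "left the company"; the variable names are hypothetical and not taken from the question. It also looks up the column order of predict_proba through classes_ rather than assuming it.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the turnover data (assumption): y == 1 means "left".
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.75, 0.25], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Wrap the forest so its scores are mapped to calibrated probabilities
# using isotonic regression fitted on cross-validation folds.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
calibrated = CalibratedClassifierCV(forest, method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

# Columns of predict_proba follow the order of classes_, so look up
# the column for class 1 instead of assuming it is the second one.
leave_col = list(calibrated.classes_).index(1)
proba = calibrated.predict_proba(X_test)
p_leave = proba[:, leave_col]

# Compare predicted probabilities with observed leave rates per bin;
# for a well-calibrated model the two arrays are close to each other.
observed_rate, mean_predicted = calibration_curve(y_test, p_leave, n_bins=5)
print(observed_rate)
print(mean_predicted)
```

If the two printed arrays roughly agree, a calibrated score of about 0.8 can reasonably be read as "about 80% of such employees leave" on data similar to the training set.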
Answered By - dx2-66