Issue
I create dummy data and train a scikit-learn LogisticRegression on it. I would then like to reproduce the output of predict_proba using only the fitted coef_ and intercept_, but my results differ. The setup is as follows:
X = [[0,0,0], [0,1,0], [0,2,0], [1,1,1], [0,1,0], [0,2,0]]
y = [0,0,0,1,1,2]
# Fit the classifier
clf = linear_model.LogisticRegression(C=1e5, multi_class="ovr", class_weight="balanced")
clf.fit(X, y)
I then apply the sigmoid to each class score and take a softmax over the three results:
softmax([
expit(np.dot([[0,2,0]], clf.coef_[0]) + clf.intercept_[0]),
expit(np.dot([[0,2,0]], clf.coef_[1]) + clf.intercept_[1]),
expit(np.dot([[0,2,0]], clf.coef_[2]) + clf.intercept_[2])
])
But this returns different values than
clf.predict_proba([[0,2,0]])
array([[0.281399 , 0.15997556, 0.55862544]])
My calculation instead gives array([[0.29882052], [0.24931448], [0.451865 ]])
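The sigmoid part of the attempt is not the problem; the gap comes from the final normalization step. The sketch below is a minimal illustration using made-up sigmoid values (they are not taken from the fitted model, and the helper names are hypothetical). It shows that a softmax over probabilities is not the same as dividing them by their sum, whereas a softmax over their logarithms is.

```python
import math

def softmax(values):
    """Standard softmax: exponentiate (shifted for stability), then normalize."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(values):
    """Divide each value by the sum so the results add up to 1."""
    total = sum(values)
    return [v / total for v in values]

# Illustrative per-class sigmoid outputs (assumed values, not model output)
sigmoid_outputs = [0.2, 0.1, 0.4]

by_sum = normalize(sigmoid_outputs)
by_softmax = softmax(sigmoid_outputs)
by_softmax_of_log = softmax([math.log(p) for p in sigmoid_outputs])

print(by_sum)             # plain normalization
print(by_softmax)         # differs: exp() is applied to the probabilities
print(by_softmax_of_log)  # matches by_sum, since exp(log(p)) == p
```

Because softmax exponentiates its inputs, feeding it probabilities directly compresses the differences between classes, which is consistent with the flatter array obtained in the manual calculation above.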
Solution
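With multi_class="ovr", predict_proba does not apply a softmax. It computes one sigmoid per class and divides each by their sum. You can replicate the calculation of the predicted probabilities using the estimated parameters as follows: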
from sklearn import linear_model
from scipy.special import expit, softmax
import numpy as np
# Data
X = [[0,0,0], [0,1,0], [0,2,0], [1,1,1], [0,1,0], [0,2,0]]
y = [0,0,0,1,1,2]
# Classifier
clf = linear_model.LogisticRegression(C=1e5, multi_class="ovr", class_weight="balanced")
clf.fit(X, y)
# Predicted probabilities
print(clf.predict_proba([[0,2,0]]))
#[[0.281399 0.15997556 0.55862544]]
# Recalculated predicted probabilities without softmax
prob1 = np.array([
    expit(np.dot([[0,2,0]], clf.coef_[0]) + clf.intercept_[0]),
    expit(np.dot([[0,2,0]], clf.coef_[1]) + clf.intercept_[1]),
    expit(np.dot([[0,2,0]], clf.coef_[2]) + clf.intercept_[2]),
]).reshape(1, -1)
print(prob1 / np.sum(prob1))
#[[0.281399 0.15997556 0.55862544]]
# Recalculated predicted probabilities with softmax
prob2 = np.log(prob1)
print(softmax(prob2))
#[[0.281399 0.15997556 0.55862544]]
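The per-class lines above can also be written once for any number of samples and classes. The following is a minimal sketch, not scikit-learn's own code: `ovr_predict_proba` is a hypothetical helper, the parameter values are made up for illustration, and it assumes `coef` has shape (n_classes, n_features) with at least three classes, as `clf.coef_` does here.

```python
import numpy as np

def ovr_predict_proba(X, coef, intercept):
    """One-vs-rest probabilities: per-class sigmoid, then divide by the row sum."""
    X = np.asarray(X, dtype=float)
    # Linear scores, shape (n_samples, n_classes)
    scores = X @ np.asarray(coef, dtype=float).T + np.asarray(intercept, dtype=float)
    # Element-wise sigmoid (scipy.special.expit computes the same function)
    sigmoids = 1.0 / (1.0 + np.exp(-scores))
    # Normalize each row so the class probabilities sum to 1
    return sigmoids / sigmoids.sum(axis=1, keepdims=True)

# Made-up parameters standing in for clf.coef_ and clf.intercept_
coef = np.array([[0.5, -1.0, 0.0],
                 [1.0, 0.2, 1.0],
                 [-0.3, 0.8, -0.3]])
intercept = np.array([0.1, -2.0, -1.0])

proba = ovr_predict_proba([[0, 2, 0], [1, 1, 1]], coef, intercept)
print(proba)
print(proba.sum(axis=1))  # each row sums to 1
```

With the fitted model from the answer, calling `ovr_predict_proba([[0,2,0]], clf.coef_, clf.intercept_)` should reproduce `clf.predict_proba([[0,2,0]])`, assuming the classifier was fitted in one-vs-rest mode. The binary case stores a single row in `coef_` and is not covered by this sketch.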
Answered By - Flavia Giammarino