Issue
I am using scikit-learn to implement classification using Logistic Regression.
The class labels are predicted using the predict() function, while the predicted probabilities are printed using the predict_proba() function.
The code snippet is pasted below:
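from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Partition the dataset into train and test data
X_train, X_test, y_train, y_test = train_test_split(ds_X, ds_y, test_size=0.33, random_state=42)
logreg = LogisticRegression().fit(X_train, y_train) # Assumed: the original snippet does not show how logreg was created
y_pred = logreg.predict(X_test) # Predicted class labels from test features
y_predicted_proba = logreg.predict_proba(X_test) # Predicted probabilities from test features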
The predicted labels are printed as
array([1, 1, 1, 1, 1, 1, 1, 1, 0, 1, ... and so on
The corresponding predicted probabilities are printed as
array([[ 0.03667012, 0.96332988],
[ 0.03638475, 0.96361525],
[ 0.03809274, 0.96190726],
[ 0.01746768, 0.98253232],
[ 0.02742639, 0.97257361],
[ 0.03676579, 0.96323421],
[ 0.02881874, 0.97118126],
[ 0.03082288, 0.96917712],
[ 0.65332179, 0.34667821],
[ 0.02091977, 0.97908023],
... and so on
Observe:
the first predicted label is 1
the first row of predicted probabilities is [ 0.03667012, 0.96332988]
Why is 0.03667012 printed first, instead of 0.96332988?
Shouldn't it be the other way around?
Solution
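Column 0 is the probability of class 0, and column 1 is the probability of class 1.
More generally, the columns of predict_proba() follow the order of the labels in logreg.classes_, not the order of the predicted label.
In the first row, class 1 has probability 0.96332988, which is the larger of the two, so predict() returns 1.
If you have n classes, the output has shape (n_examples, n_classes).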
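The column ordering can be checked directly. The following is a minimal sketch, assuming synthetic data from make_classification in place of the asker's ds_X and ds_y (which are not shown); it prints the class order and confirms that the predicted label is the class whose column holds the highest probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, for illustration only (not the asker's dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

logreg = LogisticRegression().fit(X, y)

proba = logreg.predict_proba(X)  # shape (n_examples, n_classes)
labels = logreg.predict(X)       # predicted class labels

# Column i of proba holds the probability of logreg.classes_[i]
print(logreg.classes_)
print(proba.shape)

# The predicted label is the class with the highest probability in each row
labels_from_proba = logreg.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels, labels_from_proba))
```

To read off the probability of one particular class, look up its column index in logreg.classes_ rather than assuming a position, which also works when the labels are strings or are not 0 and 1.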
Answered By - A. Attia