Issue
I fit a LogisticRegression model on the training data and checked its score on the test set:
test_score 0.802083
Afterward, out of curiosity, I fit a second model on the test data and checked its score on the test set, and somehow got the same test score.
Why?
I am using the Pima Indians diabetes data:
https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database?select=diabetes.csv
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('diabetes.csv')

diab_cols = ['Pregnancies', 'Insulin', 'BMI', 'Glucose', 'BloodPressure', 'DiabetesPedigreeFunction']
X = df[diab_cols]  # Features
y = df.Outcome     # Target variable

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=0)

model = LogisticRegression().fit(X_train, y_train)
model_test = LogisticRegression().fit(X_test, y_test)

print("test_score", model.score(X_test, y_test))
print("test_score", model_test.score(X_test, y_test))
Solution
It looks like your test data is a near-perfect representation of your training set.
One possibility is that the model trained on the training set and the model trained on the test set ended up with similar weights. You can verify this by comparing the weights of the two LogisticRegression models.
e.g.,
print(model.coef_[0])
print(model_test.coef_[0])
If the weights of the two models differ, a second possibility is that every test point lies on the same side of the decision threshold (default 0.5) under both models, so the two models classify every test point identically, which keeps the scores equal. You can check each model's confidence on the test points by calling the decision_function() method.
e.g.,
model.decision_function(X_test)
model_test.decision_function(X_test)
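Both checks above can be run end to end. Here is a minimal, self-contained sketch that uses a synthetic dataset from make_classification as a stand-in for the diabetes CSV (which may not be available locally), compares the two models' weights, and measures what fraction of test points fall on the same side of the decision boundary under both models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 6 diabetes features and binary Outcome
X, y = make_classification(n_samples=768, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=0)

model = LogisticRegression().fit(X_train, y_train)
model_test = LogisticRegression().fit(X_test, y_test)

# Check 1: are the learned weights (nearly) the same?
print("weights close:", np.allclose(model.coef_, model_test.coef_))

# Check 2: decision_function > 0 corresponds to predicting class 1,
# so equal signs mean both models classify that point identically
same_side = (np.sign(model.decision_function(X_test))
             == np.sign(model_test.decision_function(X_test)))
print("fraction classified identically:", same_side.mean())
```

If the fraction of identically classified points is 1.0, both models make exactly the same test predictions, and the two scores must match even if the weights differ.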
Answered By - the_ordinary_guy