Issue
I'm trying to find the slope and y-intercept coefficients for a linear equation. I created a test domain and range to make sure the numbers I was receiving were correct. The equation should be y = 2x + 1, but the model reports a slope of 24 and a y-intercept of 40.3125. The model predicts every value I give it correctly, so how can I get the actual coefficients (2 and 1)?
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X = np.arange(0, 40)
y = (2 * X) + 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=0)
X_train = [[i] for i in X_train]
X_test = [[i] for i in X_test]
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print('Coefficients: \n', regr.coef_)
print('Y-intercept: \n', regr.intercept_)
print('Mean squared error: %.2f'
% mean_squared_error(y_test, y_pred))
print('Coefficient of determination: %.2f'
% r2_score(y_test, y_pred))
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
print(X_test)
plt.xticks()
plt.yticks()
plt.show()
Solution
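This is happening because you scaled your training and testing features. Even though you generated y as a linear function of X, you then converted X_train and X_test onto another scale by standardizing them (subtracting the mean and dividing by the standard deviation). The regression is therefore fit on standardized X, so the coefficients it reports describe the change in y per standard deviation of X, not per unit of X.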
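Substituting the standardization into y = 2x + 1 shows where the reported numbers come from. Here μ and σ stand for the mean and standard deviation that StandardScaler computes on X_train (exposed afterwards as sc.mean_ and sc.scale_; the scaler uses the population standard deviation):

$$
z = \frac{x - \mu}{\sigma}
\quad\Longrightarrow\quad
y = 2x + 1 = 2(\sigma z + \mu) + 1 = (2\sigma)\,z + (2\mu + 1)
$$

So the fitted slope is 2σ and the fitted intercept is 2μ + 1, which equals the mean of y_train. With the numbers from the question, a slope of about 24 corresponds to σ ≈ 12, and an intercept of 40.3125 corresponds to μ = (40.3125 − 1) / 2 = 19.65625. The model is consistent with y = 2x + 1; it is just expressed in standardized units.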
If we run your code but omit the lines where you scale the data, you get the expected results.
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

X = np.arange(0, 40)
y = (2 * X) + 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=0)
X_train = [[i] for i in X_train]
X_test = [[i] for i in X_test]
# Skip the scaling of X_train and X_test
#sc = StandardScaler()
#X_train = sc.fit_transform(X_train)
#X_test = sc.transform(X_test)
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print('Coefficients: \n', regr.coef_)
# Output:
# Coefficients:
#  [2.]
print('Y-intercept: \n', regr.intercept_)
# Output:
# Y-intercept:
#  1.0
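If you want to keep the standardization (for example, because other features in a real model need it), you can instead map the fitted coefficients back to the original units using the scaler's mean_ and scale_ attributes. The following is a minimal sketch that reuses the data from the question; the names slope and intercept are just illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the question: y = 2x + 1
X = np.arange(0, 40).reshape(-1, 1)  # 2-D column, the shape scikit-learn expects
y = 2 * X.ravel() + 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit on standardized features, as in the original code
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
regr = LinearRegression().fit(X_train_scaled, y_train)

# Undo z = (x - mean_) / scale_ for each feature:
#   y = sum(coef * z) + b
#     = sum((coef / scale_) * x) + (b - sum(coef * mean_ / scale_))
slope = regr.coef_ / sc.scale_
intercept = regr.intercept_ - np.sum(regr.coef_ * sc.mean_ / sc.scale_)

print("Scaled-space coefficients:", regr.coef_, regr.intercept_)
print("Original-scale slope:", slope)          # approximately [2.]
print("Original-scale intercept:", intercept)  # approximately 1.0
```

The scaled-space values are the 2σ and 2μ + 1 from the question, while slope and intercept come back as 2 and 1 up to floating-point rounding. Because the intercept correction sums over features, the same two lines also work when X has more than one column.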
Answered By - Arturo Sbr