Issue
I need to fit a linear regression and compute the MSE for every pair of variables (columns) in my dataframe. The problem is that the fit fails: for each pair, Xtrain has two variables while ytrain has only a single column, and I can't get the two to line up.
Code:
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01)
Problem:
from itertools import combinations
for c in combinations(range(4), 2):
    lr=LinearRegression()
    lr.fit(Xtrain[:,c].reshape(-1,1),ytrain)
    yp=lr.predict(Xtest[:,c].reshape(-1,1))
    print('MSE', np.sum((ytest - yp)**2) / len(ytest))
Error:
Solution
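There is no need to call reshape on the feature matrices, as they are already two-dimensional: Xtrain[:, c] has one row per sample and one column for each of the two selected features. Calling reshape(-1, 1) stacks both columns into a single column with twice as many rows, so the number of rows no longer matches ytrain. If you remove the reshaping, your code will work, see below.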
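To see why the reshape breaks the fit, it may help to look at the shapes directly. This is a minimal sketch that uses only NumPy; `X_demo` is a hypothetical stand-in for Xtrain (67 rows and 4 columns are assumed here purely for illustration), and `c` is one of the pairs that `combinations(range(4), 2)` would yield.

```python
import numpy as np

# Hypothetical stand-in for Xtrain: 67 samples, 4 features (assumed sizes).
X_demo = np.arange(67 * 4, dtype=float).reshape(67, 4)
c = (0, 2)  # one pair of column indices, as yielded by combinations(range(4), 2)

pair = X_demo[:, c]              # already 2-D: one row per sample, one column per feature
flattened = pair.reshape(-1, 1)  # collapses both columns into a single column

print(pair.shape)       # (67, 2)
print(flattened.shape)  # (134, 1)
```

With 134 rows in the reshaped features and only 67 target values, the feature matrix and the target no longer describe the same number of samples, which is what the fit rejects.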
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from itertools import combinations
import numpy as np
X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
for c in combinations(range(4), 2):
    lr = LinearRegression()
    lr.fit(X_train[:, c], y_train)
    yp = lr.predict(X_test[:, c])
    print('MSE', np.sum((y_test - yp) ** 2) / len(y_test))
# MSE 591.707619290734
# MSE 33.613143724590564
# MSE 634.3248475857874
# MSE 1646.9447686107499
# MSE 2293.2878076807942
# MSE 1700.2559702871085
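The printed values do not show which pair of columns each MSE belongs to. As a possible variation (not part of the original answer), the sketch below stores each result in a dictionary keyed by the column pair and uses `mean_squared_error` from `sklearn.metrics` in place of the manual formula. The name `mse_by_pair` is hypothetical, and the data setup simply repeats the one used above.

```python
from itertools import combinations

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=4, n_informative=3,
                       n_targets=1, noise=0.01, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

mse_by_pair = {}  # hypothetical name: maps (i, j) column pair -> test MSE
for c in combinations(range(4), 2):
    cols = list(c)  # select the two columns; the result stays 2-D
    lr = LinearRegression().fit(X_train[:, cols], y_train)
    yp = lr.predict(X_test[:, cols])
    mse_by_pair[c] = mean_squared_error(y_test, yp)

# Print the pairs from lowest to highest test MSE.
for pair, mse in sorted(mse_by_pair.items(), key=lambda item: item[1]):
    print(pair, mse)
```

Keying the results by pair makes it easy to pick out the best-performing combination afterwards, for example with `min(mse_by_pair, key=mse_by_pair.get)`.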
Answered By - Flavia Giammarino