Issue
I was working on the Boston house price problem with linear regression in sklearn, and the following error occurred along the way:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 13 is different from 1)
Code:
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
boston = load_boston()
X = boston.data
y = boston.target
dfX = pd.DataFrame(X, columns = boston.feature_names)
dfy = pd.DataFrame(y, columns = ["Price"] )
df = pd.concat([dfX,dfy],axis =1)
reg = LinearRegression()
reg.fit(X,y)
x_12 = np.array(dfX["LSTAT"]).reshape(-1,1) # LSTAT is column index 12 of boston.data
y = np.array(dfy["Price"]).reshape(-1,1)
predict = reg.predict(x_12) # <- the error is raised here
Solution
The error seems to occur because LinearRegression's fit function is called on all 13 features of the load_boston dataset, but predict is given only one feature (LSTAT). The trained model therefore expects 13 input columns while the predict input has just one, which is the "size 13 is different from 1" in the message. You probably need to change the fit call so that it takes only the LSTAT feature; the model will then expect a single feature as input when you call predict:
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
boston = load_boston()
X, y = load_boston(return_X_y=True)
# X will now have only data from "LSTAT" column
X = X[:, np.newaxis, boston.feature_names.tolist().index("LSTAT")]
dfX = pd.DataFrame(X, columns = ["LSTAT"] )
dfy = pd.DataFrame(y, columns = ["Price"] )
df = pd.concat([dfX,dfy],axis =1)
reg = LinearRegression()
reg.fit(X, y)
x_12 = np.array(dfX["LSTAT"]).reshape(-1, 1) # LSTAT is column index 12 of boston.data
y = np.array(dfy["Price"]).reshape(-1, 1)
predict = reg.predict(x_12)
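To see where "size 13 is different from 1" comes from, the shape mismatch can be reproduced with plain NumPy. This is only a rough sketch under stated assumptions: the data is synthetic (random values with the same 506 x 13 shape as the Boston data), np.linalg.lstsq stands in for LinearRegression.fit without an intercept, and the names X_all, x_one, coef_all and coef_one are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Boston data: 506 rows, 13 features (made-up values).
X_all = rng.normal(size=(506, 13))
true_coef = rng.normal(size=(13, 1))
y = X_all @ true_coef

# Fitting on all 13 columns yields one coefficient per feature: shape (13, 1).
coef_all, *_ = np.linalg.lstsq(X_all, y, rcond=None)

# Predicting with a single column multiplies (506, 1) by (13, 1), which fails.
x_one = X_all[:, [12]]  # column index 12, where LSTAT sits in the original data
try:
    x_one @ coef_all
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Fitting on that single column gives a (1, 1) coefficient, so the shapes agree.
coef_one, *_ = np.linalg.lstsq(x_one, y, rcond=None)
pred_one = x_one @ coef_one

print(coef_all.shape, coef_one.shape, pred_one.shape)
print(mismatch_error)
```

The same reasoning suggests the other possible fix: keep the model trained on all 13 features and pass all 13 columns to predict. Which one is appropriate depends on whether you want a single-feature model or not.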
Answered By - VietHTran