Issue
I have a dataset of two input variables (an array called x with shape n x 2, holding values of x1 and x2) and one output (called y). I am having trouble understanding how to calculate predicted output values from the polynomial features and the weights. My understanding is that y = X dot w, where X is the matrix of polynomial features and w is the vector of weights. The polynomial features were generated using PolynomialFeatures from sklearn.preprocessing, and the weights were generated with np.linalg.lstsq. Below is sample code that I created for this.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
df = pd.DataFrame()
df['x1'] = [1,2,3,4,5]
df['x2'] = [11,12,13,14,15]
df['y'] = [75,96,136,170,211]
x = np.array([df['x1'],df['x2']]).T
y = np.array(df['y']).reshape(-1,1)
poly = PolynomialFeatures(interaction_only=False, include_bias=True)
poly_features = poly.fit_transform(x)
print(poly_features)
w = np.linalg.lstsq(x, y)
weight_list = []
for item in w:
    if type(item) is np.int32:
        weight_list.append(item)
        continue
    for weight in item:
        if type(weight) is np.ndarray:
            weight_list.append(weight[0])
            continue
        weight_list.append(weight)
weight_list
y_pred = np.dot(poly_features, weight_list)
print(y_pred)
regression_model = LinearRegression()
regression_model.fit(x,y)
y_predicted = regression_model.predict(x)
print(y_predicted)
The y_pred values are nowhere near the y values I provided. Am I passing the incorrect inputs to np.linalg.lstsq, or is there a lapse in my understanding?
Using the built-in LinearRegression class, y_predicted is much closer to my provided y values, while y_pred is orders of magnitude higher.
Solution
In the call to lstsq, the generated polynomial features should be the first argument, not the raw x data that was originally supplied.
Additionally, lstsq returns a tuple whose first element is the array of regression coefficients/weights, so the weights can be accessed by indexing with 0.
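For reference, a minimal sketch of unpacking the four values lstsq hands back (the variable names here are illustrative, not from the original code):
coefficients, residuals, rank, singular_values = np.linalg.lstsq(poly_features, y, rcond=None)
print(coefficients.shape)  # (6, 1): one weight per polynomial feature column
print(rank)  # below the 6 columns here, since x2 = x1 + 10 in this data
# residuals comes back as an empty array whenever the matrix is rank-deficient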
The corrected code for this explicit linear-algebra approach to least-squares regression is:
w = np.linalg.lstsq(poly_features, y, rcond=None)
y_pred = np.dot(poly_features, w[0])
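To predict at inputs that were not in the training data, the same feature transformation must be applied first. A minimal sketch, where new_x is a made-up input pair:
new_x = np.array([[6, 16]])  # hypothetical unseen (x1, x2) point
new_features = poly.transform(new_x)  # reuse the already-fitted transformer
new_pred = np.dot(new_features, w[0])
print(new_pred)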
For the entire corrected code (note that this method produces a tighter fit to the provided y values than the LinearRegression call below, simply because it is fit on the quadratic features while LinearRegression here is fit only on the raw x):
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame()
df['x1'] = [1, 2, 3, 4, 5]
df['x2'] = [11, 12, 13, 14, 15]
df['y'] = [75, 96, 136, 170, 211]

x = np.array([df['x1'], df['x2']]).T  # shape (5, 2)
y = np.array(df['y']).reshape(-1, 1)  # shape (5, 1)

# Degree-2 features: columns are [1, x1, x2, x1^2, x1*x2, x2^2]
poly = PolynomialFeatures(interaction_only=False, include_bias=True)
poly_features = poly.fit_transform(x)
print(poly_features)

# The polynomial features, not the raw x, go into lstsq;
# the weights are the first element of the returned tuple
w = np.linalg.lstsq(poly_features, y, rcond=None)
print(w)

y_pred = np.dot(poly_features, w[0])
print(y_pred)

# For comparison: LinearRegression fit on the raw x only
regression_model = LinearRegression()
regression_model.fit(x, y)
y_predicted = regression_model.predict(x)
print(y_predicted)
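As a cross-check (not part of the original answer), fitting LinearRegression on the same polynomial features should reproduce the lstsq predictions up to numerical precision, since both solve the same least-squares problem; fit_intercept=False is needed because the bias column is already in poly_features:
check_model = LinearRegression(fit_intercept=False)
check_model.fit(poly_features, y)
print(check_model.predict(poly_features))  # should match y_pred above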
Answered By - Sesshomaru