Issue
I am trying to fit a second-order polynomial in three independent variables to a set of data points:
Y_poly(u,v,w) = k0 + k1 * u + k2 * v + k3 * w + k4 * u^2 + k5 * u * v + k6 * u * w + k7 * v^2 + k8 * v * w + k9 * w^2
For that I have a DataFrame that looks something like this:
data.head()
u v w Y
0 298.15 268.15 2000 -944.826752
1 298.15 268.15 2200 -1034.683966
2 298.15 268.15 2400 -1123.690167
3 298.15 268.15 2600 -1211.844636
4 298.15 268.15 2800 -1299.146876
To fit the data, I am using sklearn:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
Separate the x_data from the Y_data for better readability:
x_data = data[['u', 'v', 'w']]
Y_data = data['Y']
Generate a PolynomialFeature and transform the x_data to it:
poly_feat = PolynomialFeatures(degree=2)
poly = poly_feat.fit_transform(x_data)
Fit the LinearRegression:
reg = LinearRegression().fit(poly, Y_data)
This gives a (very) good score for the fit:
reg.score(poly, Y_data)
0.9999115032197147
Now I want to find out the coefficients of the polynomial to use it elsewhere.
coeffs = reg.coef_
print(coeffs)
[ 0.00000000e+00 -1.51892533e+02 1.27497434e+02 1.23432536e+00
1.57629775e-02 5.19087317e-01 4.68548968e-03 -5.46868029e-01
-1.17758263e-02 1.43135757e-05]
My understanding of the polynomial is that the following should now give me the same result:
x_predict = poly_feat.fit_transform([[298.15, 268.15, 2600]])
reg.predict(x_predict)
array([-1205.1186659])
np.sum([coeffs[i]*x_predict[0][i] for i in range(10)])
-8790.692787984553
The result of reg.predict(x_predict) is quite accurate, as expected. But why are the two results not equal? And in what order are the coefficients returned by sklearn.LinearRegression().coef_?
Additional information:
print(poly_feat.powers_)
[[0 0 0]
[1 0 0]
[0 1 0]
[0 0 1]
[2 0 0]
[1 1 0]
[1 0 1]
[0 2 0]
[0 1 1]
[0 0 2]]
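As a side note, each row of powers_ gives the exponent of each input column for one output feature, so the rows can be mapped to readable term names. A minimal sketch (the variable names u, v, w are assumed here to match the question; the dummy fit only serves to tell PolynomialFeatures there are three input columns):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly_feat = PolynomialFeatures(degree=2)
poly_feat.fit(np.zeros((1, 3)))  # dummy fit: 3 input features, as in the question

# Build a human-readable name for each row of powers_
names = []
for powers in poly_feat.powers_:
    terms = [f"{var}^{p}" for var, p in zip("uvw", powers) if p > 0]
    names.append(" * ".join(terms) or "1")
print(names)
# e.g. ['1', 'u^1', 'v^1', 'w^1', 'u^2', 'u^1 * v^1', ...]
```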
Edit: I set up a GitHub repo with the sample file: https://github.com/dradler-pbx/sklear_linReg_question
Solution
why are the results not equal?

I think you're just forgetting to add the intercept_ term.

In what order are the coefficients returned by sklearn.LinearRegression().coef_?

It's the same order as the result of powers_: first by total degree, then by a sort of lexicographic order on the variables' exponents. You can get this more readably from the get_feature_names_out method.
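To illustrate, here is a minimal sketch on synthetic data (the dataset and the test point are made up, standing in for the question's DataFrame): adding intercept_ to the dot product of coef_ with the transformed features reproduces reg.predict exactly, and transform (not fit_transform) is used on the new point so the fitted feature order is reused.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical data: 3 inputs, target exactly quadratic in them
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 3))
y = 1.0 + 2 * X[:, 0] - 3 * X[:, 1] * X[:, 2]

poly_feat = PolynomialFeatures(degree=2)
P = poly_feat.fit_transform(X)
reg = LinearRegression().fit(P, y)

# Transform a new point with the already-fitted PolynomialFeatures
x_new = poly_feat.transform([[0.1, 0.2, 0.3]])

# Manual evaluation: intercept_ plus the coefficient dot product
manual = reg.intercept_ + x_new[0] @ reg.coef_
print(np.allclose(manual, reg.predict(x_new)))  # True
```

Note that coef_[0] (the coefficient of the constant bias column) is 0, because LinearRegression fits its own intercept separately; without adding intercept_, the manual sum is off by exactly that amount.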
Answered By - Ben Reiniger