Issue
I am trying to fit a second-order polynomial in three independent variables to a set of data points:
Y_poly(u,v,w) = k0 + k1 * u + k2 * v + k3 * w + k4 * u^2 + k5 * u * v + k6 * u * w + k7 * v^2 + k8 * v * w + k9 * w^2
For that I have a DataFrame that looks something like this:
data.head()
u v w Y
0 298.15 268.15 2000 -944.826752
1 298.15 268.15 2200 -1034.683966
2 298.15 268.15 2400 -1123.690167
3 298.15 268.15 2600 -1211.844636
4 298.15 268.15 2800 -1299.146876
To fit the data, I am using sklearn:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
Separate the x_data from the Y_data for better readability:
x_data = data[['u', 'v', 'w']]
Y_data = data['Y']
Generate a PolynomialFeature and transform the x_data to it:
poly_feat = PolynomialFeatures(degree=2)
poly = poly_feat.fit_transform(x_data)
Fit the LinearRegression:
reg = LinearRegression().fit(poly, Y_data)
This gives a (very) good score for the fit:
reg.score(poly, Y_data)
0.9999115032197147
Now I want to find out the coefficients of the polynomial to use it elsewhere.
coeffs = reg.coef_
print(coeffs)
[ 0.00000000e+00 -1.51892533e+02 1.27497434e+02 1.23432536e+00
1.57629775e-02 5.19087317e-01 4.68548968e-03 -5.46868029e-01
-1.17758263e-02 1.43135757e-05]
My understanding of the polynomial is that the following should now give me the same result:
x_predict = poly_feat.fit_transform([[298.15, 268.15, 2600]])
reg.predict(x_predict)
array([-1205.1186659])
np.sum([coeffs[i]*x_predict[0][i] for i in range(10)])
-8790.692787984553
The result of reg.predict(x_predict) is quite accurate, as expected. But why are the two results not equal? And in what order are the coefficients returned by sklearn.LinearRegression().coef_?
Additional information:
print(poly_feat.powers_)
[[0 0 0]
[1 0 0]
[0 1 0]
[0 0 1]
[2 0 0]
[1 1 0]
[1 0 1]
[0 2 0]
[0 1 1]
[0 0 2]]
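As a side note, each row of powers_ gives the exponent of each input column for one output feature, so the rows can be mapped to readable term names. A minimal sketch (the variable names u, v, w are assumed here to match the question; the dummy fit only serves to tell PolynomialFeatures there are three input columns):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly_feat = PolynomialFeatures(degree=2)
poly_feat.fit(np.zeros((1, 3)))  # dummy fit: 3 input features, as in the question

# Build a human-readable name for each row of powers_
names = []
for powers in poly_feat.powers_:
    terms = [f"{var}^{p}" for var, p in zip("uvw", powers) if p > 0]
    names.append(" * ".join(terms) or "1")
print(names)
# e.g. ['1', 'u^1', 'v^1', 'w^1', 'u^2', 'u^1 * v^1', ...]
```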
Edit: I set up a GitHub repo with the sample file: https://github.com/dradler-pbx/sklear_linReg_question
Solution
why are the results not equal?

I think you're just forgetting to add the intercept_ term.

In what order are the coefficients returned by sklearn.LinearRegression().coef_?

It's the same order as the result of powers_: first by total degree, then by a sort of lexicographic order on the variables' exponents. You can get this more readably from the get_feature_names_out method.
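To illustrate, here is a minimal sketch on synthetic data (the dataset and the test point are made up, standing in for the question's DataFrame): adding intercept_ to the dot product of coef_ with the transformed features reproduces reg.predict exactly, and transform (not fit_transform) is used on the new point so the fitted feature order is reused.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical data: 3 inputs, target exactly quadratic in them
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 3))
y = 1.0 + 2 * X[:, 0] - 3 * X[:, 1] * X[:, 2]

poly_feat = PolynomialFeatures(degree=2)
P = poly_feat.fit_transform(X)
reg = LinearRegression().fit(P, y)

# Transform a new point with the already-fitted PolynomialFeatures
x_new = poly_feat.transform([[0.1, 0.2, 0.3]])

# Manual evaluation: intercept_ plus the coefficient dot product
manual = reg.intercept_ + x_new[0] @ reg.coef_
print(np.allclose(manual, reg.predict(x_new)))  # True
```

Note that coef_[0] (the coefficient of the constant bias column) is 0, because LinearRegression fits its own intercept separately; without adding intercept_, the manual sum is off by exactly that amount.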
Answered By - Ben Reiniger