Issue
I have a matrix of reference values and would like to learn how Scikit-learn can be used to generate a regression model for it. I have done several types of univariate regressions in the past but it's not clear to me how to use two variables in sklearn.
I have two features (A and B) and a table of output values for certain input A/B values. See table and 3D surface below. I'd like to see how I can translate this to a two variable equation that relates the A/B inputs to the single value output, like shown in the table. The relationship looks nonlinear and it could also be quadratic, logarithmic, etc...
How do I use sklearn to perform a nonlinear regression on this tabular data?
A/B 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
0 8.78 8.21 7.64 7.07 6.50 5.92 5.35 4.78 4.21 3.63 3.06
5 8.06 7.56 7.07 6.58 6.08 5.59 5.10 4.60 4.11 3.62 3.12
10 7.33 6.91 6.50 6.09 5.67 5.26 4.84 4.43 4.01 3.60 3.19
15 6.60 6.27 5.93 5.59 5.26 4.92 4.59 4.25 3.92 3.58 3.25
20 5.87 5.62 5.36 5.10 4.85 4.59 4.33 4.08 3.82 3.57 3.31
25 5.14 4.97 4.79 4.61 4.44 4.26 4.08 3.90 3.73 3.55 3.37
30 4.42 4.32 4.22 4.12 4.02 3.93 3.83 3.73 3.63 3.53 3.43
35 3.80 3.78 3.75 3.72 3.70 3.67 3.64 3.62 3.59 3.56 3.54
40 2.86 2.93 2.99 3.05 3.12 3.18 3.24 3.31 3.37 3.43 3.50
45 2.08 2.24 2.39 2.54 2.70 2.85 3.00 3.16 3.31 3.46 3.62
50 1.64 1.84 2.05 2.26 2.46 2.67 2.88 3.08 3.29 3.50 3.70
55 1.55 1.77 1.98 2.19 2.41 2.62 2.83 3.05 3.26 3.47 3.69
60 2.09 2.22 2.35 2.48 2.61 2.74 2.87 3.00 3.13 3.26 3.39
65 3.12 3.08 3.05 3.02 2.98 2.95 2.92 2.88 2.85 2.82 2.78
70 3.50 3.39 3.28 3.17 3.06 2.95 2.84 2.73 2.62 2.51 2.40
75 3.42 3.32 3.21 3.10 3.00 2.89 2.78 2.68 2.57 2.46 2.36
80 3.68 3.55 3.43 3.31 3.18 3.06 2.94 2.81 2.69 2.57 2.44
85 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69
90 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69
95 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69
100 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69
Solution
There is probably a succinct nonlinear relationship between your A
, B
, and table values, but without some knowledge of this system nor with any sophisticated nonlinear modeling, here's a ridiculous model with a decent score
.
the_table = """1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
0 8.78 8.21 7.64 7.07 6.50 5.92 5.35 4.78 4.21 3.63 3.06
5 8.06 7.56 7.07 6.58 6.08 5.59 5.10 4.60 4.11 3.62 3.12
10 7.33 6.91 6.50 6.09 5.67 5.26 4.84 4.43 4.01 3.60 3.19
15 6.60 6.27 5.93 5.59 5.26 4.92 4.59 4.25 3.92 3.58 3.25
20 5.87 5.62 5.36 5.10 4.85 4.59 4.33 4.08 3.82 3.57 3.31
25 5.14 4.97 4.79 4.61 4.44 4.26 4.08 3.90 3.73 3.55 3.37
30 4.42 4.32 4.22 4.12 4.02 3.93 3.83 3.73 3.63 3.53 3.43
35 3.80 3.78 3.75 3.72 3.70 3.67 3.64 3.62 3.59 3.56 3.54
40 2.86 2.93 2.99 3.05 3.12 3.18 3.24 3.31 3.37 3.43 3.50
45 2.08 2.24 2.39 2.54 2.70 2.85 3.00 3.16 3.31 3.46 3.62
50 1.64 1.84 2.05 2.26 2.46 2.67 2.88 3.08 3.29 3.50 3.70
55 1.55 1.77 1.98 2.19 2.41 2.62 2.83 3.05 3.26 3.47 3.69
60 2.09 2.22 2.35 2.48 2.61 2.74 2.87 3.00 3.13 3.26 3.39
65 3.12 3.08 3.05 3.02 2.98 2.95 2.92 2.88 2.85 2.82 2.78
70 3.50 3.39 3.28 3.17 3.06 2.95 2.84 2.73 2.62 2.51 2.40
75 3.42 3.32 3.21 3.10 3.00 2.89 2.78 2.68 2.57 2.46 2.36
80 3.68 3.55 3.43 3.31 3.18 3.06 2.94 2.81 2.69 2.57 2.44
85 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69
90 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69
95 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69
100 3.43 3.35 3.28 3.21 3.13 3.06 2.99 2.91 2.84 2.77 2.69"""
import numpy as np
import pandas as pd
import io
df = pd.read_csv(io.StringIO(initial_value=the_table), sep="\s+")
df.columns = df.columns.astype(np.uint64)
df_unstacked = df.unstack()
X = df_unstacked.index.tolist()
y = df_unstacked.to_list()
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(6)
poly_X = poly.fit_transform(X)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(poly_X,y)
print(model.score(poly_X,y))
# 0.9762180339233807
To predict
values given an A
and a B
using this model, you need to transform the input like was done for creating the model. So for example,
model.predict(poly.transform([(1534,56)]))
# array([2.75275659])
Even more outrageously ridiculous ...
more_X = [(a,b,np.log1p(a),np.log1p(b),np.cos(np.pi*b/100)) for a,b in X]
poly = PolynomialFeatures(5)
poly_X = poly.fit_transform(more_X)
model.fit(poly_X,y)
print(model.score(poly_X,y))
# 0.9982994398684035
... and to predict
:
more_X = [(a,b,np.log1p(a),np.log1p(b),np.cos(np.pi*b/100)) for a,b in [(1534,56)]]
model.predict(poly.transform(more_X))
# array([2.74577017])
N.B.: There's probably better ways to Pythonically program these ridiculous models.
Answered By - rickhg12hs
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.