Issue
Consider the following example:
import random
import pandas as pd
import statsmodels.formula.api as smf
# random.gauss(mu, sigma): sigma is a required argument before Python 3.11
df = pd.DataFrame({'y': [x**2 + random.gauss(2, 1) for x in range(10)],
                   'x': [x for x in range(10)]})
model = smf.ols(data=df, formula='y ~ x + I(x**2) + I(x**3)').fit()
df['pred'] = model.predict(df)
df.set_index('x').plot()  # plotting requires matplotlib
As you can see, I fit a cubic model to my data, and overall the fit is pretty good. However, I would like to constrain the cubic model to take the following values at two specific x points:
f(0) = 10
f(8) = 60
How can I do that in statsmodels or sklearn?
Thanks!
Solution
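You can use fit_constrained on a GLM (smf.glm). With the default Gaussian family and identity link, a GLM is the same model as OLS. Note that the example below constrains f(0) = 8 rather than the f(0) = 10 from the question; change the first target value in [8, 60] to 10 to match the question exactly.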
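import random
import pandas as pd
import statsmodels.formula.api as smf
# Toy data: y = x**2 plus Gaussian noise with mean 2 and standard deviation 1.
df = pd.DataFrame(
    {
        "y": [x ** 2 + random.gauss(2, 1) for x in range(10)],
        "x": [x for x in range(10)],
    }
)
# Cubic model; the parameter order is Intercept, x, I(x ** 2), I(x ** 3).
untrained_glm = smf.glm("y ~ x + I(x ** 2) + I(x ** 3)", df)
# Constraints as a tuple (R, q) meaning R @ params = q:
# the first row enforces f(0) = 8, the second enforces f(8) = 60.
trained_glm = untrained_glm.fit_constrained(
    ([[1, 0, 0, 0], [1, 8, 64, 512]], [8, 60])
)
df["pred"] = trained_glm.predict(df)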
Result:
>>> df
           y  x       pred
0   0.191139  0   8.000000
1   3.225092  1   6.110541
2   5.353590  2   7.008272
3   9.367904  3  10.498092
4  16.512384  4  16.384900
5  28.742154  5  24.473595
6  36.584476  6  34.569078
7  51.006869  7  46.476246
8  66.839006  8  60.000000
9  82.163031  9  74.945239
(Edit to add explanation of the constraints)
Suppose the model is y = a + b * x + c * (x ** 2) + d * (x ** 3) + e, where e is the error term, a is the intercept, and b, c and d are the coefficients of the first-, second- and third-degree terms.
Writing a_est, b_est, c_est and d_est for the estimated coefficients, the fitted model f will satisfy f(0) = 8 if and only if 8 = a_est + b_est * 0 + c_est * 0 + d_est * 0, and it will satisfy f(8) = 60 if and only if 60 = a_est + b_est * 8 + c_est * (8 ** 2) + d_est * (8 ** 3).
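Each constraint row is just the polynomial basis [1, x, x ** 2, x ** 3] evaluated at the constrained point. A minimal sketch of building the (R, q) tuple for arbitrary points and targets; the helper names basis_row and build_constraints are hypothetical and not part of statsmodels:

```python
def basis_row(x0, degree=3):
    """Return [1, x0, x0**2, ..., x0**degree], the polynomial basis at x0.

    The column order matches the formula "y ~ x + I(x ** 2) + I(x ** 3)":
    Intercept, x, I(x ** 2), I(x ** 3).
    """
    return [x0 ** k for k in range(degree + 1)]


def build_constraints(points, degree=3):
    """Turn a {x0: target} mapping into the (R, q) tuple for fit_constrained."""
    R = [basis_row(x0, degree) for x0 in points]
    q = [points[x0] for x0 in points]
    return R, q


# The constraints used in the answer: f(0) = 8 and f(8) = 60.
R, q = build_constraints({0: 8, 8: 60})
print(R)  # [[1, 0, 0, 0], [1, 8, 64, 512]]
print(q)  # [8, 60]
```

To match the question's f(0) = 10 instead, pass {0: 10, 8: 60}.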
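So, I am adding the following constraints to the model:
1 * a + 0 * b + 0 * c + 0 * d = 8
1 * a + 8 * b + 64 * c + 512 * d = 60
The left-hand-side coefficients form the rows of the matrix [[1, 0, 0, 0], [1, 8, 64, 512]] and the right-hand sides form [8, 60]; together they are the (R, q) tuple passed to fit_constrained, which enforces R @ [a, b, c, d] = q.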
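If you would rather not use statsmodels (scikit-learn's LinearRegression, for instance, offers a positive=True option but no equality constraints), the same constraints can be imposed directly in NumPy by solving the equality-constrained least-squares problem through its KKT (Lagrange multiplier) system. This is a swapped-in technique, not a description of what fit_constrained does internally; the helper name constrained_lstsq and the seeded toy data are assumptions for illustration. For a Gaussian GLM with identity link it is the same optimization problem, so the coefficients should agree with fit_constrained up to numerical precision.

```python
import numpy as np


def constrained_lstsq(X, y, R, q):
    """Minimize ||X @ beta - y||^2 subject to R @ beta = q.

    Solves the KKT system
        [[X.T @ X, R.T], [R, 0]] @ [beta, mu] = [X.T @ y, q]
    where mu holds the Lagrange multipliers, and returns only beta.
    """
    X, y, R, q = (np.asarray(a, dtype=float) for a in (X, y, R, q))
    n_params = X.shape[1]
    n_cons = R.shape[0]
    kkt = np.block([
        [X.T @ X, R.T],
        [R, np.zeros((n_cons, n_cons))],
    ])
    rhs = np.concatenate([X.T @ y, q])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n_params]


# Toy data shaped like the question's: y = x**2 + noise(mean 2, sd 1), x = 0..9.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = x ** 2 + rng.normal(2, 1, size=x.size)

# Design matrix with columns Intercept, x, x**2, x**3.
X = np.vander(x, 4, increasing=True)

# Constraints f(0) = 8 and f(8) = 60, as in the answer.
R = [[1, 0, 0, 0], [1, 8, 64, 512]]
q = [8, 60]

beta = constrained_lstsq(X, y, R, q)
pred = X @ beta
# pred[0] and pred[8] match the constraints up to floating-point error.
print(pred[0], pred[8])
```

The fitted values at x = 0 and x = 8 are pinned to the targets, while the remaining points are fit as closely as the constraints allow.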
Answered By - yarnabrina