Issue
I'm exploring ridge regression. While comparing statsmodels and sklearn, I found that the two libraries produce different output for ridge regression. Below is a simple example of the difference:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import Lasso, Ridge
np.random.seed(142131)
n = 500
d = pd.DataFrame()
d['A'] = np.random.normal(size=n)
d['B'] = d['A'] + np.random.normal(scale=0.25, size=n)
d['C'] = np.random.normal(size=n)
d['D'] = np.random.normal(size=n)
d['intercept'] = 1
d['Y'] = 5 - 2*d['A'] + 1*d['D'] + np.random.normal(size=n)
y = np.asarray(d['Y'])
X = np.asarray(d[['intercept', 'A', 'B', 'C', 'D']])
First, using sklearn and Ridge:
ridge = Ridge(alpha=1, fit_intercept=True)
ridge.fit(X=np.asarray(d[['A', 'B', 'C', 'D']]), y=y)
ridge.intercept_, ridge.coef_
which outputs (4.99721, [-2.00968, 0.03363, -0.02145, 1.02895]).
Next, statsmodels and OLS.fit_regularized:
penalty = np.array([0, 1., 1., 1., 1.])
ols = sm.OLS(y, X).fit_regularized(L1_wt=0., alpha=penalty)
ols.params
which outputs [5.01623, -0.69164, -0.63901, 0.00156, 0.55158]. However, since both implement ridge regression, I would expect them to produce the same estimates.
Note that neither of these penalizes the intercept term (I already checked that as a possible source of the difference). I also don't think this is an error on my part: both implementations give the same output for LASSO. Below is a demonstration with the previous data:
# sklearn LASSO
lasso = Lasso(alpha=0.5, fit_intercept=True)
lasso.fit(X=np.asarray(d[['A', 'B', 'C', 'D']]), y=y)
lasso.intercept_, lasso.coef_
# statsmodels LASSO
penalty = np.array([0, 0.5, 0.5, 0.5, 0.5])
ols = sm.OLS(y, X).fit_regularized(L1_wt=1., alpha=penalty)
ols.params
which both give the estimates [5.01465, -1.51832, 0., 0., 0.57799] (sklearn reports the intercept and coefficient array separately).
So my question is: why do the estimated coefficients for ridge regression differ between sklearn and statsmodels?
Solution
After digging around a little more, I discovered why they differ. sklearn's Ridge scales the penalty term as alpha / n, where n is the number of observations; statsmodels does not apply this scaling to the tuning parameter. You can make the two ridge implementations match by re-scaling the penalty for statsmodels.
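Concretely (a sketch based on my reading of the two docstrings; the exact constants are my reconstruction, not quoted from either library), sklearn's Ridge minimizes ||y - Xb||^2 + alpha * ||b||^2, while statsmodels' fit_regularized with L1_wt=0 minimizes 0.5 * ||y - Xb||^2 / n + 0.5 * alpha * ||b||^2. Passing alpha / n to statsmodels makes its objective a constant multiple of sklearn's, so both have the same minimizer:

```python
import numpy as np

def sklearn_ridge_obj(beta, X, y, alpha):
    # sklearn Ridge objective: ||y - Xb||^2 + alpha * ||b||^2
    resid = y - X @ beta
    return resid @ resid + alpha * (beta @ beta)

def sm_ridge_obj(beta, X, y, alpha):
    # statsmodels OLS.fit_regularized objective with L1_wt=0
    # (per its docstring): 0.5 * ||y - Xb||^2 / n + 0.5 * alpha * ||b||^2
    n = X.shape[0]
    resid = y - X @ beta
    return 0.5 * (resid @ resid) / n + 0.5 * alpha * (beta @ beta)

# With alpha / n, the statsmodels objective equals sklearn's divided by 2n,
# so minimizing one minimizes the other:
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = rng.normal(size=500)
beta = rng.normal(size=4)
alpha, n = 1.0, X.shape[0]
print(np.isclose(sm_ridge_obj(beta, X, y, alpha / n),
                 sklearn_ridge_obj(beta, X, y, alpha) / (2 * n)))
```

Note the same algebra explains why LASSO already matched: sklearn's Lasso objective includes the 1/(2n) factor on the residual sum of squares, so no re-scaling is needed there.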
Using my posted example, here is how you would have the output match between the two:
# sklearn
# NOTE: there is no difference from above
ridge = Ridge(alpha=1, fit_intercept=True)
ridge.fit(X=np.asarray(d[['A', 'B', 'C', 'D']]), y=y)
ridge.intercept_, ridge.coef_
# statsmodels
# NOTE: going to re-scale the penalties based on n observations
n = X.shape[0]
penalty = np.array([0, 1., 1., 1., 1.]) / n # scaling penalties
ols = sm.OLS(y, X).fit_regularized(L1_wt=0., alpha=penalty)
ols.params
Now both output [4.99721, -2.00968, 0.03363, -0.02145, 1.02895].
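As an independent sanity check (a numpy-only sketch; the closed-form expression below is the textbook ridge solution with an unpenalized intercept column, not taken from either library), the same estimate can be computed directly as b = (X'X + P)^(-1) X'y:

```python
import numpy as np

# Re-create the same data as above (same seed and same draw order)
np.random.seed(142131)
n = 500
A = np.random.normal(size=n)
B = A + np.random.normal(scale=0.25, size=n)
C = np.random.normal(size=n)
D = np.random.normal(size=n)
y = 5 - 2*A + 1*D + np.random.normal(size=n)
X = np.column_stack([np.ones(n), A, B, C, D])

# Closed-form ridge: b = (X'X + P)^{-1} X'y, where P puts alpha = 1
# (sklearn's scale) on A-D and no penalty on the intercept column
P = np.diag([0., 1., 1., 1., 1.])
beta = np.linalg.solve(X.T @ X + P, X.T @ y)
print(np.round(beta, 5))  # should match the estimates above
```

Leaving the intercept column unpenalized is what makes this equivalent to sklearn's fit_intercept=True, which centers the data before solving.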
I am posting this so that anyone else in my situation can find the answer more easily (I haven't seen this difference discussed anywhere). I'm not sure of the rationale for the re-scaling, and it is odd to me that Ridge re-scales the tuning parameter while Lasso does not; it seems like important behavior to be aware of. Reading the sklearn documentation for Ridge and Lasso, I did not see this re-scaling behavior for Ridge discussed.
Answered By - pzivich