Issue
I can't see what is wrong with my code for regularized linear regression. The unregularized version is simply this, which I'm reasonably certain is correct:
import numpy as np

def get_model(features, labels):
    # Ordinary least squares via the Moore-Penrose pseudoinverse
    return np.linalg.pinv(features).dot(labels)
Here's my code for the regularized solution:
def get_model(features, labels, lamb=0.0):
    n_cols = features.shape[1]
    # Closed-form ridge regression: (X^T X + lamb * I)^(-1) X^T y
    return np.linalg.inv(
        features.transpose().dot(features) + lamb * np.identity(n_cols)
    ).dot(features.transpose()).dot(labels)
With the default value of 0.0 for lamb, my intention is that it should give the same result as the (correct) unregularized version, but the difference is actually quite large.
Does anyone see what the problem is?
Solution
The problem is that
features.transpose().dot(features)
may not be invertible, and according to the documentation, np.linalg.inv works only for full-rank matrices. When that matrix is singular or nearly singular, inv either raises an error or returns a numerically meaningless result, which is why your answer differs so much from the pseudoinverse solution. A (non-zero) regularization term, however, always makes the system nonsingular: features.transpose().dot(features) is positive semidefinite, so adding lamb * np.identity(n_cols) with lamb > 0 raises every eigenvalue to at least lamb, making the matrix positive definite and therefore invertible.
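Here is a minimal sketch of the failure mode; the design matrix A below is made up purely for illustration:

import numpy as np

# Rank-deficient design matrix: the third column duplicates the first,
# so A has rank 2 and the 3x3 Gram matrix A^T A is singular.
A = np.array([[1.0, 2.0, 1.0],
              [3.0, 4.0, 3.0],
              [5.0, 6.0, 5.0],
              [7.0, 8.0, 7.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

gram = A.T.dot(A)
print(np.linalg.matrix_rank(gram))  # 2, not 3: gram is singular

# np.linalg.inv(gram) is unusable here (it raises LinAlgError for an
# exactly singular matrix), but the pseudoinverse still produces a
# valid least-squares solution:
w_pinv = np.linalg.pinv(A).dot(y)

# Adding lamb * I with lamb > 0 shifts every eigenvalue of the positive
# semidefinite gram up by lamb, making it safely invertible:
lamb = 0.1
w_ridge = np.linalg.inv(gram + lamb * np.identity(3)).dot(A.T.dot(y))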
By the way, your implementation is mathematically correct, but it is not efficient: explicitly inverting a matrix is slower and less numerically stable than solving the corresponding linear system. An efficient way to solve this equation is a least-squares solver.
np.linalg.lstsq(features, labels)[0]
can do the work of np.linalg.pinv(features).dot(labels). (Note that np.linalg.lstsq returns a tuple of (solution, residuals, rank, singular values), so the solution vector is its first element.)
More generally, you can do this:
def get_model(A, y, lamb=0):
    n_col = A.shape[1]
    # Solve (A^T A + lamb * I) w = A^T y; take the solution vector,
    # which is the first element of the tuple lstsq returns.
    return np.linalg.lstsq(A.T.dot(A) + lamb * np.identity(n_col),
                           A.T.dot(y))[0]
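As a quick sanity check (the random data here is hypothetical, purely for illustration), the default lamb=0 should now agree with the pseudoinverse solution:

A = np.random.randn(100, 5)  # made-up, well-conditioned design matrix
y = np.random.randn(100)
w_lstsq = get_model(A, y)                # lamb = 0
w_pinv = np.linalg.pinv(A).dot(y)
print(np.allclose(w_lstsq, w_pinv))      # True, up to floating-point error

Because lstsq handles singular systems gracefully (it returns the minimum-norm least-squares solution), this version also works on rank-deficient inputs like the example above, where the inv-based code fails.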
Answered By - nullas