Issue
I am using the scikit-learn implementation of Gaussian Process Regression (GaussianProcessRegressor), and I want to fit on single points one at a time instead of fitting on a whole set of points at once. The resulting alpha coefficients should remain the same either way. For example:
from sklearn.gaussian_process import GaussianProcessRegressor

gpr2 = GaussianProcessRegressor()
for i in range(x.shape[0]):
    gpr2.fit(x[i:i+1], y[i:i+1])  # slice to keep X two-dimensional
should be the same as
gpr = GaussianProcessRegressor().fit(x, y)
But when accessing gpr2.alpha_ and gpr.alpha_, they are not the same. Why is that?
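For completeness, here is a minimal self-contained reproduction of the mismatch (the toy x, y data below are made up purely for illustration):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# made-up toy data, just to make the comparison concrete
x = np.linspace(0, 1, 5).reshape(-1, 1)
y = np.sin(x).ravel()

# fit once on the whole dataset
gpr = GaussianProcessRegressor().fit(x, y)

# fit one point at a time
gpr2 = GaussianProcessRegressor()
for i in range(x.shape[0]):
    gpr2.fit(x[i:i+1], y[i:i+1])

print(gpr.alpha_)   # dual coefficients computed from all 5 points at once
print(gpr2.alpha_)  # not the same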
Indeed, I am working on a project where new data points arrive over time. I don't want to append to the x, y arrays and fit on the whole dataset again every time, as that is very time-intensive. Let x be of size n; then I end up with:
n + (n-1) + (n-2) + ... + 1 = n(n+1)/2 ∈ O(n^2) fittings
When considering that the fitting itself is quadratic in the number of points (correct me if I'm wrong), the overall run time complexity should be in O(n^3). It would be more efficient to do a single fitting on the n points:
1 + 1 + ... + 1 = n ∈ O(n)
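To see the cost I am worried about, here is a rough timing sketch of the refit-from-scratch approach (again with made-up random data); each arriving point triggers a full fit on everything seen so far:

import time
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def refit_as_points_arrive(n):
    # naive approach: refit from scratch on all points seen so far
    X, Y = [], []
    gpr = GaussianProcessRegressor()
    for _ in range(n):
        X.append(rng.random())
        Y.append(rng.random())
        gpr.fit(np.asarray(X).reshape(-1, 1), np.asarray(Y))
    return gpr

for n in (25, 50, 100):
    t0 = time.perf_counter()
    refit_as_points_arrive(n)
    print(n, "points:", round(time.perf_counter() - t0, 2), "s")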
Solution
What you refer to is actually called online learning or incremental learning; it is in itself a huge sub-field of machine learning, and it is not available out-of-the-box for all scikit-learn models. Quoting from the relevant documentation:
Although not all algorithms can learn incrementally (i.e. without seeing all the instances at once), all estimators implementing the partial_fit API are candidates. Actually, the ability to learn incrementally from a mini-batch of instances (sometimes called “online learning”) is key to out-of-core learning as it guarantees that at any given time there will be only a small amount of instances in the main memory.
Following this excerpt in the linked document above, there is a complete list of all scikit-learn models currently supporting incremental learning, from which you can see that GaussianProcessRegressor is not one of them.
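As a quick check, the absence of the API can be verified programmatically; the sketch below (with made-up toy data) also shows what partial_fit looks like on SGDRegressor, one of the estimators from that list which does implement it:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import SGDRegressor

# GaussianProcessRegressor exposes no partial_fit method
print(hasattr(GaussianProcessRegressor(), "partial_fit"))  # False

# made-up toy data, for illustration only
x = np.linspace(0, 1, 5).reshape(-1, 1)
y = np.sin(x).ravel()

sgd = SGDRegressor()
for i in range(x.shape[0]):
    sgd.partial_fit(x[i:i+1], y[i:i+1])  # incremental update; earlier points are retained

Of course, SGDRegressor is a linear model and not a substitute for a Gaussian process; the snippet only illustrates the partial_fit API.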
Answered By - desertnaut