Issue
I'm studying Gaussian process regression. I'm trying to use the built-in functions from scikit-learn, and I'm also trying to implement a custom function for doing the same thing.
This is the code when using scikit-learn:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) # Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) #result
And this is the code I wrote manually:
def marglike(par,X,Y): # defining the log-marginal-likelihood
    # print(par)
    l,var,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = var*np.exp(-(1/(2*(l**2)))*dist_X)
    inverse = np.linalg.inv(k + (sigma_n**2)*np.eye(len(k)))
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k + (sigma_n**2)*np.eye(len(k)))) + (n/2)*np.log(2*np.pi)
    return ml
b= [0.0005,100]
bnd = [b,b,b] #bounds used for "minimize" function
start = np.array([1.1,1.6,0.05]) #initial hyperparameters values
re = minimize(marglike,start,args=(X,Y),method="L-BFGS-B",options = {'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x #Hyperparameter results
Out[]: array([3.55266484e+00, 9.99986210e+01, 5.00000000e-04])
As you can see, the hyperparameters I got from the two methods are different, even though I used the same data (X, Y) and the same minimization method.
Could somebody help me understand why, and maybe how to get the same results?
Solution
As suggested by San Mason, adding noise actually works! Otherwise, when you do it manually (in the custom code), set the initial noise reasonably low and run multiple restarts with different initializations; you will then get values close to scikit-learn's. Incidentally, the noiseless data seem to create a stationary ridge in the space of hyperparameters (like Fig. 1.6 in the Surrogates GP book). Note that scikit-learn's noise_level corresponds to sigma_n**2 in your custom function. Below are the snippets for the noise-less and noisy cases.
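As a rough illustration of the multiple-restart idea, here is a minimal sketch for the custom optimizer. It assumes the marglike function and the X, Y data from the snippets below are already defined; the number of restarts, the random seed, and the log-uniform sampling range are arbitrary choices.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)  # fixed seed, illustrative only
bnd = [[10**-5, 10**5]]*3  # bounds for l, std, sigma_n

best = None
for _ in range(20):  # 20 restarts with random initializations
    start = 10**rng.uniform(-2, 2, size=3)  # log-uniform starting points
    res = minimize(fun=marglike, x0=start, args=(X, Y), method="L-BFGS-B", bounds=bnd)
    if best is None or res.fun < best.fun:
        best = res  # keep the restart with the lowest negative log-marginal-likelihood

l_opt, std_opt, sigma_n_opt = best.x
# compare with gp.kernel_: scikit-learn reports std_opt**2 as the constant factor
# and sigma_n_opt**2 as the WhiteKernel noise_level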
Noise-less case
scikit-learn
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) #+ np.random.normal(10)# Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
# Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) #result
Custom function
def marglike(par,X,Y): # defining the log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k)
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]
b= [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options = {'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (9.920690495739379, 3.5657912350017575, 1.0000000000000002e-10)
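One way to probe the ridge in this noise-less case is to evaluate the custom negative log-marginal-likelihood at both solutions (a quick check sketch, assuming marglike, X and Y from above; the values 14.1, 3.7 and 1e-05 are read off gp.kernel_, with noise_level = sigma_n**2, so sigma_n = 1e-05**0.5):

print(marglike([3.7, 14.1, 1e-05**0.5], X, Y))   # scikit-learn optimum as (l, std, sigma_n)
print(marglike([3.57, 9.92, 1e-10**0.5], X, Y))  # custom optimum from re.x above
# if both values come out nearly the same, the two optima lie on the same flat ridge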
Noisy case
scikit-learn
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) + np.random.normal(size=10).reshape(10,1)*0.1 # Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
# Out[]: 10.3**2 * RBF(length_scale=3.45) + WhiteKernel(noise_level=0.00792) #result
Custom function
def marglike(par,X,Y): # defining the log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k)
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]
b= [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options = {'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (10.268943740577331, 3.4462604625225106, 0.007922681239535326)
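With noise added to the data, the custom optimizer lands essentially on the same hyperparameters as scikit-learn: std is about 10.27 vs 10.3, the length scale is about 3.45 in both, and sigma_n**2 is about 0.00792 in both.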
Answered By - Zeel B Patel