Issue
I'm studying Gaussian process regression. I'm trying to use the built-in functions from scikit-learn, and I'm also trying to implement a custom function for doing the same thing.
This is the code when using scikit-learn:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) # Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) #result
And this is the code I wrote manually:
def marglike(par,X,Y): # defining the log-marginal-likelihood
    # print(par)
    l,var,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = var*np.exp(-(1/(2*(l**2)))*dist_X)
    inverse = np.linalg.inv(k + (sigma_n**2)*np.eye(len(k)))
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k + (sigma_n**2)*np.eye(len(k)))) + (n/2)*np.log(2*np.pi)
    return ml
b= [0.0005,100]
bnd = [b,b,b] #bounds used for "minimize" function
start = np.array([1.1,1.6,0.05]) #initial hyperparameters values
re = minimize(marglike,start,args=(X,Y),method="L-BFGS-B",options = {'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x #Hyperparameter results
Out[]: array([3.55266484e+00, 9.99986210e+01, 5.00000000e-04])
As you can see, the hyperparameters I got from the two methods are different, even though I used the same data (X, Y) and the same minimization method.
Could somebody help me understand why, and maybe how to get the same results?
Solution
As suggested by San Mason, adding noise actually works! Otherwise, when you do it manually (in the custom code), set the initial noise reasonably low and run multiple restarts with different initializations; you will then get values close to scikit-learn's. Incidentally, the noiseless data seem to create a stationary ridge in the space of hyperparameters (like Fig. 1.6 in the Surrogates GP book). Note that scikit-learn's noise_level corresponds to sigma_n**2 in your custom function. Below are the snippets for the noise-less and noisy cases.
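As a rough illustration of the multiple-restart idea, here is a minimal sketch for the custom optimizer. It assumes the marglike function and the X, Y data from the snippets below are already defined; the number of restarts, the random seed, and the log-uniform sampling range are arbitrary choices.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)  # fixed seed, illustrative only
bnd = [[10**-5, 10**5]]*3  # bounds for l, std, sigma_n

best = None
for _ in range(20):  # 20 restarts with random initializations
    start = 10**rng.uniform(-2, 2, size=3)  # log-uniform starting points
    res = minimize(fun=marglike, x0=start, args=(X, Y), method="L-BFGS-B", bounds=bnd)
    if best is None or res.fun < best.fun:
        best = res  # keep the restart with the lowest negative log-marginal-likelihood

l_opt, std_opt, sigma_n_opt = best.x
# compare with gp.kernel_: scikit-learn reports std_opt**2 as the constant factor
# and sigma_n_opt**2 as the WhiteKernel noise_level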
Noise-less case
scikit-learn
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) #+ np.random.normal(10)# Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
# Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) #result
Custom function
def marglike(par,X,Y): # defining the log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k)
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]
b= [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options = {'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (9.920690495739379, 3.5657912350017575, 1.0000000000000002e-10)
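One way to probe the ridge in this noise-less case is to evaluate the custom negative log-marginal-likelihood at both solutions (a quick check sketch, assuming marglike, X and Y from above; the values 14.1, 3.7 and 1e-05 are read off gp.kernel_, with noise_level = sigma_n**2, so sigma_n = 1e-05**0.5):

print(marglike([3.7, 14.1, 1e-05**0.5], X, Y))   # scikit-learn optimum as (l, std, sigma_n)
print(marglike([3.57, 9.92, 1e-10**0.5], X, Y))  # custom optimum from re.x above
# if both values come out nearly the same, the two optima lie on the same flat ridge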
Noisy case
scikit-learn
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) + np.random.normal(size=10).reshape(10,1)*0.1 # Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) # fitting the process to get optimized hyperparameters
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
# Out[]: 10.3**2 * RBF(length_scale=3.45) + WhiteKernel(noise_level=0.00792) #result
Custom function
def marglike(par,X,Y): # defining the log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k)
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]
b= [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options = {'disp':True},bounds=bnd) # the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (10.268943740577331, 3.4462604625225106, 0.007922681239535326)
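With noise added to the data, the custom optimizer lands essentially on the same hyperparameters as scikit-learn: std is about 10.27 vs 10.3, the length scale is about 3.45 in both, and sigma_n**2 is about 0.00792 in both.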
Answered By - Zeel B Patel