Issue
I'm trying to compute the intercept of an SVM classifier (linear kernel) from its dual coefficients. I'm using the SVC class instead of LinearSVC, because the latter does not expose a dual_coef_ attribute.
Here is the code:
from sklearn import svm
from sklearn.utils import shuffle
import numpy as np
np.random.seed(123)
# Data
m = 1000
c2 = (-1, 1)
c1 = (2, 2.5)
r1 = 1.5
r2 = 3.5
rand_1_theta = np.random.uniform(0, 1, size=m)
rand_1_radius = np.random.uniform(0, 1, size=m)
ds1_x = (c1[0] + np.sqrt(rand_1_radius * r1) * np.cos(rand_1_theta * 2 * np.pi))
ds1_y = (c1[1] + np.sqrt(rand_1_radius * r1) * np.sin(rand_1_theta * 2 * np.pi))
rand_2_theta = np.random.uniform(0, 1, size=m)
rand_2_radius = np.random.uniform(0, 1, size=m)
ds2_x = (c2[0] + np.sqrt(rand_2_radius * r2) * np.cos(rand_2_theta * 2 * np.pi))
ds2_y = (c2[1] + np.sqrt(rand_2_radius * r2) * np.sin(rand_2_theta * 2 * np.pi))
X = np.stack((np.concatenate((ds1_x, ds2_x)),
np.concatenate((ds1_y, ds2_y))),
axis=1)
y = np.concatenate((np.zeros(m), np.ones(m)))
rgb = [(1,0,0) if val==0 else (0,0,1) for val in y]
# Preprocessing
X, y, rgb = shuffle(X, y, rgb, random_state=123)
# SVM
model = svm.SVC(probability=True, kernel='linear', random_state=123)
model.fit(X, y)
# Evaluation
support_vectors = model.support_vectors_
support_indexes = model.support_
coef = model.coef_
dual_coeff = model.dual_coef_
intercept = model.intercept_
primal_coeff = np.matmul(dual_coeff, support_vectors)
print(primal_coeff)
print(coef)
y_support = y[support_indexes]
all_bias = y_support - np.dot(support_vectors, primal_coeff.squeeze())
intercept_estimate_1 = np.mean(all_bias)
intercept_estimate_2 = np.min(all_bias)
print(intercept)
print(intercept_estimate_1)
print(intercept_estimate_2)
Notice the intercept estimates using the mean and the minimum printed at the end of the script. As far as I know, the intercept is the mean of all intercept estimates; at least, that is how most books present it. But it seems scikit-learn is using the minimum value instead. I don't understand why, and the documentation does not explain how it is computed.
Can anyone provide an explanation?
Solution
The intercept is neither the mean nor the minimum of all the bias estimates.
Instead, for libsvm (and therefore scikit-learn) it is the mean of y_i*G_i (label times gradient of the dual objective) over the free support vectors only, i.e. those with 0 < alpha_i < C; bounded support vectors with alpha_i = C are excluded. You can see this in the LIBSVM paper, section 4.1.5. Intuitively, a free vector sits exactly on the margin, so y_i*(w.x_i + b) = 1 and each one yields the bias estimate b = y_i - w.x_i, with labels in {-1, +1}; libsvm averages exactly these estimates.
Because in prediction only the support vectors matter and libsvm does not save the coefficients of the other training points, I don't think you can easily check this property in general without looking deeply into the core (for the linear-kernel case, though, see the sketch after the snippet below).
But if you are really interested, you can check the libsvm code for calculating the bias (= -rho) here:
double Solver::calculate_rho()
{
    double r;
    int nr_free = 0;
    double ub = INF, lb = -INF, sum_free = 0;
    for(int i=0;i<active_size;i++)
    {
        double yG = y[i]*G[i];
        if(is_upper_bound(i))        // bounded from above: alpha_i == C
        {
            if(y[i]==-1)
                ub = min(ub,yG);
            else
                lb = max(lb,yG);
        }
        else if(is_lower_bound(i))   // bounded from below: alpha_i == 0
        {
            if(y[i]==+1)
                ub = min(ub,yG);
            else
                lb = max(lb,yG);
        }
        else                         // free vector: 0 < alpha_i < C
        {
            ++nr_free;
            sum_free += yG;
        }
    }
    // Mean of y*G over the free vectors if any exist; otherwise fall
    // back to the midpoint of the feasible interval [lb, ub].
    if(nr_free>0)
        r = sum_free/nr_free;
    else
        r = (ub+lb)/2;
    return r;
}
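For the linear kernel used in the question, this rule can be checked from Python without digging into the C core, since w is recoverable as dual_coef_ @ support_vectors_. Below is a minimal sanity-check sketch (not from the original answer), which assumes the free support vectors are exactly those whose |dual_coef_| stays strictly below C (a small tolerance guards against round-off); it reuses the model fitted above:
# Sketch: the intercept should equal the mean of (y_i - w.x_i)
# taken over the *free* support vectors only, with labels in {-1, +1}.
w = (model.dual_coef_ @ model.support_vectors_).ravel()  # primal weights
alpha = np.abs(model.dual_coef_).ravel()                 # alpha_i, since dual_coef_ = y_i * alpha_i
free = alpha < model.C - 1e-8                            # free vectors: 0 < alpha_i < C
y_signed = np.sign(model.dual_coef_).ravel()[free]       # recover y_i in {-1, +1}
b_est = np.mean(y_signed - model.support_vectors_[free] @ w)
print(b_est, model.intercept_[0])                        # the two should agree closely
This also shows why the estimates in the question do not match intercept_: they average (or take the minimum) over all support vectors, including the bounded ones at alpha_i = C that violate the margin, and they use the raw {0, 1} labels instead of {-1, +1}.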
Answered By - hychou