Issue
I have read somewhere that it's not possible to interpret SVM decision values with non-linear kernels, so only the sign matters. However, I have seen a couple of articles that put a threshold on decision values (using SVMlight, though) [1] [2]. So I'm not sure whether putting thresholds on decision values makes sense, but I'm curious about the results anyway.
The LibSVM Python interface returns the decision values along with the predicted target when you call predict(); is there any way to get them with scikit-learn? I have trained a binary classification SVM model using svm.SVC(), but am stuck at this point.
In the source code I found the svm.libsvm.decision_function() function, commented as "(libsvm name for this is predict_values)". Then I looked at svm.SVC.decision_function() and checked its source:
dec_func = libsvm.decision_function(
    X, self.support_, self.support_vectors_, self.n_support_,
    self.dual_coef_, self._intercept_, self._label,
    self.probA_, self.probB_,
    svm_type=LIBSVM_IMPL.index(self._impl),
    kernel=kernel, degree=self.degree, cache_size=self.cache_size,
    coef0=self.coef0, gamma=self._gamma)

# In binary case, we need to flip the sign of coef, intercept and
# decision function.
if self._impl in ['c_svc', 'nu_svc'] and len(self.classes_) == 2:
    return -dec_func
It looks like the equivalent of libsvm's predict_values, but why does it flip the sign of the decision values if it is supposed to be equivalent?
Also, is there any way to calculate a confidence value for an SVM decision using this value or any other prediction output (other than probability estimates and Platt's method; my model performs poorly when probability estimates are calculated)? Or, as has been argued, does only the sign of the decision value matter with non-linear kernels?
[1] http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0039195#pone.0039195-Teng1
[2] http://link.springer.com/article/10.1007%2Fs00726-011-1100-2
Solution
It looks like the equivalent of libsvm's predict_values, but why does it flip the sign of the decision values if it is supposed to be equivalent?
These are just implementation hacks regarding the internal representation of class signs. Nothing to really worry about.
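As a quick sanity check, here is a minimal sketch (toy data from make_classification, not part of the original question) showing that, whatever sign convention is used internally, the public decision_function stays consistent with predict: for a binary SVC, positive values correspond to classes_[1] and negative values to classes_[0].

import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Toy binary problem; any binary dataset would do.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = svm.SVC(kernel='rbf', gamma=0.5).fit(X, y)

dec = clf.decision_function(X)   # one signed value per sample in the binary case
pred = clf.predict(X)

# The internal sign flip is already folded in: dec > 0 exactly when the
# predicted class is clf.classes_[1].
assert np.array_equal(pred == clf.classes_[1], dec > 0)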
sklearn's decision_function is the value of the inner product between the SVM's hyperplane w and your data x (possibly in the kernel-induced space), so you can use it, shift it, or analyze it. Its interpretation, however, is very abstract: in the case of the RBF kernel it is simply the integral of the product of a normal distribution centered at x with variance equal to 1/(2*gamma) and the weighted sum of normal distributions centered at the support vectors (with the same variance), where the weights are the alpha coefficients.
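To make this concrete, here is a minimal sketch (same kind of assumed toy data as above, not from the answer itself) that rebuilds the RBF decision value by hand as the weighted sum of kernel evaluations at the support vectors, sum_i alpha_i * K(x, sv_i) + b, where the alpha_i live in dual_coef_ and b in intercept_. On recent scikit-learn versions this should coincide with decision_function.

import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = svm.SVC(kernel='rbf', gamma=0.5).fit(X, y)

# K(x, sv_i) for every sample against every support vector.
K = rbf_kernel(X, clf.support_vectors_, gamma=0.5)

# Weighted sum of kernel values plus the intercept: sum_i alpha_i * K(x, sv_i) + b.
manual = K @ clf.dual_coef_.ravel() + clf.intercept_

# Expected to print True (up to floating point noise).
print(np.allclose(manual, clf.decision_function(X)))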
Also, is there any way to calculate a confidence value for an SVM decision using this value or any other prediction output?
Platt's scaling is used not because there is some "lobby" forcing us to; it is simply the "correct" way of estimating the SVM's confidence as a probability. However, if you are not interested in confidence in the "probability sense", but rather in any value that you can compare qualitatively (which point is more confident), then the decision function can be used for that. It is roughly the distance between the point's image in kernel space and the separating hyperplane (up to a normalizing constant, namely the norm of w). So it is true that abs(decision_function(x1)) < abs(decision_function(x2)) means x1 is less confident than x2.

In short: the larger the absolute decision_function value, the "deeper" the point lies within its class's region.
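In the spirit of the SVMlight papers cited in the question, you can therefore rank or threshold predictions by the decision value itself. The sketch below (hypothetical threshold value, same assumed toy data as above) does exactly that; the scale of the decision values depends on the data and kernel parameters, so any threshold should be tuned on validation data.

import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = svm.SVC(kernel='rbf', gamma=0.5).fit(X, y)

dec = clf.decision_function(X)
confidence = np.abs(dec)            # distance from the hyperplane, up to the norm of w

order = np.argsort(-confidence)     # sample indices, most confident predictions first
threshold = 1.0                     # hypothetical cut-off on the decision value
keep = confidence > threshold       # keep only predictions made with a margin above the threshold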
Answered By - lejlot