Issue
I found sklearn.svm.LinearSVC and sklearn.svm.SVC(kernel='linear'), and they seem very similar to me, but I get very different results on Reuters.
sklearn.svm.LinearSVC: 81.05% in 28.87s train / 9.71s test
sklearn.svm.SVC : 33.55% in 6536.53s train / 2418.62s test
Both have a linear kernel. The default tolerance of LinearSVC is tighter (smaller) than that of SVC:
LinearSVC(C=1.0, tol=0.0001, max_iter=1000, penalty='l2', loss='squared_hinge', dual=True, multi_class='ovr', fit_intercept=True, intercept_scaling=1)
SVC (C=1.0, tol=0.001, max_iter=-1, shrinking=True, probability=False, cache_size=200, decision_function_shape=None)
How do both functions differ otherwise? Even if I set kernel='linear', tol=0.0001, max_iter=1000 and decision_function_shape='ovr', the SVC takes much longer than LinearSVC. Why?
I use sklearn 0.18, and both are wrapped in the OneVsRestClassifier. I'm not sure if this does the same as multi_class='ovr' / decision_function_shape='ovr'.
Solution
Indeed, LinearSVC and SVC(kernel='linear') yield different results, i.e. different metric scores and decision boundaries, because they use different approaches. The toy example below illustrates this:
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, SVC
X, y = load_iris(return_X_y=True)
clf_1 = LinearSVC().fit(X, y)  # default loss='squared_hinge'; loss='hinge' can be set instead
clf_2 = SVC(kernel='linear').fit(X, y)
score_1 = clf_1.score(X, y)
score_2 = clf_2.score(X, y)
print('LinearSVC score %s' % score_1)
print('SVC score %s' % score_2)
Output:
LinearSVC score 0.96666666666666667
SVC score 0.98666666666666669
The key reasons for that difference are the following:
- By default, LinearSVC minimizes the squared hinge loss while SVC minimizes the regular hinge loss. It is possible to pass the string 'hinge' for the loss parameter of LinearSVC.
- LinearSVC uses the One-vs-All (also known as One-vs-Rest) multiclass reduction while SVC uses the One-vs-One multiclass reduction. This is also noted in the scikit-learn documentation. For a multi-class classification problem, SVC therefore fits N * (N - 1) / 2 models, where N is the number of classes. LinearSVC, by contrast, simply fits N models. If the classification problem is binary, then only one model is fit in both scenarios. The multi_class and decision_function_shape parameters are not equivalent. The second one is an aggregator that transforms the results of the decision function into a convenient shape of (n_samples, n_classes); it does not change how the models are trained. multi_class is an algorithmic approach to establish a solution.
- The underlying estimators for LinearSVC come from liblinear, which does in fact penalize the intercept. SVC uses libsvm estimators, which do not. liblinear estimators are optimized for the linear (special) case and thus converge faster on large amounts of data than libsvm. That is why LinearSVC takes less time to solve the problem.
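The model counts described above can be checked on a small example. This is a minimal sketch on a synthetic 4-class dataset (make_blobs is used purely for illustration and is not the Reuters data from the question); it also includes the OneVsRestClassifier wrapper mentioned in the question.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC, SVC

# Synthetic 4-class problem with 2 features (illustrative data only).
X, y = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)

lin = LinearSVC().fit(X, y)
svc_ovr = SVC(kernel='linear', decision_function_shape='ovr').fit(X, y)
svc_ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
wrapped = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)

# LinearSVC: one weight vector per class (N = 4).
print(lin.coef_.shape)                      # (4, 2)

# SVC: one weight vector per pair of classes (N * (N - 1) / 2 = 6),
# regardless of decision_function_shape.
print(svc_ovr.coef_.shape)                  # (6, 2)
print(svc_ovo.coef_.shape)                  # (6, 2)

# decision_function_shape only changes how the scores are reported.
print(svc_ovr.decision_function(X).shape)   # (200, 4)
print(svc_ovo.decision_function(X).shape)   # (200, 6)

# Wrapping SVC in OneVsRestClassifier trains N binary SVCs instead.
print(len(wrapped.estimators_))             # 4
```

So decision_function_shape='ovr' on SVC only reshapes the output, while OneVsRestClassifier really does train one binary model per class.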
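In fact, as was stated in the comments section, LinearSVC does not solve exactly the same problem as a linear SVM with an unpenalized intercept: liblinear treats the intercept as the weight of a synthetic constant feature (whose value is set by intercept_scaling) and regularizes it together with the other coefficients.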
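A rough way to explore the loss and intercept points is to fit both estimators on a binary problem with loss='hinge' and compare the learned weights for different intercept_scaling values. The sketch below assumes a synthetic dataset from make_classification; the scaling values and max_iter are arbitrary choices for illustration. One would expect the gap to SVC to shrink as intercept_scaling grows, since that weakens the penalty on the intercept, but this is not guaranteed and depends on solver convergence.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC

# Synthetic binary problem (illustrative data only).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

# libsvm: hinge loss, intercept not penalized.
svc = SVC(kernel='linear', C=1.0).fit(X, y)

gaps = {}
for scaling in (1.0, 100.0):  # arbitrary values chosen for the comparison
    # liblinear: hinge loss requested explicitly (requires dual=True);
    # the intercept is regularized via the synthetic constant feature.
    lin = LinearSVC(C=1.0, loss='hinge', dual=True,
                    intercept_scaling=scaling, max_iter=100000).fit(X, y)
    # Largest absolute difference between the two weight vectors.
    gaps[scaling] = float(np.abs(lin.coef_ - svc.coef_).max())
    print('intercept_scaling=%s  max |w_liblinear - w_libsvm| = %.4f'
          % (scaling, gaps[scaling]))
```

Even with matching loss and a large intercept_scaling the two solvers use different stopping criteria, so small differences in the coefficients are to be expected.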
Answered By - E.Z