Issue
I have seen this hinge loss chart:
https://math.stackexchange.com/questions/782586/how-do-you-minimize-hinge-loss
And also here:
https://programmathically.com/understanding-hinge-loss-and-the-svm-cost-function/
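For reference, both charts plot the textbook hinge loss max(0, 1 - y * f(x)). A minimal sketch of what I expected to reproduce, computed with numpy directly instead of scikit-learn:

import numpy as np
import matplotlib.pyplot as plt

scores = np.arange(-10, 11, 1)         # raw decision values f(x)
loss_pos = np.maximum(0, 1 - scores)   # textbook hinge loss for y = +1
plt.plot(scores, loss_pos)             # zero for scores >= 1, linear below
plt.show()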
However, when I create the "same" graph using scikit-learn, the result is quite similar but seems to be the "opposite". Code is as follows:
from sklearn.metrics import hinge_loss
import matplotlib.pyplot as plt
import numpy as np
predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
Here are some specific points from the chart above:
hinge_loss([1], [-5]) = 0.0,
hinge_loss([1], [-1]) = 0.0,
hinge_loss([1], [0]) = 1.0,
hinge_loss([1], [1]) = 2.0,
hinge_loss([1], [5]) = 6.0
predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
And here are some specific points from the chart above:
hinge_loss([-1], [-5]) = 0.0,
hinge_loss([-1], [-1]) = 0.0,
hinge_loss([-1], [0]) = 1.0,
hinge_loss([-1], [1]) = 2.0,
hinge_loss([-1], [5]) = 6.0
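In fact, in both cases the returned values coincide with max(0, 1 + p), whatever the sign of y_true. A quick check (my own sketch, not part of the original snippets):

from sklearn.metrics import hinge_loss

# every value listed above behaves as if the true label were negative
for y in (1, -1):
    for p in (-5.0, -1.0, 0.0, 1.0, 5.0):
        assert hinge_loss([y], [p]) == max(0.0, 1.0 + p)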
Can someone explain to me why hinge_loss() in scikit-learn seems to be the opposite of the first two charts? Many thanks in advance.
EDIT: Based on the answer, I can reproduce the same output without flipping the values. This relies on the following observations: hinge_loss([0], [-1]) == 0 and hinge_loss([-2], [-1]) == 0. Based on this, I can pad the call to hinge_loss() with a second (label, prediction) pair without altering the calculated loss; since hinge_loss() returns the mean over the samples, the extra zero-loss sample halves the result, hence the * 2 below.
The following code does not flip the values:
predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y, 0]), np.array([p, -1])) * 2
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y, -2]), np.array([p, -1])) * 2
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
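A quick verification that both padded variants now return the textbook max(0, 1 - p) for the sample of interest (my own sketch):

from sklearn.metrics import hinge_loss
import numpy as np

for p in (-5.0, -1.0, 0.0, 1.0, 5.0):
    assert hinge_loss(np.array([1, 0]), np.array([p, -1.0])) * 2 == max(0.0, 1.0 - p)
    assert hinge_loss(np.array([-1, -2]), np.array([p, -1.0])) * 2 == max(0.0, 1.0 - p)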
The question now is why, in each corresponding case, these "combinations" of values work.
Solution
Looking at the code underlying the hinge_loss implementation, this is what happens in the binary case:
lbin = LabelBinarizer(neg_label=-1)
y_true = lbin.fit_transform(y_true)[:, 0]
try:
    margin = y_true * pred_decision
except TypeError:
    raise TypeError("pred_decision should be an array of floats.")
losses = 1 - margin
Because LabelBinarizer.fit_transform(), when given a single unique label, defaults to returning an array of negative labels:

from sklearn.preprocessing import LabelBinarizer

lbin = LabelBinarizer(neg_label=-1)
lbin.fit_transform([1, 1, 1, 1, 1, 1, 1])
# returns array([[-1], [-1], [-1], [-1], [-1], [-1], [-1]])

the (unique) label's sign gets flipped, which explains the plot you obtain.
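Putting the two pieces together, the flipped chart can be re-derived by hand. A sketch replaying the internal steps quoted above (my own code, not the library's):

import numpy as np
from sklearn.preprocessing import LabelBinarizer

y_true = [1]                               # a single, positive label
pred_decision = np.array([5.0])
lbin = LabelBinarizer(neg_label=-1)
y_bin = lbin.fit_transform(y_true)[:, 0]   # array([-1]): the lone label turns negative
margin = y_bin * pred_decision             # array([-5.]): the margin's sign is flipped
losses = np.clip(1 - margin, 0, None)      # clipped at zero, as in the implementation
print(losses.mean())                       # 6.0, matching hinge_loss([1], [5])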
Although the single-label example is admittedly quite contrived, there has of course been some debate about this issue; see e.g. https://github.com/scikit-learn/scikit-learn/issues/6723. Digging into the GitHub issues, it seems that no final decision has yet been reached on potential fixes.
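This also answers the question in the edit: padding with a second, strictly smaller label gives LabelBinarizer two distinct classes, so the label of interest gets mapped to +1 and the padding label to neg_label=-1. The padding sample (binarized label -1, decision value -1) then has margin 1 and contributes zero loss, and since hinge_loss() returns the mean over the samples, multiplying by 2 undoes the averaging. A quick check of the binarization (my own sketch):

from sklearn.preprocessing import LabelBinarizer

lbin = LabelBinarizer(neg_label=-1)
print(lbin.fit_transform([1, 0]))    # [[ 1], [-1]]: 1 keeps its sign, no flip
print(lbin.fit_transform([-1, -2]))  # [[ 1], [-1]]: -1 becomes the positive class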
Answered By - amiola