Issue
I'm trying to use an XGBClassifier with a validation set and a metric taken from sklearn.metrics as eval_metric, as suggested by the XGBoost documentation. The MWE looks like this:
import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

x_train, y_train = np.random.rand(10, 3), np.where(np.random.rand(10,) > 0.5, 1, 0)
x_valid, y_valid = np.random.rand(5, 3), np.where(np.random.rand(5,) > 0.5, 1, 0)

model = XGBClassifier(
    n_estimators=100,
    eval_metric=accuracy_score
)
model.fit(
    X=x_train, y=y_train,
    eval_set=[(x_train, y_train), (x_valid, y_valid)]
)
This code raises the following error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-b63cd5cfabda> in <cell line: 1>()
----> 1 model.fit(
2 X=x_train, y=y_train,
3 eval_set=[(x_train, y_train), (x_valid, y_valid)]
4 )
9 frames
/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
93
94 if len(y_type) > 1:
---> 95 raise ValueError(
96 "Classification metrics can't handle a mix of {0} and {1} targets".format(
97 type_true, type_pred
ValueError: Classification metrics can't handle a mix of binary and continuous targets
The same code works if the eval_set line is commented out, or if eval_metric="error" is used instead, for example. What am I doing wrong, and how can it be solved?
Edit: In the future I'd also like to use other metrics, such as sklearn.metrics.balanced_accuracy_score or sklearn.metrics.recall_score.
Solution
The reason is that XGBoost feeds raw probability outputs to the evaluation function (your accuracy_score here), but sklearn's accuracy_score expects hard class labels (1s or 0s), not probabilities. The metric is unaware of your decision threshold, so it cannot map the probabilities to hard decisions itself.
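The mismatch can be reproduced with sklearn alone; this small sketch (values are illustrative) shows accuracy_score rejecting continuous scores and accepting them once thresholded:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])  # continuous scores, like what XGBoost passes

# accuracy_score(y_true, y_prob) raises the ValueError from the traceback above.
# Thresholding first converts the scores to hard labels, which the metric accepts:
y_hard = (y_prob > 0.5).astype(int)
print(accuracy_score(y_true, y_hard))  # 1.0 here, since all four are on the right side of 0.5
```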
You can use

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric='error'
)

or

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric='error@0.6'
)

for a threshold of 0.6 instead of 0.5. See https://xgboost.readthedocs.io/en/stable/parameter.html
For recall, since it's not among the XGBoost built-in options, you need to threshold your predictions manually:
import numpy as np
import xgboost as xgb
from sklearn.metrics import recall_score

x_train, y_train = np.random.rand(10, 3), np.where(np.random.rand(10,) > 0.5, 1, 0)
x_valid, y_valid = np.random.rand(5, 3), np.where(np.random.rand(5,) > 0.5, 1, 0)

def thresholded_recall_score(y_true, y_preds, thresh=0.5):
    # convert probabilities to hard labels before calling the sklearn metric
    return recall_score(y_true, y_preds > thresh)

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric=thresholded_recall_score
)
model.fit(
    X=x_train, y=y_train,
    eval_set=[(x_train, y_train), (x_valid, y_valid)]
)
Answered By - Learning is a mess