Issue
I'm trying to use an XGBClassifier with a validation set and a metric taken from sklearn.metrics as eval_metric, as suggested by the XGBoost documentation. The MWE looks like this:
import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

x_train, y_train = np.random.rand(10, 3), np.where(np.random.rand(10,) > 0.5, 1, 0)
x_valid, y_valid = np.random.rand(5, 3), np.where(np.random.rand(5,) > 0.5, 1, 0)

model = XGBClassifier(
    n_estimators=100,
    eval_metric=accuracy_score
)
model.fit(
    X=x_train, y=y_train,
    eval_set=[(x_train, y_train), (x_valid, y_valid)]
)
This code raises the following error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-b63cd5cfabda> in <cell line: 1>()
----> 1 model.fit(
2 X=x_train, y=y_train,
3 eval_set=[(x_train, y_train), (x_valid, y_valid)]
4 )
9 frames
/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
93
94 if len(y_type) > 1:
---> 95 raise ValueError(
96 "Classification metrics can't handle a mix of {0} and {1} targets".format(
97 type_true, type_pred
ValueError: Classification metrics can't handle a mix of binary and continuous targets
The same code works if the eval_set line is commented out, or if eval_metric="error" is used instead, for example. What am I doing wrong, and how can it be solved?
Edit: In the future I'd also like to use other metrics, such as sklearn.metrics.balanced_accuracy_score or sklearn.metrics.recall_score.
Solution
The reason is that XGBoost feeds raw probability outputs to the evaluation function (your accuracy_score here), but sklearn's accuracy_score expects hard class labels (1s or 0s), not probabilities. The metric is unaware of your decision threshold, so it cannot map the probabilities to hard decisions itself.
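The mismatch can be reproduced with sklearn alone; this small sketch (values are illustrative) shows accuracy_score rejecting continuous scores and accepting them once thresholded:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])  # continuous scores, like what XGBoost passes

# accuracy_score(y_true, y_prob) raises the ValueError from the traceback above.
# Thresholding first converts the scores to hard labels, which the metric accepts:
y_hard = (y_prob > 0.5).astype(int)
print(accuracy_score(y_true, y_hard))  # 1.0 here, since all four are on the right side of 0.5
```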
You can use

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric='error'
)

or

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric='error@0.6'
)

for a threshold of 0.6 instead of 0.5. See https://xgboost.readthedocs.io/en/stable/parameter.html
For recall, since it's not among the XGBoost built-in options, you need to threshold your predictions manually:
import numpy as np
import xgboost as xgb
from sklearn.metrics import recall_score

x_train, y_train = np.random.rand(10, 3), np.where(np.random.rand(10,) > 0.5, 1, 0)
x_valid, y_valid = np.random.rand(5, 3), np.where(np.random.rand(5,) > 0.5, 1, 0)

def thresholded_recall_score(y_true, y_preds, thresh=0.5):
    # convert probabilities to hard labels before calling the sklearn metric
    return recall_score(y_true, y_preds > thresh)

model = xgb.XGBClassifier(
    n_estimators=100,
    eval_metric=thresholded_recall_score
)
model.fit(
    X=x_train, y=y_train,
    eval_set=[(x_train, y_train), (x_valid, y_valid)]
)
Answered By - Learning is a mess