Issue
I am using xgboost for the first time and trying the two different interfaces. First I get the data:
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]
dmatrix = xgb.DMatrix(data=X, label=y)
Now the scikit-learn interface:
xgbr = xgb.XGBRegressor(objective='reg:squarederror', seed=20)
print(cross_val_score(xgbr, X, y, cv=5))
This outputs:
[0.73438184 0.84902986 0.82579692 0.52374618 0.29743001]
Now the xgboost native interface:
dmatrix = xgb.DMatrix(data=X, label=y)
params={'objective':'reg:squarederror'}
cv_results = xgb.cv(dtrain=dmatrix, params=params, nfold=5, metrics={'rmse'}, seed=20)
print('RMSE: %.2f' % cv_results['test-rmse-mean'].min())
This gives 3.50.
Why are the outputs so different? What am I doing wrong?
Solution
First of all, you didn't specify the metric in cross_val_score, so you are not computing RMSE at all, but the estimator's default score, which for a regressor is R² (that is what the numbers between 0.3 and 0.85 above are). You need to specify the scoring for comparable results:
cross_val_score(xgbr, X, y, cv=5, scoring='neg_root_mean_squared_error')
Second, you need to match sklearn's CV procedure exactly. For that, you can pass the folds argument to XGBoost's cv method:
from sklearn.model_selection import KFold
cv_results = xgb.cv(dtrain=dmatrix, params=params, metrics={'rmse'}, folds=KFold(n_splits=5))
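Why this works: a KFold without shuffle=True always produces the same deterministic, contiguous splits, so both libraries see exactly the same train/test partitions. A quick sketch on ten dummy samples:

```python
import numpy as np
from sklearn.model_selection import KFold

# KFold without shuffling yields deterministic, contiguous folds.
kf = KFold(n_splits=5)
train_idx, test_idx = next(iter(kf.split(np.arange(10))))
print(test_idx)   # → [0 1], the first two samples
print(train_idx)  # → [2 3 4 5 6 7 8 9], the remaining eight
```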
Finally, you need to ensure that XGBoost's cv procedure actually converges. It only runs 10 boosting rounds by default, which is too few to converge on this dataset. The number of rounds is set via the num_boost_round argument (nrounds is the R interface's name for it); I found that 100 rounds work just fine on this dataset, and 100 also happens to be XGBRegressor's default n_estimators, so the two sides train the same number of trees:
cv_results = xgb.cv(dtrain=dmatrix, params=params, metrics={'rmse'}, folds=KFold(n_splits=5), num_boost_round=100)
Now you will get matching results.
On a side note, it's interesting how you say it's your first time using XGBoost, but you actually have a question on XGBoost dating back to 2017.
Answered By - Always Right Never Left