Issue
I am trying to evaluate my model and have set the scoring to neg_root_mean_squared_error.
The results are negative, as expected (see below). Yet I am used to a positive value for the RMSE (the lower, the better), so is it correct to say the RMSE of the model is +0.0725, or am I missing something?
from sklearn.model_selection import KFold, cross_val_score
import numpy as np

# new_model, X_normalized and y_for_normalized are defined earlier
crossvalidation_Decision_Trees = KFold(n_splits=4, random_state=0, shuffle=True)
model2 = new_model.fit(X_normalized, y_for_normalized)
scores_D_Trees = cross_val_score(model2, X_normalized, y_for_normalized,
                                 scoring='neg_root_mean_squared_error',
                                 cv=crossvalidation_Decision_Trees, n_jobs=1)
print("\n\nDecision Trees: RMSE for every fold: " + str(scores_D_Trees))
print('\033[1m' + "Decision Trees" + '\033[0m' + ": Average RMSE for all the folds: "
      + str(np.mean(scores_D_Trees)) + ", STD: " + str(np.std(scores_D_Trees)))
Results:
Decision Trees: RMSE for every fold: [-0.0413202 -0.08435709 -0.08474064 -0.07967769]
Decision Trees: Average RMSE for all the folds: -0.07252390274931717, STD: 0.01812540303759248
Solution
For sklearn model-selection routines, the greater the score, the better. Error metrics such as MSE and RMSE work the other way around (lower is better), so sklearn uses them with a negative sign.
Standard scorers are built with make_scorer() (see sklearn/metrics/_scorer.py):
neg_root_mean_squared_error_scorer = make_scorer(
    mean_squared_error, greater_is_better=False, squared=False
)
...
sign = 1 if greater_is_better else -1
...
return self._sign * self._score_func(...)
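
To see the sign flip in isolation, here is a minimal sketch. The toy data and DummyRegressor are purely illustrative, and it assumes a sklearn version that still accepts squared=False, as in the quoted source (newer versions provide root_mean_squared_error instead):

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_squared_error

X = np.arange(10).reshape(-1, 1)   # toy data, purely illustrative
y = np.arange(10, dtype=float)
model = DummyRegressor(strategy="mean").fit(X, y)

# Built the same way as the built-in neg_root_mean_squared_error scorer
neg_rmse = make_scorer(mean_squared_error, greater_is_better=False, squared=False)

print(neg_rmse(model, X, y))                                   # negated RMSE, e.g. -2.87...
print(mean_squared_error(y, model.predict(X), squared=False))  # same magnitude, positive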
So yes: nothing happens there but multiplying the result of mean_squared_error(squared=False) by -1. You can safely report the RMSE of your model as +0.0725 (the lower, the better).
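
In practice, you just flip the sign of the array returned by cross_val_score to report positive RMSE values:

rmse_scores = -scores_D_Trees   # undo sklearn's sign convention
print("RMSE for every fold:", rmse_scores)
print("Average RMSE:", rmse_scores.mean(), "STD:", rmse_scores.std())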
Answered By - dx2-66