Issue
I have a small doubt about neg_mean_squared_error in scikit-learn. I am using a Ridge regression model with cross-validation:
cross_val_score(estimator, X_train, y_train, cv=5, scoring='neg_mean_squared_error')
I am trying different values of alpha to choose the best model:
alphas= (0.01, 0.05, 0.1, 0.3, 0.8, 1, 5, 10, 15, 30, 50)
For each alpha, I compute the mean of the 5 values returned by cross_val_score and plot these means against alpha (mean score on the y axis, alpha on the x axis).
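For reference, this is roughly what I am running (a minimal sketch; in my real code X_train and y_train come from my own dataset, so the generated data below is only a stand-in):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Stand-in data; in my case X_train/y_train come from my own dataset
X_train, y_train = make_regression(n_samples=200, n_features=10,
                                   noise=10.0, random_state=0)

alphas = (0.01, 0.05, 0.1, 0.3, 0.8, 1, 5, 10, 15, 30, 50)

mean_scores = []
for alpha in alphas:
    estimator = Ridge(alpha=alpha)
    # cross_val_score returns 5 negative MSE values, one per fold
    scores = cross_val_score(estimator, X_train, y_train, cv=5,
                             scoring='neg_mean_squared_error')
    mean_scores.append(scores.mean())
# mean_scores is what I plotted against alphas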
While researching, I found that with neg_mean_squared_error one should look for "the smaller the better". Does that mean I should look for the literally smallest value, which would be the first value in my graph, or the smallest in the sense of "closest to 0"?
In my case all the values are negative, which is why I am unsure about the interpretation.
Thank you very much.
Solution
By convention, scikit-learn assumes that a score follows the rule "higher values are better than lower values". A small MSE means your predictions are close to the data, so MSE follows the opposite rule; that is why scikit-learn uses the negated MSE as the score. Thus a larger neg_mean_squared_error (i.e. one closer to 0) is better than a smaller, more negative one. This is also consistent with your graph, because extreme parameter values generally degrade a model.
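Concretely, you should pick the alpha whose mean score is the largest (closest to 0). A minimal sketch, assuming mean_scores holds the mean cross-validation score for each alpha as in the snippet above:

import numpy as np

# mean_scores[i] is the mean neg_mean_squared_error obtained with alphas[i]
best_index = int(np.argmax(mean_scores))  # largest score = least negative MSE
best_alpha = alphas[best_index]
best_mse = -mean_scores[best_index]       # negate to recover the actual MSE
print(f"best alpha: {best_alpha}, cross-validated MSE: {best_mse:.4f}")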
The scikit-learn documentation states this convention precisely: "All scorer objects follow the convention that higher return values are better than lower return values."
Answered By - Valentin Goldité