Issue
Usually we split the original feature and target data (X, y) into (X_train, y_train) and (X_test, y_test).
By using the method:
mae_A = cross_val_score(clf, X_train_scaled, y_train, scoring="neg_mean_absolute_error", cv=kfold)
I get the cross validation Mean Absolute Error (MAE) for the (X_train, y_train), right?
How can I get the MAE on (X_test, y_test) from the cross-validation models fitted on (X_train, y_train)?
Solution
This is the correct approach. As a rule, you should only train your model using the training data. The test set should remain unseen throughout the cross-validation process, i.e. during hyperparameter tuning; otherwise you would bias the model's results by leaking knowledge from the test sample.
I get the cross validation Mean Absolute Error (MAE) for the (X_train, y_train), right?
Yes, the error reported by cross_val_score is computed only on the training data.
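To make the cross-validation step concrete, here is a minimal runnable sketch with synthetic stand-in data (the data, the LinearRegression estimator, and the 5-fold setup are assumptions, not from the question). Note that "neg_mean_absolute_error" returns the *negated* MAE so that higher is better; flip the sign to recover the per-fold error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical toy data standing in for (X_train_scaled, y_train)
rng = np.random.RandomState(0)
X_train_scaled = rng.rand(100, 3)
y_train = X_train_scaled @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 100)

clf = LinearRegression()
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# cross_val_score returns one (negated) score per fold, all computed
# on held-out slices of the *training* data only.
scores = cross_val_score(clf, X_train_scaled, y_train,
                         scoring="neg_mean_absolute_error", cv=kfold)
fold_mae = -scores
print(fold_mae.mean())
```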
So the idea is that once you are satisfied with the results of cross_val_score, you fit the final model on the whole training set and predict on X_test. For that you can use sklearn.metrics. For instance, to obtain the MAE:
from sklearn.metrics import mean_absolute_error as mae

y_pred = clf.predict(X_test)  # predictions from the model refitted on all training data
MAE = mae(y_test, y_pred)
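Putting both steps together, a minimal end-to-end sketch (the synthetic data and the LinearRegression estimator are assumptions; substitute your own clf and splits):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the question's (X, y)
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the final model on the whole training set...
clf = LinearRegression()
clf.fit(X_train, y_train)

# ...then evaluate exactly once on the held-out test set.
y_pred = clf.predict(X_test)
MAE = mae(y_test, y_pred)
print(MAE)
```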
Answered By - yatu