Issue
This is my minimal reproducible example:
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_validate
x = np.array([
[1, 2],
[3, 4],
[5, 6],
[6, 7]
])
y = [1, 0, 0, 1]
model = GaussianNB()
scores = cross_validate(model, x, y, cv=2, scoring="accuracy")
model.predict([[8, 9]])  # raises NotFittedError
What I intended to do is instantiate a Gaussian Naive Bayes classifier and use sklearn.model_selection.cross_validate to cross-validate my model (I am using cross_validate instead of cross_val_score because in my real project I need precision, recall and f1 as well).
I have read in the docs that cross_validate does "evaluate metric(s) by cross-validation and also record fit/score times."
I expected my model to have been fitted on the x (features) and y (labels) data, but when I invoke model.predict(.) I get:
sklearn.exceptions.NotFittedError: This GaussianNB instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
Of course, it tells me to invoke model.fit(x, y) before "using the estimator" (that is, before invoking model.predict(.)).
Shouldn't the model have been fitted cv=2 times when I invoke cross_validate(...)?
Solution
A close look at the cross_validate documentation reveals that it includes an argument:
return_estimator : bool, default=False
Whether to return the estimators fitted on each split.
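So, by default, it does not return any fitted estimator (hence nothing it returns can be used to predict). Internally, cross_validate fits clones of the estimator on each split, so the model object you passed in is itself left unfitted.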
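The default behaviour described above can be checked directly. The following is a minimal sketch, not part of the original answer; it reuses the toy data from the question and assumes scikit-learn's check_is_fitted helper to test the state of the estimator. The variable names has_estimators and model_is_fitted are illustrative only.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.utils.validation import check_is_fitted

x = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = [1, 0, 0, 1]

model = GaussianNB()
scores = cross_validate(model, x, y, cv=2, scoring="accuracy")

# With the default return_estimator=False, the result has no 'estimator' key.
has_estimators = "estimator" in scores

# The estimator that was passed in is still unfitted after cross-validation.
try:
    check_is_fitted(model)
    model_is_fitted = True
except NotFittedError:
    model_is_fitted = False

print(has_estimators, model_is_fitted)
```

Both flags come out False here, which matches the NotFittedError seen in the question.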
To predict with the fitted estimator(s), you need to set the argument to True; but beware, you will not get a single fitted model, but a number of models equal to your cv parameter value (here 2):
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_validate
x = np.array([
[1, 2],
[3, 4],
[5, 6],
[6, 7]
])
y = [1, 0, 0, 1]
model = GaussianNB()
scores = cross_validate(model, x, y, cv=2, scoring="accuracy", return_estimator=True)
scores
# result:
{'fit_time': array([0.00124454, 0.00095725]),
'score_time': array([0.00090432, 0.00054836]),
'estimator': [GaussianNB(), GaussianNB()],
'test_score': array([0.5, 0.5])}
So, in order to get predictions from each fitted model, you need:
scores['estimator'][0].predict([[8,9]])
# array([1])
scores['estimator'][1].predict([[8,9]])
# array([0])
This may look inconvenient, but it is by design: cross_validate is generally meant only to return the scores needed for diagnosis and assessment, not to fit models that will then be used for predictions.
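If a single model is needed for predictions, one common pattern is to use cross_validate purely for assessment and then fit the estimator separately on all the data. The sketch below is an illustration under that assumption and is not taken from the original answer. The two metric names passed to scoring are just an example; others such as "precision" and "f1" can be listed the same way, although on a dataset this small some metrics may be undefined for a fold.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

x = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = [1, 0, 0, 1]

model = GaussianNB()

# Step 1: assessment only. With several metrics, the result keys are
# named 'test_<metric>' (for example 'test_accuracy').
scores = cross_validate(model, x, y, cv=2, scoring=["accuracy", "recall"])
mean_accuracy = scores["test_accuracy"].mean()

# Step 2: fit the model on all available data, then predict with it.
model.fit(x, y)
prediction = model.predict([[8, 9]])

print(mean_accuracy, prediction)
```

Here the cross-validation scores estimate how well this kind of model generalises, while the explicitly fitted model is the one used for prediction.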
Answered By - desertnaut