Issue
I am working through different machine learning projects on Kaggle to improve my skills. Here is the model I am using:
from sklearn.ensemble import RandomForestClassifier
y = train_data["Survived"]
features = ["Pclass", "Sex", "SibSp", "Parch"]
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
model = RandomForestClassifier(n_estimators = 100, max_depth = 5, random_state = 1)
model.fit = (X, y)
predictions = model.predict(X_test)
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('submission.csv', index = False)
print('Your submission was successfully saved!')
Here is the error I get:
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
/tmp/ipykernel_33/1528591149.py in <module>
9 forest_clf = RandomForestClassifier(n_estimators = 100, max_depth = 5, random_state = 1)
10 forest_clf.fit = (X, y)
---> 11 predictions = forest_clf.predict(X_test)
12
13 output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in predict(self, X)
806 The predicted classes.
807 """
--> 808 proba = self.predict_proba(X)
809
810 if self.n_outputs_ == 1:
/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in predict_proba(self, X)
846 classes corresponds to that in the attribute :term:`classes_`.
847 """
--> 848 check_is_fitted(self)
849 # Check data
850 X = self._validate_X_predict(X)
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
1220
1221 if not fitted:
-> 1222 raise NotFittedError(msg % {"name": type(estimator).__name__})
1223
1224
NotFittedError: This RandomForestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
I think this might be a case of the estimator cloning itself, but I am not sure which line is causing the problem. This is the Titanic project on Kaggle; I copied its tutorial code while trying to learn. Any help is appreciated.
Solution
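As @Blackgaurd pointed out, just change model.fit = (X, y) to model.fit(X, y).
Your current line does not call fit. It assigns the tuple (X, y) to an attribute named fit on your RandomForestClassifier instance, which overwrites (shadows) the fit method. The model is therefore never trained, and predict raises NotFittedError. It has nothing to do with the estimator cloning itself.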
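To see why the assignment breaks things, here is a minimal pure-Python sketch. TinyModel is a hypothetical stand-in class made up for illustration (it is not scikit-learn code); it stores its fitted state in a trailing-underscore attribute, following the naming convention scikit-learn uses for fitted attributes. Assigning to model.fit creates an instance attribute that hides the class's fit method, so nothing is ever learned.

```python
class TinyModel:
    """Hypothetical stand-in for an estimator (not scikit-learn code)."""

    def fit(self, X, y):
        # "Learn" the most common label and store it in a trailing-underscore
        # attribute, mimicking how scikit-learn stores fitted state.
        labels = list(y)
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        if not hasattr(self, "majority_"):
            raise RuntimeError("TinyModel is not fitted yet")
        return [self.majority_ for _ in X]


X = [[1], [2], [3]]
y = [0, 1, 1]

buggy = TinyModel()
buggy.fit = (X, y)        # assignment: creates an instance attribute named "fit"
print(type(buggy.fit))    # <class 'tuple'>; the method is now hidden
try:
    buggy.predict(X)
except RuntimeError as exc:
    print("error:", exc)  # error: TinyModel is not fitted yet

fixed = TinyModel()
fixed.fit(X, y)           # call: runs the method and stores majority_
print(fixed.predict(X))   # [1, 1, 1]
```

If the bad assignment has already run in a notebook session, del model.fit removes the instance attribute and makes the real method visible again; simply rerunning the cell that creates the model with the corrected line works just as well.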
Here is your full code with the correction:
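import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# train_data and test_data are the Titanic DataFrames loaded earlier in the notebook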
y = train_data["Survived"]
features = ["Pclass", "Sex", "SibSp", "Parch"]
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
model = RandomForestClassifier(n_estimators = 100, max_depth = 5, random_state = 1)
model.fit(X, y) # <- line of code fixed
predictions = model.predict(X_test)
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('submission.csv', index = False)
print('Your submission was successfully saved!')
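If you want to confirm the fitted state explicitly before predicting, one option is scikit-learn's check_is_fitted, the same function that raised the error in your traceback. This sketch uses a tiny made-up DataFrame in place of the Kaggle train_data, and is_fitted is a hypothetical helper name, not a scikit-learn function.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# Made-up stand-in for train_data; the values are illustrative only.
train = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 0, 0, 1],
})
X = pd.get_dummies(train[["Pclass", "Sex"]])
y = train["Survived"]

# The mistake from the question: this assigns a tuple, it does not train.
buggy = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=1)
buggy.fit = (X, y)
print(is_fitted(buggy))  # False

# The fix: calling fit sets fitted attributes such as estimators_.
model = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=1)
model.fit(X, y)
print(is_fitted(model))  # True
print(model.predict(X))
```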
Answered By - petezurich