Issue
I am solving a problem from Stepik:
One tree is good, but what guarantees that it is the best, or even close to it? One way to find a reasonably good set of tree parameters is to try many trees with different parameters and pick the best one. The GridSearchCV class does exactly this: for every combination of the parameter values specified for the model, it trains the model on the data and evaluates it with cross-validation. The model with the best parameters is then stored in the .best_estimator_ attribute. The task is to search over all trees on the iris data with the following parameters:
- maximum depth: from 1 to 10 levels
- minimum number of samples required to split a node: from 2 to 10
- minimum number of samples per leaf: from 1 to 10
Store the best tree in the variable best_tree, and name the GridSearchCV variable search. Here is my solution:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
parameters = {'max_depth': range(1, 10), 'min_samples_split': range(2, 10), 'min_samples_leaf': range(1, 10)}
search = GridSearchCV(iris, parameters)
search.fit(X, y)
best_tree = search.estimator
Why am I getting this error?
Traceback (most recent call last):
File "jailed_code", line 22, in <module>
search.fit(X, y)
File "/home/stepic/instances/master-plugins/sandbox/python3/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 595, in fit
self.estimator, scoring=self.scoring)
File "/home/stepic/instances/master-plugins/sandbox/python3/lib/python3.6/site-packages/sklearn/metrics/scorer.py", line 342, in _check_multimetric_scoring
scorers = {"score": check_scoring(estimator, scoring=scoring)}
File "/home/stepic/instances/master-plugins/sandbox/python3/lib/python3.6/site-packages/sklearn/metrics/scorer.py", line 274, in check_scoring
"'fit' method, %r was passed" % estimator)
TypeError: estimator should be an estimator implementing 'fit' method, {'data': array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
...
Solution
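You passed the dataset (iris) to GridSearchCV instead of an estimator. The first argument, estimator, must be a model object that implements a fit method, such as DecisionTreeClassifier(); the data itself goes to search.fit(X, y) later. If you haven't already, take a look at the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html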
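A quick way to see the difference between the two objects, as a minimal sketch. The exact exception class raised for a non-estimator differs between scikit-learn versions (older ones raise TypeError from the scoring check, newer ones validate parameters up front), so the example catches both TypeError and ValueError rather than assuming one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()               # a Bunch: a dict-like container of arrays, not a model
tree = DecisionTreeClassifier()  # an estimator: it implements fit(X, y)

print(hasattr(iris, "fit"))  # False
print(hasattr(tree, "fit"))  # True

# Passing the dataset as the estimator fails once fit is called.
# The exception class depends on the scikit-learn version, so both are caught.
raised = False
try:
    GridSearchCV(iris, {"max_depth": [1, 2]}).fit(iris.data, iris.target)
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__)
```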
This should work:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
# range() excludes its upper bound, so range(1, 11) covers 1..10 as the task requires
parameters = {'max_depth': range(1, 11), 'min_samples_split': range(2, 11), 'min_samples_leaf': range(1, 11)}
search = GridSearchCV(estimator=DecisionTreeClassifier(),
param_grid=parameters)
search.fit(X, y)
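search.cv_results_
# The task asks for the best tree: the refitted winner is in best_estimator_
# (search.estimator is only the unfitted template that was passed in)
best_tree = search.best_estimator_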
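search.cv_results_ is a dict of NumPy arrays with one entry per parameter combination, which is hard to read when printed directly. Since pandas is already imported in the question, one option is to load it into a DataFrame and sort by rank. A minimal sketch, assuming the grid from the task (1 to 10 inclusive, hence the upper bound of 11); random_state=0 is an added assumption, used only so that reruns give identical scores:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# range() excludes its upper bound, so these cover 1..10, 2..10 and 1..10
parameters = {
    "max_depth": range(1, 11),
    "min_samples_split": range(2, 11),
    "min_samples_leaf": range(1, 11),
}

# random_state=0 is an assumption, added only to make reruns reproducible
search = GridSearchCV(DecisionTreeClassifier(random_state=0), parameters, cv=5)
search.fit(X, y)
best_tree = search.best_estimator_  # refitted on all of X, y (refit=True by default)

# One row per parameter combination: 10 * 9 * 10 = 900 rows
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results.sort_values("rank_test_score")[columns].head())
print(search.best_params_, search.best_score_)
```

Many combinations tie on iris, so several rows usually share rank 1; best_estimator_ is built from the first of them in grid order.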
Answered By - Braden Anderson
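For intuition about what GridSearchCV does with the grid, here is a rough hand-written equivalent: loop over every combination produced by ParameterGrid, score each tree with 5-fold cross-validation, keep the best, then refit it on all the data. This is a sketch of the idea, not GridSearchCV's actual implementation; it uses a deliberately small, hypothetical grid so it runs quickly, and random_state=0 is again an added assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical small grid, chosen only to keep the example fast
grid = {"max_depth": range(1, 5), "min_samples_leaf": range(1, 4)}

best_score, best_params = -1.0, None
for params in ParameterGrid(grid):  # every combination, in a fixed order
    model = DecisionTreeClassifier(random_state=0, **params)
    score = cross_val_score(model, X, y, cv=5).mean()  # mean 5-fold CV accuracy
    if score > best_score:  # strict ">" keeps the first best on ties
        best_score, best_params = score, params

# Refit the winner on all the data, as GridSearchCV does when refit=True
best_tree = DecisionTreeClassifier(random_state=0, **best_params).fit(X, y)

# The built-in search over the same grid, for comparison
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5).fit(X, y)
print(best_params, best_score)
print(search.best_params_, search.best_score_)
```

Both should report the same best cross-validation score, since they use the same folds and the same trees; GridSearchCV adds parallelism (n_jobs), multi-metric scoring and the cv_results_ bookkeeping on top.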