Tuesday, July 12, 2022

[FIXED] making pipeline for machine learning models

Issue

from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Dictionary of all classifiers with their parameter grids
model_params = {
    'svm': {
        'model': svm.SVC(gamma='auto'),
        'params': {
            'svc__C': [1, 10, 100, 1000],
            'svc__kernel': ['rbf', 'linear']
        }
    },
    'logistic_regression': {
        'model': LogisticRegression(solver='liblinear', multi_class='auto'),
        'params': {
            'logisticregression__C': [1, 5, 10]
        }
    },
    'random_forest1': {
        'model': RandomForestClassifier(),
        'params': {
            'randomforestclassifier__n_estimators': [1, 5, 10]
        }
    },
    'decision_tree': {
        'model': DecisionTreeClassifier(),
        'params': {
            'decisionTreeClassifier__criterion': ["gini", "entropy", "log_loss"]
        }
    }
}
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

scores = []
best_estimators = {}
import pandas as pd
for algo, mp in model_params.items():
    # Pipeline: scale the data, then apply the classifier fetched from the dictionary
    pipe = make_pipeline(StandardScaler(), mp['model'])

    # Grid search with 5-fold cross-validation over the pipeline
    clf = GridSearchCV(pipe, mp['params'], cv=5, return_train_score=False)
   
    clf.fit(features,target)
    scores.append({
        'model': algo,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })
    best_estimators[algo] = clf.best_estimator_
    
df = pd.DataFrame(scores,columns=['model','best_score','best_params'])

Error:

Invalid parameter '' for estimator Pipeline(steps=[('standardscaler', StandardScaler()),
                ('decision_tree', DecisionTreeClassifier() ]). Valid parameters are: ['memory', 'steps', 'verbose'].

The code works fine for the SVM, logistic regression, and random forest classifiers, but throws a parameter error for the decision tree classifier. I can't figure out whether it is a syntax issue or something else.

Solution

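The key should be decisiontreeclassifier__criterion. make_pipeline() names each step after the lowercased class name of its estimator, so the prefix before the double underscore in the parameter grid must be all lowercase; decisionTreeClassifier does not match any step (https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html)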
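
A quick way to check this is to inspect the step names that make_pipeline() generates and then run the grid search with the lowercase key. The sketch below is only an illustration: it assumes the iris dataset from sklearn.datasets as a stand-in for the question's features and target (which are not shown), and it leaves "log_loss" out of the criterion list because older scikit-learn versions may not accept it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))

# Step names are the lowercased class names of the estimators.
step_names = [name for name, _ in pipe.steps]
print(step_names)

# Grid keys must match '<step name>__<parameter>' exactly (case-sensitive).
valid_keys = pipe.get_params()
print('decisiontreeclassifier__criterion' in valid_keys)  # lowercase key is valid
print('decisionTreeClassifier__criterion' in valid_keys)  # mixed-case key is not

# Assumption: iris stands in for the question's own `features` and `target`.
features, target = load_iris(return_X_y=True)

param_grid = {'decisiontreeclassifier__criterion': ['gini', 'entropy']}
clf = GridSearchCV(pipe, param_grid, cv=5, return_train_score=False)
clf.fit(features, target)
print(clf.best_params_)
```

If the auto-generated names are inconvenient, an explicit Pipeline with hand-picked step names is another option; the grid keys then use whatever names were chosen there.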

Answered By - dx2-66

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
