Issue
Is there a way to do hyperparameter tuning with GridSearch without defining each parameter on a classifier/regressor? Something like an automatic hyperparameter tuning command. In the documentation I found ParameterGrid, but I did not fully understand what it is for.
Solution
In scikit-learn, you need to define both:
- which hyperparameters you want to tune
- which values or distributions you want to test for each hyperparameter
This is defined with a dictionary like param_grid = {'C': [1, 10], 'kernel': ['linear', 'rbf']}
where the keys are the hyperparameters to be tuned, and each value is the list of values to be tested for that hyperparameter.
When you give this dictionary to GridSearchCV, it automatically creates a grid of hyperparameters with all possible combinations, using ParameterGrid. For example:
from sklearn.model_selection import ParameterGrid
param_grid = {'C': [1, 10], 'kernel': ['linear', 'rbf']}
list(ParameterGrid(param_grid)) == (
[{'C': 1, 'kernel': 'linear'}, {'C': 1, 'kernel': 'rbf'},
{'C': 10, 'kernel': 'linear'}, {'C': 10, 'kernel': 'rbf'}])
This is the list of combinations of hyperparameters that are tested in the grid search.
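To show how such a grid is consumed, here is a minimal sketch of passing the same param_grid to GridSearchCV. The iris dataset, the SVC estimator, and cv=5 are assumptions chosen for illustration; they are not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy dataset, used here only for illustration (an assumption)
X, y = load_iris(return_X_y=True)

# Same grid as above: 2 values of C x 2 kernels = 4 combinations
param_grid = {'C': [1, 10], 'kernel': ['linear', 'rbf']}

# GridSearchCV cross-validates every combination in the grid
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

# Number of combinations that were actually evaluated
n_candidates = len(search.cv_results_['params'])

print(n_candidates)
print(search.best_params_)
print(search.best_score_)
```

After fitting, search.best_estimator_ holds the model refit on the full data with the best combination found.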
See also this example about how to use GridSearchCV, or the Automatic parameter searches section of the excellent Getting-started guide.
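The answer mentions distributions as well as lists of values. One way to use a distribution, sketched here under assumptions, is RandomizedSearchCV, which samples a fixed number of candidates instead of enumerating a full grid. The dataset, the range of the log-uniform distribution for C, and n_iter=8 are illustrative choices only.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Toy dataset, used here only for illustration (an assumption)
X, y = load_iris(return_X_y=True)

# C is drawn from a log-uniform distribution; kernel is drawn from a list
param_distributions = {
    'C': loguniform(1e-2, 1e2),
    'kernel': ['linear', 'rbf'],
}

# n_iter sets how many random combinations are sampled and evaluated
search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=8, cv=5, random_state=0
)
search.fit(X, y)

# The sampled candidates, one dict per combination
sampled = search.cv_results_['params']
print(search.best_params_)
```

You still choose which hyperparameters to search and over what ranges; only the individual values are picked for you.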
If you don't want to define yourself which hyperparameters to tune, or which values to test, you need an external definition of reasonable hyperparameters to tune that would work for any dataset. For example, you can take a look at "auto-ML" packages:
- auto-sklearn: An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
- autoviml: Automatically builds multiple machine learning models with a single line of code. Designed as a faster way to use scikit-learn models without having to preprocess data.
- TPOT: An automated machine learning toolkit that optimizes a series of scikit-learn operators to design a machine learning pipeline, including data and feature preprocessors as well as the estimators. Works as a drop-in replacement for a scikit-learn estimator.
- Featuretools: A framework to perform automated feature engineering. It can be used for transforming temporal and relational datasets into feature matrices for machine learning.
- Neuraxle: A library for building neat pipelines, providing the right abstractions to ease research, development, and deployment of machine learning applications. Compatible with deep learning frameworks and the scikit-learn API, it can stream minibatches, use data checkpoints, build funky pipelines, and serialize models with custom per-step savers.
- EvalML: An AutoML library which builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions. It incorporates multiple modeling libraries under one API, and the objects that EvalML creates use an sklearn-compatible API.
Answered By - TomDLT