Issue
I'm trying to train a LightGBM model on a dataset consisting of numerical, categorical and textual data. However, fitting the pipeline below raises an error during training:
params = {
'num_class':5,
'max_depth':8,
'num_leaves':200,
'learning_rate': 0.05,
'n_estimators':500
}
clf = LGBMClassifier(params)
data_processor = ColumnTransformer([
('numerical_processing', numerical_processor, numerical_features),
('categorical_processing', categorical_processor, categorical_features),
('text_processing_0', text_processor_1, text_features[0]),
('text_processing_1', text_processor_1, text_features[1])
])
pipeline = Pipeline([
('data_processing', data_processor),
('lgbm', clf)
])
pipeline.fit(X_train, y_train)
and the error is:
TypeError: Unknown type of parameter:boosting_type, got:dict
I have two textual features, both some form of name, on which I'm mainly performing stemming.
Any pointers would be highly appreciated.
Solution
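The classifier is constructed incorrectly: the params dict is passed as a single positional argument, so it is bound to the first parameter, boosting_type, which is what the error reports. You can reproduce this without the pipeline: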
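import numpy as np
from lightgbm import LGBMClassifier
params = {
'num_class':5,
'max_depth':8,
'num_leaves':200,
'learning_rate': 0.05,
'n_estimators':500
}
# Passing the dict positionally is the mistake being reproduced here
clf = LGBMClassifier(params)
clf.fit(np.random.uniform(0,1,(50,2)),np.random.randint(0,5,50))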
This gives the same error:
TypeError: Unknown type of parameter:boosting_type, got:dict
Instead, unpack the dict into keyword arguments when creating the classifier:
clf = LGBMClassifier(**params)
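To see why the positional form fails, here is a minimal sketch using a hypothetical stand-in class, ToyClassifier, which is not part of LightGBM. It assumes only what the error message suggests: that boosting_type is the first parameter of the constructor, at least in the version that raised this error.

```python
# ToyClassifier is a hypothetical stand-in, not LightGBM's real class.
# Assumption: boosting_type is the first constructor parameter.
class ToyClassifier:
    def __init__(self, boosting_type="gbdt", num_leaves=31, max_depth=-1, **kwargs):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves
        self.max_depth = max_depth
        self.extra = kwargs  # any other keyword arguments, e.g. num_class


params = {"num_class": 5, "max_depth": 8, "num_leaves": 200}

# Positional: the whole dict is bound to the first parameter.
wrong = ToyClassifier(params)

# Unpacked: each key is matched to the parameter of the same name.
right = ToyClassifier(**params)

print(type(wrong.boosting_type).__name__)
print(right.boosting_type, right.num_leaves, right.max_depth, right.extra)
```

In the positional case nothing fails at construction time; the problem only surfaces later, when the value of boosting_type is checked during fit.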
With that change, a small example pipeline runs without the error:
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
numerical_processor = StandardScaler()
categorical_processor = OneHotEncoder()
numerical_features = ['A']
categorical_features = ['B']
data_processor = ColumnTransformer([('numerical_processing', numerical_processor, numerical_features),
('categorical_processing', categorical_processor, categorical_features)])
# 100 rows of toy data: one numerical column and one categorical column
X_train = pd.DataFrame({'A':np.random.uniform(size=100),
'B':np.random.choice(['j','k'],100)})
y_train = np.random.randint(0,5,100)
# clf is the LGBMClassifier(**params) instance created above
pipeline = Pipeline([('data_processing', data_processor),('lgbm', clf)])
pipeline.fit(X_train, y_train)
Answered By - StupidWolf