Tuesday, July 12, 2022

[FIXED] LGBM not varying predictions with random state

lightgbm, machine-learning, python, scikit-learn

Issue

I am trying to compute prediction intervals for a classifier.

I trained an LGBMClassifier inside a scikit-learn pipeline. Even after setting a new random_state in the pipeline, refitting on the same data gives identical results. What can I do about this?

This is a relevant snippet of the code I'm using:

from lightgbm import LGBMClassifier
from sklearn.pipeline import Pipeline

# preprocessor, train and test are defined elsewhere (not shown)
SEED_VALUE = 3
t_clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('lgbm', LGBMClassifier(class_weight="balanced",
                            random_state=SEED_VALUE, max_depth=20,
                            min_child_samples=20, num_leaves=31)),
])
states = [0, 1, 2, 3]

for state in states:
    train_temp = train.copy()
    # change only the LightGBM seed, then refit on the same data
    t_clf.set_params(lgbm__random_state=state)
    t_clf.fit(train_temp, train_temp['label'])
    print(t_clf.predict_proba(test))

# output from predict_proba doesn't change with varying states

The same occurs when trying to change shuffle order or bagging seed.

Here are my current parameters if this is helpful to know:

LGBMClassifier(bagging_seed=2, boosting_type='gbdt', class_weight='balanced',
               colsample_bytree=1.0, importance_type='split', learning_rate=0.1,
               max_depth=50, min_child_samples=1, min_child_weight=0.001,
               min_data_in_leaf=10, min_split_gain=0.0, n_estimators=100,
               n_jobs=-1, num_leaves=30, objective=None, random_state=1,
               reg_alpha=0.0, reg_lambda=0.0, silent=True, subsample=1.0,
               subsample_for_bin=200000, subsample_freq=0)

Solution

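You get the same results regardless of the random seed because your model specification performs no random sampling at any stage. With subsample=1.0, subsample_freq=0 and colsample_bytree=1.0, every tree is trained on all rows and all features, so the seed has nothing to act on. If, for instance, you set colsample_bytree to a value less than 1, each tree is built on a random subset of the features, and you will see different predicted probabilities for different random seeds.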

from sklearn.datasets import make_classification
from lightgbm import LGBMClassifier

# generate some data
X, y = make_classification(n_samples=1000, n_features=50, random_state=100)

# set the random state
for state in [0, 1, 2, 3]:

    # instantiate the classifier
    clf = LGBMClassifier(
        class_weight='balanced',
        max_depth=20,
        min_child_samples=20,
        num_leaves=31,
        random_state=state,
        colsample_bytree=0.1,
    )

    # fit the classifier
    clf.fit(X, y)

    # predict the class probabilities
    y_pred = clf.predict_proba(X)

    # print the predicted probability of the 
    # first class for the first sample 
    print([state, format(y_pred[0, 0], '.4%')])

    # [0, '97.8132%']
    # [1, '97.4980%']
    # [2, '98.3729%']
    # [3, '98.0737%']

Answered By - Flavia Giammarino

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
