Issue
I have the following code:
import lightgbm
from sklearn.model_selection import train_test_split

most_important = features_importance_chi(importance_score_tresh,
                                         df_user.drop(columns='CHURN'), churn)
X = df_user.drop(columns='CHURN')
churn[churn == 2] = 1
y = churn
# handle undersample problem
X, y = handle_undersampe(X, y)
# train the model
X = X.loc[:, X.columns.isin(most_important)].values
y = y.values
parameters = {
    'application': 'binary',
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': 'true',
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
}
# split data
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
train_data = lightgbm.Dataset(x_train, label=y_train)
test_data = lightgbm.Dataset(x_test, label=y_test)
model = lightgbm.train(parameters,
                       train_data,
                       valid_sets=[train_data, test_data],
                       feature_name=most_important,  # feature names are handed to LightGBM here
                       num_boost_round=5000,
                       early_stopping_rounds=100)
And here is the function that returns most_important:
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

def features_importance_chi(importance_score_tresh, X, Y):
    # rank features with an extra-trees model and keep those above the threshold
    model = ExtraTreesClassifier(n_estimators=10)
    model.fit(X, Y.values.ravel())
    feature_list = pd.Series(model.feature_importances_,
                             index=X.columns)
    feature_list = feature_list[feature_list > importance_score_tresh]
    feature_list = feature_list.index.values.tolist()
    return feature_list
The funny thing is that in Spyder this code raises the following error:
LightGBMError: Do not support special JSON characters in feature name.
but in Jupyter it works fine, and there I am able to print the list of most important features.
Any idea what could be the reason for this error?
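One way to see which columns trigger the error is to scan the feature names for the characters LightGBM refuses. The sketch below assumes that set is `"`, `,`, `:`, `[`, `]`, `{` and `}` (characters that would break the JSON model dump); that list is an assumption based on my reading of the LightGBM source, so check it against your installed version. `find_json_unsafe_names` is a hypothetical helper, not part of LightGBM. If the same code behaves differently in Spyder and Jupyter, it may be worth running this check in both, since the two environments could be loading different data or different library versions.

```python
# Assumed set of characters LightGBM rejects in feature names
# ("Do not support special JSON characters in feature name").
JSON_SPECIAL_CHARS = set('",:[]{}')


def find_json_unsafe_names(names):
    """Hypothetical helper: return (name, offending_chars) for each suspect name."""
    unsafe = []
    for name in names:
        bad = sorted(set(str(name)) & JSON_SPECIAL_CHARS)
        if bad:
            unsafe.append((name, "".join(bad)))
    return unsafe


if __name__ == "__main__":
    # Example column names; in the question this would be most_important
    # or list(df_user.columns).
    columns = ["age", "calls[day]", "plan:premium", "total_spend"]
    for name, chars in find_json_unsafe_names(columns):
        print(f"{name!r} contains {chars!r}")
```

An empty result means the names are not the problem, at least for the character set assumed here.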
Solution
This message comes up fairly often with LightGBM models such as LGBMClassifier: it means at least one feature name contains a character LightGBM does not accept. Add this line right after you load the data into pandas, and the problem should go away:
import re
# keep only letters, digits and underscores in every column name
df = df.rename(columns=lambda x: re.sub(r'[^A-Za-z0-9_]+', '', x))
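Stripping characters can make two different columns collide: with the regex above, both `calls[day]` and `calls day` become `callsday`, and duplicate column names would then confuse `X.columns.isin(most_important)` as well as LightGBM. The sketch below applies the same regex but appends a numeric suffix when a cleaned name is already taken. The helper name `sanitize_feature_names`, the `_1`, `_2` suffix scheme and the `feature` fallback for names that end up empty are my own assumptions, not anything LightGBM requires.

```python
import re


def sanitize_feature_names(names):
    """Hypothetical helper: apply the answer's regex, then de-duplicate.

    Names that become empty are replaced by "feature"; repeated results
    get a "_1", "_2", ... suffix so every name stays unique.
    """
    seen = set()
    cleaned = []
    for name in names:
        base = re.sub(r'[^A-Za-z0-9_]+', '', str(name)) or "feature"
        candidate = base
        counter = 1
        while candidate in seen:
            candidate = f"{base}_{counter}"
            counter += 1
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned


if __name__ == "__main__":
    print(sanitize_feature_names(["calls[day]", "calls day", "plan:premium", "%"]))
    # prints ['callsday', 'callsday_1', 'planpremium', 'feature']
```

With pandas this could be applied as `df_user.columns = sanitize_feature_names(df_user.columns)` right after loading. In the question's code it needs to run before `features_importance_chi`, because `most_important` is built from the column names and must match the names LightGBM later receives.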
Answered By - Wojciech Moszczyński