Issue
I am trying to build a model and run a grid search over it; the code is below. The raw data (credit card fraud data) is downloaded from this site: https://www.kaggle.com/mlg-ulb/creditcardfraud
The code starts from standardization, after the data has been read.
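# Imports needed by the code below; credit_card_fraud_df is assumed to be already loaded
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.preprocessing import PowerTransformer, StandardScaler
standardization = StandardScaler()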
credit_card_fraud_df[['Amount']] = standardization.fit_transform(credit_card_fraud_df[['Amount']])
# Assigning feature variable to X
X = credit_card_fraud_df.drop(['Class'], axis=1)
# Assigning response variable to y
y = credit_card_fraud_df['Class']
# Splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=100)
X_train.head()
power_transformer = PowerTransformer(copy=False)
power_transformer.fit(X_train) ## Fit the PT on training data
X_train_pt_df = power_transformer.transform(X_train) ## Then apply on all data
X_test_pt_df = power_transformer.transform(X_test)
y_train_pt_df = y_train
y_test_pt_df = y_test
train_pt_df = pd.DataFrame(data=X_train_pt_df, columns=X_train.columns.tolist())
# set up cross validation scheme
folds = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 4)
# specify range of hyperparameters
params = {"C": np.logspace(-3, 3, 5), "penalty": ["l1", "l2"]}  # 5 values of C from 1e-3 to 1e3; l1 = lasso, l2 = ridge
## using Logistic regression for class imbalance
model = LogisticRegression(class_weight='balanced')
grid_search_cv = GridSearchCV(estimator=model, param_grid=params,
                              scoring='roc_auc',
                              cv=folds,
                              return_train_score=True, verbose=1)
grid_search_cv.fit(X_train_pt_df, y_train_pt_df)
## reviewing the results
cv_results = pd.DataFrame(grid_search_cv.cv_results_)
cv_results
Sample Result:
mean_fit_time std_fit_time mean_score_time std_score_time param_C param_penalty params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 0.044332 0.002040 0.000000 0.000000 0.001 l1 {'C': 0.001, 'penalty': 'l1'} NaN NaN NaN NaN NaN NaN NaN 6
1 0.477965 0.046651 0.016745 0.003813 0.001 l2 {'C': 0.001, 'penalty': 'l2'} 0.485714 0.428571 0.542857 0.485714 0.457143 0.480000 0.037904 5
I do not have any null values in the input data. I do not understand why I am getting NaN values in these columns. Can anyone please help me?
Solution
The problem is the default solver used by the model defined here:
model = LogisticRegression(class_weight='balanced')
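The default solver, lbfgs, does not support the l1 penalty, so every fit with penalty='l1' fails with the error below. GridSearchCV's default error_score=np.nan records each failed fit as NaN instead of raising, which is why the l1 rows show NaN: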
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.
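One way to see this error directly, rather than as NaN scores, is to pass error_score='raise' to GridSearchCV. The following is a minimal sketch, not the asker's code: it assumes a small synthetic dataset from make_classification in place of the credit card data, and the names X_demo, y_demo and fit_error are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the credit card data (assumption for illustration).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=4)
params = {"C": np.logspace(-3, 3, 5), "penalty": ["l1", "l2"]}

# Same estimator as in the question: no solver given, so the default (lbfgs) is used.
search = GridSearchCV(
    estimator=LogisticRegression(class_weight="balanced"),
    param_grid=params,
    scoring="roc_auc",
    cv=folds,
    error_score="raise",  # re-raise fit errors instead of recording NaN
)

try:
    search.fit(X_demo, y_demo)
    fit_error = None
except ValueError as exc:
    # The first l1 candidate fails because lbfgs does not support that penalty.
    fit_error = exc
    print(f"Fit failed: {exc}")
```

With the default error_score, the same failure is only reported as a warning and a NaN score, which is easy to miss in a long verbose log.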
Also, it is worth checking the docs before defining a param grid:
penalty: {‘l1’, ‘l2’, ‘elasticnet’, ‘none’}, default=’l2’ Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver. If ‘none’ (not supported by the liblinear solver), no regularization is applied.
As soon as you switch to a solver that supports your desired grid, you're good to go:
## using Logistic regression for class imbalance
model = LogisticRegression(class_weight='balanced', solver='saga')
grid_search_cv = GridSearchCV(estimator=model, param_grid=params,
                              scoring='roc_auc',
                              cv=folds,
                              return_train_score=True, verbose=1)
grid_search_cv.fit(X_train_pt_df, y_train_pt_df)
## reviewing the results
cv_results = pd.DataFrame(grid_search_cv.cv_results_)
Note as well the ConvergenceWarning, which might suggest that you need to increase the default max_iter or tol, or switch to another solver and rethink the desired param grid.
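If you would rather rethink the grid than rely on a single solver, param_grid also accepts a list of dicts, so each solver can be paired only with the penalties it supports. The sketch below is one possible setup under stated assumptions, not the answerer's code: the synthetic data, the liblinear/lbfgs pairing, and max_iter=1000 are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the credit card data (assumption for illustration).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=4)
C_values = np.logspace(-3, 3, 5)

# One sub-grid per solver, listing only the penalties that solver accepts.
param_grid = [
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": C_values},
    {"solver": ["lbfgs"], "penalty": ["l2"], "C": C_values},
]

search = GridSearchCV(
    # max_iter raised above the default to give the solvers more room to converge.
    estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=folds,
)
search.fit(X_demo, y_demo)

cv_results = pd.DataFrame(search.cv_results_)
print(cv_results[["param_solver", "param_penalty", "param_C", "mean_test_score"]])
```

Because every solver/penalty combination in the grid is valid, no candidate fails to fit and mean_test_score contains no NaN entries.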
Answered By - Sergey Bushmanov