Issue
I've built a model using LogisticRegression(), and a grid search suggests that C = .0000001 (the inverse of regularization strength) is the "best" value for my predictions. This parameter works fine with LogisticRegression(), but since I want to cross-validate I decided to use LogisticRegressionCV(). The equivalent parameter there is named Cs, yet when I pass the same value, Cs = .0000001, I get an error:
797 warm_start_sag = {"coef": np.expand_dims(w0, axis=1)}
799 coefs = list()
--> 800 n_iter = np.zeros(len(Cs), dtype=np.int32)
801 for i, C in enumerate(Cs):
802 if solver == "lbfgs":
TypeError: object of type 'float' has no len()
When referring to the documentation, it seems that for LogisticRegressionCV():

If Cs is an int, then a grid of Cs values is chosen on a logarithmic scale between 1e-4 and 1e4.
How would I then still input a value of Cs = .0000001? I'm confused about how to proceed.
Solution
LogisticRegressionCV is not meant to be just cross-validation-scored logistic regression; it is a hyperparameter-tuned (by cross-validation) logistic regression. That is, it tries several different regularization strengths and selects the best one using cross-validation scores (then refits a single model on the entire training set, using that best C). Cs can be a list of values to try for C, or an integer to let sklearn create that list for you (as in the doc you quoted).
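A minimal sketch of the two ways Cs can be passed; the synthetic data from make_classification is just a stand-in for your own X and y:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Cs as a list: exactly these C values are tried, and the best is kept.
clf_list = LogisticRegressionCV(Cs=[1e-7, 1e-3, 1.0, 1e3], cv=5).fit(X, y)

# Cs as an int: that many values, spaced logarithmically between 1e-4 and 1e4, are tried.
clf_int = LogisticRegressionCV(Cs=10, cv=5).fit(X, y)

print(clf_list.C_, clf_int.C_)  # the C selected by cross-validation in each case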
If you just want to score your model with a fixed C, use cross_val_score or cross_validate.
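For example, a sketch reusing the X and y from above together with the C value in question:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# No tuning happens here: the fixed-C model is simply scored with 5-fold cross-validation.
scores = cross_val_score(LogisticRegression(C=1e-7), X, y, cv=5)
print(scores.mean())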
(You probably can use LogisticRegressionCV with Cs=[0.0000001], but that's not the right semantic usage.)
Answered By - Ben Reiniger