Issue
I am trying to select the best categorical features for a classification problem with chi2 and SelectKBest. Here, I've sorted out the categorical columns:
I separated the features and target like this and fit them to SelectKBest:
from sklearn.feature_selection import chi2, SelectKBest
X, y = df_cat_kbest.iloc[:, :-1], df_cat_kbest.iloc[:, -1]
selector = SelectKBest(score_func=chi2, k=3).fit_transform(X, y)
When I run it, I am getting the error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13272\2211654466.py in <module>
----> 1 selector = SelectKBest(score_func=chi2, k=3).fit_transform(X, y)
E:\Anaconda\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
853 else:
854 # fit method of arity 2 (supervised transformation)
--> 855 return self.fit(X, y, **fit_params).transform(X)
856
857
...
...
E:\Anaconda\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)
1991
1992 def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
-> 1993 return np.asarray(self._values, dtype=dtype)
1994
1995 def __array_wrap__(
ValueError: could not convert string to float: 'Self_emp_not_inc'
As far as I know, I can apply chi-square to categorical columns. Here, all the features are categorical, and so is the target. So why does it say that it could not convert string to float?
Solution
Encoding the features would do the job. For example:
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_selection import chi2, SelectKBest
from sklearn.pipeline import make_pipeline
X, y = df_cat_kbest.iloc[:, :-1], df_cat_kbest.iloc[:, -1]
selector = make_pipeline(OneHotEncoder(drop='first'), SelectKBest(score_func=chi2, k=3)).fit_transform(X, y)
We have added a preprocessing step: one-hot encoding. You can choose another encoding. The bottom line is that chi2 works on numerical arrays, so you need to transform your string (object) columns to numerical data ;)
There are other encoders in the category_encoders package from scikit-learn-contrib that might suit your needs.
Answered By - Prayson W. Daniel