Issue
I would like to do supervised learning.
So far I know how to train on all features.
However, I would also like to run experiments with only the K best features.
I read the documentation and found that scikit-learn has a SelectKBest class.
Unfortunately, I am not sure how to create a new dataframe after finding those best features.
Let's assume I would like to run an experiment with the 5 best features:
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
select_k_best_classifier = SelectKBest(score_func=f_classif, k=5)
fit_transofrmed_features = select_k_best_classifier.fit_transform(features_dataframe, targeted_class)
Now if I add the next line:
dataframe = pd.DataFrame(fit_transofrmed_features)
I get a new dataframe without feature names (only integer column labels from 0 to 4).
I should replace it with:
dataframe = pd.DataFrame(fit_transofrmed_features, columns=features_names)
My question is: how do I create the features_names list?
I know that I should use:
select_k_best_classifier.get_support()
which returns an array of boolean values.
A True value in the array marks the position of a selected column.
How should I combine this boolean array with the list of all feature names, which I can get via:
feature_names = list(features_dataframe.columns.values)
Solution
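You can do the following, provided select_k_best_classifier is the fitted SelectKBest object rather than the array returned by fit_transform:
mask = select_k_best_classifier.get_support()  # boolean array, one entry per original feature
new_features = []  # names of the K best features
for is_selected, feature in zip(mask, feature_names):
    if is_selected:
        new_features.append(feature)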
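The loop above can also be written more compactly with the standard library. The following is a minimal sketch, not part of the original answer; the feature names and mask values are made-up stand-ins for feature_names and the output of get_support().

```python
from itertools import compress

# Hypothetical stand-ins for the real column names and the get_support() mask.
feature_names = ["age", "height", "weight", "income"]
mask = [True, False, True, False]

# compress() keeps each name whose corresponding mask entry is truthy.
new_features = list(compress(feature_names, mask))
print(new_features)  # ['age', 'weight']
```

This works the same way when mask is a NumPy boolean array, since compress() only checks the truthiness of each entry.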
Then use these names as the column labels of the new dataframe:
dataframe = pd.DataFrame(fit_transofrmed_features, columns=new_features)
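For reference, the whole flow can be sketched end to end as follows. This is an illustration under stated assumptions rather than the answerer's exact code: the toy data, the column names f0 to f7, and the variable names selector and selected_values are hypothetical. It keeps the fitted selector in its own variable so that get_support() can be called on it, and it indexes the column Index with the boolean mask instead of looping.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: 100 rows, 8 features named f0..f7, binary target.
rng = np.random.default_rng(0)
features_dataframe = pd.DataFrame(
    rng.normal(size=(100, 8)),
    columns=[f"f{i}" for i in range(8)],
)
targeted_class = pd.Series(rng.integers(0, 2, size=100))

# Keep the fitted selector separate from the transformed values,
# so get_support() is still available afterwards.
selector = SelectKBest(score_func=f_classif, k=5)
selected_values = selector.fit_transform(features_dataframe, targeted_class)

# Boolean mask -> names of the kept columns, in their original order.
mask = selector.get_support()
new_features = features_dataframe.columns[mask]

# Rebuild a dataframe that carries the original names and row index.
dataframe = pd.DataFrame(
    selected_values,
    columns=new_features,
    index=features_dataframe.index,
)
```

If only the selected columns are needed, features_dataframe.loc[:, mask] should give the same result without rebuilding the frame. Recent scikit-learn releases also provide get_feature_names_out() on the fitted selector, which may be worth checking in your installed version.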
Answered By - MMF