Saturday, February 5, 2022

[FIXED] ValueError: could not convert string to float: 'Mme'

Labels: kaggle, machine-learning, pandas, python-3.x, scikit-learn

Issue

When I run the following code in Jupyter Lab:

import numpy as np
from sklearn.feature_selection import SelectKBest,f_classif
import matplotlib.pyplot as plt

predictors = ["Pclass","Sex","Age","SibSp","Parch","Fare","Embarked","FamilySize","Title","NameLength"]
selector = SelectKBest(f_classif,k=5)
selector.fit(titanic[predictors],titanic["Survived"])

It fails with ValueError: could not convert string to float: 'Mme'. The traceback is:

  ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    C:\Users\ADMINI~1\AppData\Local\Temp/ipykernel_17760/1637555559.py in <module>
          5 predictors = ["Pclass","Sex","Age","SibSp","Parch","Fare","Embarked","FamilySize","Title","NameLength"]
          6 selector = SelectKBest(f_classif,k=5)
    ----> 7 selector.fit(titanic[predictors],titanic["Survived"])
     ......
    
    ValueError: could not convert string to float: 'Mme'

I printed titanic[predictors] and titanic["Survived"]; the output is as follows:

     Pclass  Sex   Age  SibSp  Parch     Fare  Embarked  FamilySize  Title  NameLength
0         3    0  22.0      1      0   7.2500         0           1      1          23
1         1    1  38.0      1      0  71.2833         1           1      3          51
2         3    1  26.0      0      0   7.9250         0           0      2          22
3         1    1  35.0      1      0  53.1000         0           1      3          44
4         3    0  35.0      0      0   8.0500         0           0      1          24
...     ...  ...   ...    ...    ...      ...       ...         ...    ...         ...
886       2    0  27.0      0      0  13.0000         0           0      6          21
887       1    1  19.0      0      0  30.0000         0           0      2          28
888       3    1  28.0      1      2  23.4500         0           3      2          40
889       1    0  26.0      0      0  30.0000         1           0      1          21
890       3    0  32.0      0      0   7.7500         2           0      1          19
891 rows × 10 columns

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: Survived, Length: 891, dtype: int64

How can I solve this problem?

Solution

When you fit an estimator (in your case SelectKBest), you need to know your data, and you will almost always need to preprocess it first.

Take a look at your data:

Do you have categorical features, numerical features, or a mix?
Do you have NaN values?
...

Most algorithms do not accept categorical features, so you need to transform them into numerical ones (consider using OneHotEncoder).
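As a rough illustration of what OneHotEncoder does, here is a minimal sketch. The `titles` frame is a hypothetical toy column of raw titles, not your actual data, and `handle_unknown="ignore"` is just one reasonable setting so that unseen categories do not raise at transform time.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column of raw titles (an assumption, not the asker's data).
titles = pd.DataFrame({"Title": ["Mr", "Mrs", "Miss", "Mme", "Mr"]})

# Each distinct category becomes its own 0/1 column.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(titles[["Title"]]).toarray()

print(encoder.categories_[0])  # the learned categories, in sorted order
print(encoded.shape)           # one row per sample, one column per category
```

Every row of `encoded` contains exactly one 1, in the column for that row's title, so the result is purely numeric and can be passed to an estimator.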

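In your case, it seems you have a categorical value, 'Mme', in the Title feature. The truncated printout shows only ten of the 891 rows, so an unmapped title can easily be hiding in the rows that are not displayed. Check it.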
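One way to check is to look for values that cannot be parsed as numbers. This is only a sketch: `titanic_demo` is a hypothetical stand-in for your `titanic` frame, built on the assumption that most titles were already mapped to integer codes and 'Mme' was left as a string.

```python
import pandas as pd

# Hypothetical stand-in for the real `titanic` frame: most titles are already
# integer codes, but one raw string ("Mme") slipped through the mapping.
titanic_demo = pd.DataFrame({"Title": [1, 3, 2, "Mme", 1]})

# Coerce to numbers; anything that cannot be parsed becomes NaN.
as_numbers = pd.to_numeric(titanic_demo["Title"], errors="coerce")

# Values that were present originally but failed to parse are the culprits.
bad_mask = as_numbers.isna() & titanic_demo["Title"].notna()
unmapped_titles = titanic_demo.loc[bad_mask, "Title"].unique().tolist()
print(unmapped_titles)
```

Running the same check on your own Title column should list every title that still needs a numeric code.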

You will have the same problem with NaN values.
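A quick way to find and fill missing values is sketched below. The `ages` frame is hypothetical toy data, and median imputation with SimpleImputer is only one possible strategy; dropping rows or filling with another statistic may suit your data better.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy column with two missing ages (not the asker's data).
ages = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan]})

# Count the missing values in each column.
missing_per_column = ages.isna().sum()
print(missing_per_column)

# Replace each NaN with the median of the observed values in its column.
imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(ages)
print(filled.ravel())
```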

In conclusion, before you start fitting, you have to preprocess your data.
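Putting the pieces together, a minimal end-to-end sketch could look like this. Everything here is an assumption made for illustration: the `demo` frame is toy data, `title_codes` is a hypothetical mapping (it gives "Mme" the same code as "Mrs"), and only three of your predictors are used.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data with one missing age and one unmapped-looking title.
demo = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 1, 3],
    "Age": [22.0, 38.0, np.nan, 35.0, 35.0, 27.0, 19.0, 28.0],
    "Title": ["Mr", "Mrs", "Miss", "Mme", "Mr", "Mr", "Miss", "Mrs"],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
})

# Fill missing ages with the median of the observed ages.
demo["Age"] = demo["Age"].fillna(demo["Age"].median())

# Map every title to an integer code. The mapping itself is an assumption;
# the point is that no title may be left out of it.
title_codes = {"Mr": 1, "Miss": 2, "Mrs": 3, "Mme": 3}
demo["Title"] = demo["Title"].map(title_codes)

# Guard: an unmapped title would show up here as NaN.
assert demo.isna().sum().sum() == 0

features = ["Pclass", "Age", "Title"]
selector = SelectKBest(f_classif, k=2)
selector.fit(demo[features], demo["Survived"])

# Names of the features that the selector kept.
selected = [name for name, keep in zip(features, selector.get_support()) if keep]
print(selected)
```

Once every column is numeric and free of NaN, the fit call from the question runs without the ValueError.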

Answered By - Alex Serra Marrugat

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
