Sunday, November 13, 2022

[FIXED] Using OrdinalEncoder to transform categorical values

Tags: python, scikit-learn

Issue

I have a dataset with many columns:

No  Name  Sex  Blood  Grade  Height  Study
1   Tom   M    O      56     160     Math
2   Harry M    A      76     192     Math
3   John  M    A      45     178     English
4   Nancy F    B      78     157     Biology
5   Mike  M    O      79     167     Math
6   Kate  F    AB     66     156     English
7   Mary  F    O      99     166     Science

I want to convert it to something like this:

No  Name  Sex  Blood  Grade  Height  Study
1   Tom   0    0      56     160     0
2   Harry 0    1      76     192     0
3   John  0    1      45     178     1
4   Nancy 1    2      78     157     2
5   Mike  0    0      79     167     0
6   Kate  1    3      66     156     1
7   Mary  1    0      99     166     3

I know there is a library that can do that:

from sklearn.preprocessing import OrdinalEncoder

I tried this, but it did not work:

enc = OrdinalEncoder()
enc.fit(df[["Sex","Blood", "Study"]])

Can anyone help me find what I am doing wrong and how to do this?

Thanks

Solution

You were almost there!

The fit method only prepares the encoder: it learns the mapping from your data but does not transform anything.

You have to call transform to actually convert the data, or use fit_transform, which fits and transforms the same data in one call.

enc = OrdinalEncoder()
enc.fit(df[["Sex","Blood", "Study"]])
df[["Sex","Blood", "Study"]] = enc.transform(df[["Sex","Blood", "Study"]])

or, in a single step:

enc = OrdinalEncoder()
df[["Sex","Blood", "Study"]] = enc.fit_transform(df[["Sex","Blood", "Study"]])
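To try this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies fit_transform. The DataFrame construction and the names cols and encoded are assumptions added for illustration. The cast to int is optional; it is there because OrdinalEncoder returns floats by default.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Sample data copied from the question.
df = pd.DataFrame({
    "No": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Tom", "Harry", "John", "Nancy", "Mike", "Kate", "Mary"],
    "Sex": ["M", "M", "M", "F", "M", "F", "F"],
    "Blood": ["O", "A", "A", "B", "O", "AB", "O"],
    "Grade": [56, 76, 45, 78, 79, 66, 99],
    "Height": [160, 192, 178, 157, 167, 156, 166],
    "Study": ["Math", "Math", "English", "Biology", "Math", "English", "Science"],
})

cols = ["Sex", "Blood", "Study"]  # hypothetical helper name

enc = OrdinalEncoder()
# fit_transform learns the mapping and returns the encoded values
# as a float NumPy array with one column per input column.
encoded = enc.fit_transform(df[cols])

# Write the codes back as integers; the other columns are untouched.
df[cols] = encoded.astype(int)

print(df)
```

Only the three listed columns change; No, Name, Grade and Height keep their original values.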

Note: The encoded values won't match the ones you listed, because fit collects the unique values of each column in sorted order (as numpy.unique does), so strings are numbered alphabetically rather than by order of appearance.

As you can see from enc.categories_:

[array(['F', 'M'], dtype=object),
 array(['A', 'AB', 'B', 'O'], dtype=object),
 array(['Biology', 'English', 'Math', 'Science'], dtype=object)]

Each value in an array is encoded by its position (F is encoded as 0, M as 1).
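If the numbering by order of appearance from the question is what you need, one option is to pass the category order explicitly through the categories parameter of OrdinalEncoder. The sketch below assumes the orderings M, F / O, A, B, AB / Math, English, Biology, Science, read off the sample rows; the small DataFrame and the names cols and custom_enc are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Only the three categorical columns from the question's sample data.
df = pd.DataFrame({
    "Sex": ["M", "M", "M", "F", "M", "F", "F"],
    "Blood": ["O", "A", "A", "B", "O", "AB", "O"],
    "Study": ["Math", "Math", "English", "Biology", "Math", "English", "Science"],
})

cols = ["Sex", "Blood", "Study"]

# One list per column, in the same order as cols.
# The position of a value in its list becomes its code.
custom_enc = OrdinalEncoder(categories=[
    ["M", "F"],
    ["O", "A", "B", "AB"],
    ["Math", "English", "Biology", "Science"],
])

codes = custom_enc.fit_transform(df[cols]).astype(int)
df[cols] = codes

print(df)
```

With explicit categories, the mapping no longer depends on alphabetical order, but any value missing from the lists raises an error at fit time unless handle_unknown is configured otherwise.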

Answered By - abcdaire

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
