Issue
I have a dataset with many columns:
No Name Sex Blood Grade Height Study
1 Tom M O 56 160 Math
2 Harry M A 76 192 Math
3 John M A 45 178 English
4 Nancy F B 78 157 Biology
5 Mike M O 79 167 Math
6 Kate F AB 66 156 English
7 Mary F O 99 166 Science
I want to change it to something like this:
No Name Sex Blood Grade Height Study
1 Tom 0 0 56 160 0
2 Harry 0 1 76 192 0
3 John 0 1 45 178 1
4 Nancy 1 2 78 157 2
5 Mike 0 0 79 167 0
6 Kate 1 3 66 156 1
7 Mary 1 0 99 166 3
I know there is a library that can do that:
from sklearn.preprocessing import OrdinalEncoder
I tried this but it did not work
enc = OrdinalEncoder()
enc.fit(df[["Sex","Blood", "Study"]])
Can anyone help me find what I am doing wrong and how to do that?
Thanks
Solution
You were almost there!
The fit method only prepares the encoder (it learns the mapping from your data) but does not transform the data.
You have to call transform to transform the data, or use fit_transform, which fits and transforms the same data in one call.
enc = OrdinalEncoder()
enc.fit(df[["Sex","Blood", "Study"]])
df[["Sex","Blood", "Study"]] = enc.transform(df[["Sex","Blood", "Study"]])
or directly
enc = OrdinalEncoder()
df[["Sex","Blood", "Study"]] = enc.fit_transform(df[["Sex","Blood", "Study"]])
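For reference, here is a minimal self-contained sketch of the fit_transform variant. It assumes the DataFrame is built by hand from the table in the question; the names cols and encoded are only illustrative, and the cast to int is an optional extra, since the encoder returns floats.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Data copied from the table in the question
df = pd.DataFrame({
    "No": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Tom", "Harry", "John", "Nancy", "Mike", "Kate", "Mary"],
    "Sex": ["M", "M", "M", "F", "M", "F", "F"],
    "Blood": ["O", "A", "A", "B", "O", "AB", "O"],
    "Grade": [56, 76, 45, 78, 79, 66, 99],
    "Height": [160, 192, 178, 157, 167, 156, 166],
    "Study": ["Math", "Math", "English", "Biology", "Math", "English", "Science"],
})

cols = ["Sex", "Blood", "Study"]  # columns to encode

enc = OrdinalEncoder()
encoded = df.copy()  # keep the original frame untouched
# fit_transform returns a float array; cast to int for readable codes
encoded[cols] = enc.fit_transform(df[cols]).astype(int)

print(encoded)
```

The other columns (No, Name, Grade, Height) are left as they were, because only the selected columns are passed to the encoder.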
Note: The values won't be the ones that you provided, since internally the fit method uses numpy.unique, which returns the categories sorted in alphabetical order, not in order of appearance.
As you can see from enc.categories_:
[array(['F', 'M'], dtype=object),
 array(['A', 'AB', 'B', 'O'], dtype=object),
 array(['Biology', 'English', 'Math', 'Science'], dtype=object)]
Each value in the array is encoded by its position (F will be encoded as 0, M as 1).
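If you do want the codes to follow the order of first appearance, as in the question, one possible approach is to pass the category order explicitly through the categories parameter of OrdinalEncoder. The sketch below assumes the three columns from the question and builds each list with Series.unique(), which keeps the order of first appearance; the variable names are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Only the three categorical columns from the question
df = pd.DataFrame({
    "Sex": ["M", "M", "M", "F", "M", "F", "F"],
    "Blood": ["O", "A", "A", "B", "O", "AB", "O"],
    "Study": ["Math", "Math", "English", "Biology", "Math", "English", "Science"],
})

cols = ["Sex", "Blood", "Study"]

# One list per column, in order of first appearance (not sorted)
categories = [list(df[c].unique()) for c in cols]

enc = OrdinalEncoder(categories=categories)
codes = enc.fit_transform(df[cols]).astype(int)

# enc.categories_ now reflects the order that was passed in
print([list(c) for c in enc.categories_])
print(codes)
```

With this ordering M is encoded as 0 and F as 1, and Blood follows O, A, B, AB. The trade-off is that the mapping then depends on the row order of the data used for fitting.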
Answered By - abcdaire