Issue
I have a dataset with a character column and some numeric columns, and I want to encode the character column so I can train a random forest classifier on it. I know there is OneHotEncoder, but I couldn't find an example of how to use it. How can I convert a character column, e.g. a gender column with the values 'f' and 'm', into integers such as 0 and 1?
Solution
Use LabelEncoder, which takes an array of strings and transforms it into an array of integers.
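LabelEncoder assigns codes in the sorted order of the distinct labels it sees during fitting (stored in its classes_ attribute), which is why 'f' becomes 0 and 'm' becomes 1. A minimal sketch of the round trip on a plain list, using fit_transform to fit and encode in one call and inverse_transform to map the codes back (the variable names are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

genders = ["m", "f", "m"]

enc = LabelEncoder()
codes = enc.fit_transform(genders)  # learn the labels and encode them in one call

print(enc.classes_)                  # ['f' 'm']  -> 'f' is 0, 'm' is 1
print(codes)                         # [1 0 1]
print(enc.inverse_transform(codes))  # ['m' 'f' 'm']
```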
Example:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
data = pd.DataFrame()
data['age'] = [17,33,47]
data['gender'] = ['m','f','m']
enc = LabelEncoder()
print(data)
# Learn the distinct labels ('f', 'm'), then replace each string with its integer code
enc.fit(data['gender'])
data['gender'] = enc.transform(data['gender'])
print(data)
Output:
   age gender
0   17      m
1   33      f
2   47      m

   age  gender
0   17       1
1   33       0
2   47       1
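Since the question also mentions OneHotEncoder and a random forest: scikit-learn documents LabelEncoder as intended for encoding target values (y), and provides OrdinalEncoder and OneHotEncoder for feature columns. A minimal sketch, assuming scikit-learn 0.20 or later and a hypothetical 'label' target column added only so the classifier has something to fit, that one-hot encodes gender inside a Pipeline and trains a RandomForestClassifier:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data; the 'label' column is hypothetical, added only for illustration.
data = pd.DataFrame({
    "age": [17, 33, 47, 25],
    "gender": ["m", "f", "m", "f"],
    "label": [0, 1, 1, 0],
})
X = data[["age", "gender"]]
y = data["label"]

# One-hot encode 'gender' into 0/1 indicator columns; pass 'age' through unchanged.
preprocess = ColumnTransformer(
    [("gender", OneHotEncoder(handle_unknown="ignore"), ["gender"])],
    remainder="passthrough",
)

model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
])
model.fit(X, y)

# New rows are encoded with the categories learned during fit.
new_rows = pd.DataFrame({"age": [30], "gender": ["f"]})
prediction = model.predict(new_rows)
print(prediction)
```

Keeping the encoder inside the pipeline means the same category-to-column mapping is applied at prediction time. For a two-valued column like gender, the single 0/1 column produced by LabelEncoder carries the same information; one-hot encoding matters more when a column has three or more unordered categories.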
Answered By - Robin Spiess