Issue
I have a dataset like:
e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})
I encoded the data using sklearn.preprocessing.LabelEncoder with the following lines of code:
x = list(e.columns)
# Import label encoder
from sklearn import preprocessing
# The label_encoder object maps each distinct value to an integer code.
label_encoder = preprocessing.LabelEncoder()
for i in x:
    # Encode the labels in the current column.
    e[i] = label_encoder.fit_transform(e[i])
print(e)
But this also encodes the numeric columns of int type, which is not what I want.
Encoded dataset:
   col1  col2  col3  col4
0     0     1     0     3
1     0     0     1     0
2     1     5     5     4
3     4     4     4     1
4     3     3     2     5
5     2     2     3     2
How can I rectify this?
Solution
One simple option is to encode only the columns that hold string values, i.e. those with object dtype ('O'). E.g., tweaking your code to be:
import pandas as pd
from sklearn import preprocessing
e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})
label_encoder = preprocessing.LabelEncoder()
for col in e.columns:
    # Only encode columns stored with object dtype (strings here).
    if e[col].dtype == 'O':
        e[col] = label_encoder.fit_transform(e[col])
print(e)
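Or, better yet, wrap the check in a function and apply it to every column, which leaves the original DataFrame untouched: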
import pandas as pd
from sklearn import preprocessing
def encode_labels(ser):
    # Encode object-dtype columns; return every other column unchanged.
    if ser.dtype == 'O':
        return label_encoder.fit_transform(ser)
    else:
        return ser
label_encoder = preprocessing.LabelEncoder()
e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})
e_encoded = e.apply(encode_labels)
print(e_encoded)
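One caveat with both versions above: the single label_encoder is refit on every column, so afterwards it only remembers the classes of the last column it saw. If you need to map the codes back to the original strings, a possible variation is to keep one encoder per column. The sketch below is an illustration, not part of the original answer: the encoders dict is a hypothetical helper name, and it swaps the dtype == 'O' check for pandas.api.types.is_numeric_dtype, on the assumption that "every non-numeric column" is what should be encoded (this also avoids relying on strings being stored with object dtype).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.preprocessing import LabelEncoder

e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})

# Hypothetical helper: maps column name -> the LabelEncoder fitted on it.
encoders = {}
e_encoded = e.copy()
for col in e.columns:
    # Assumption: only non-numeric columns should be label-encoded.
    if not is_numeric_dtype(e[col]):
        encoders[col] = LabelEncoder()
        e_encoded[col] = encoders[col].fit_transform(e[col])

print(e_encoded)

# Each column's own encoder can turn its codes back into the original labels.
print(encoders['col1'].inverse_transform(e_encoded['col1']))
```

Here col2 and col3 are left as they were, and encoders['col4'] can be used in the same way to decode col4.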
Answered By - paxton4416