Issue
im trying to save a dict of LE encoders for use in inferencing, this is the code that trains and applies the LE and then saves the LE into dict (label_object) which then will be joblib.dump(ed)()
for col in data:
if data[col].dtype == 'object':
# If 2 or fewer unique categories
if len(list(data[col].unique())) >= 2:
# Train on the training data
le.fit(data[col])
label_object[col] = le
# Transform both training and testing data
data[col] = le.transform(data[col])
label_object[col] = le
When trying this it seems the classes_ of the LE get overwritten by the last LE, in this case 'day_of_incident'
Im not sure whats causing this issues, is there an issue with the logic of the code or am I doing something wrong?
Solution
I suggest to avoid memory id()
issues, generating a new instance of Label Encoder per iteration as well. Also you can 1 line both and disregard the need to transform data[col].unique()
output into a list to evaluate if len() >= 2
conditions:
for col in data:
if (data[col].dtype == 'object') & (len(data[col].unique()) >=2:
le = LabelEncoder()
le.fit(data[col])
label_object[col] = le
# Transform both training and testing data
data[col] = le.transform(data[col])
label_object[col] = le
Answered By - Celius Stingher
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.