Issue
From my understanding of SKLearn's documentation, LabelEncoder encodes values between 0 and the number of classes minus 1 (i.e. n_classes - 1).
I want to do something similar as part of a TensorFlow preprocessing pipeline so my package can avoid a dependency on SKLearn. For example, I understand the preprocessing layers already provide APIs for one-hot and categorical encoding:
tf.keras.layers.CategoryEncoding(
    num_tokens=None, output_mode='multi_hot', sparse=False, **kwargs
)
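For instance, a quick one-hot example (a minimal sketch, assuming integer inputs already in [0, num_tokens)):

import tensorflow as tf

# One-hot encode integer category indices with the built-in layer
onehot = tf.keras.layers.CategoryEncoding(num_tokens=3, output_mode='one_hot')
print(onehot([0, 1, 2]))
tf.Tensor(
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]], shape=(3, 3), dtype=float32)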
Is there any way to replicate LabelEncoder via certain arguments to the CategoryEncoding API, or do I have to define a brand-new preprocessing layer using the abstract base class template provided in the TensorFlow documentation?
If so, is there any reference on how I can write my own LabelEncoder-like class as a TensorFlow layer?
Solution
IIUC, you just need sparse integer labels. So, maybe try something simple and naive first:
classes = ['fish1', 'fish2', 'fish3']
data = ['fish1', 'fish2', 'fish3', 'fish2', 'fish3', 'fish1']

# Map each class name to its integer index, then encode the data
class_indices = dict(zip(classes, range(len(classes))))
labels = list(map(class_indices.get, data))
print(labels)
[0, 1, 2, 1, 2, 0]
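If you also need to decode labels back to class names (like LabelEncoder.inverse_transform in SKLearn), a minimal sketch is to invert the same dictionary:

# Reverse mapping to recover the original class names
index_classes = {v: k for k, v in class_indices.items()}
print(list(map(index_classes.get, labels)))
['fish1', 'fish2', 'fish3', 'fish2', 'fish3', 'fish1']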
Or with TensorFlow, you can use a StaticHashTable:
import tensorflow as tf

classes = ['fish1', 'fish2', 'fish3']
data = tf.constant(['fish1', 'fish2', 'fish3', 'fish2', 'fish3', 'fish1'])

# Static lookup table mapping each class name to its index;
# unseen strings fall back to default_value (-1)
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(tf.constant(classes), tf.range(len(classes))),
    default_value=-1)

# Wrap the lookup in a Lambda layer so it composes with other Keras layers
label_encoder = tf.keras.layers.Lambda(lambda x: table.lookup(x))
print(label_encoder(data))
tf.Tensor([0 1 2 1 2 0], shape=(6,), dtype=int32)
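Alternatively, newer TensorFlow versions ship a built-in StringLookup layer that covers the same use case. A minimal sketch (with num_oov_indices=0 the known classes start at index 0, matching LabelEncoder; note the output dtype is int64):

import tensorflow as tf

classes = ['fish1', 'fish2', 'fish3']
data = tf.constant(['fish1', 'fish2', 'fish3', 'fish2', 'fish3', 'fish1'])

# num_oov_indices=0 drops the reserved out-of-vocabulary slot,
# so vocabulary entries are encoded starting at 0
label_encoder = tf.keras.layers.StringLookup(vocabulary=classes, num_oov_indices=0)
print(label_encoder(data))
tf.Tensor([0 1 2 1 2 0], shape=(6,), dtype=int64)

The same layer constructed with invert=True maps integer indices back to the original strings.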
Answered By - AloneTogether