Issue
I want to generate n-grams from a sequence of tokens:
bigram: "1 3 4 5" --> { (1,3), (3,4), (4,5) }
After searching, I found a thread that used:
def find_ngrams(input_list, n):
    # Zip n progressively shifted views of the list to form overlapping n-grams.
    return zip(*[input_list[i:] for i in range(n)])
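For example, on the token sequence above it produces exactly the bigrams I expect:

tokens = [1, 3, 4, 5]
print(list(find_ngrams(tokens, 2)))   # [(1, 3), (3, 4), (4, 5)]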
If I use this piece of code at training time, I think it will kill performance, so I am looking for a better option.
Solution
If you need to generate bigrams in string format:
import tensorflow as tf
tf.enable_eager_execution()  # TF 1.x eager mode

sentence = ['this is example sentence']
tokens = tf.string_split(sentence).values          # split the sentence once
x = tokens[:-1] + ' ' + tokens[1:]                 # join each token with its successor
# tf.Tensor([b'this is' b'is example' b'example sentence'], shape=(3,), dtype=string)
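The same slicing trick extends to arbitrary n. Here is a minimal sketch, assuming TF 1.x with eager execution as above; tf_ngrams is just an illustrative helper name, not part of the original answer:

import tensorflow as tf
tf.enable_eager_execution()  # TF 1.x eager mode

def tf_ngrams(sentence, n):
    """Join each run of n consecutive tokens into one space-separated string."""
    tokens = tf.string_split(sentence).values   # split the single sentence once
    length = int(tf.size(tokens))               # token count (eager execution only)
    grams = tokens[:length - n + 1]
    for i in range(1, n):
        # Concatenate the i-th shifted view, exactly like the bigram one-liner above.
        grams = grams + ' ' + tokens[i:length - n + 1 + i]
    return grams

print(tf_ngrams(['this is example sentence'], 3))
# tf.Tensor([b'this is example' b'is example sentence'], shape=(2,), dtype=string)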
You can also use tensorflow-transform to generate n-grams.
import tensorflow_transform as tft

# `tensor` is a SparseTensor of string tokens; this yields all unigrams and bigrams.
tft.ngrams(tensor, ngram_range=(1, 2), separator=" ")
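For context, a minimal sketch of how that call might sit inside a tf.Transform preprocessing_fn; the 'text' and 'ngrams' keys are illustrative placeholders, not part of the original answer:

import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # 'text' is assumed to be a batch of raw sentences (string Tensor).
    tokens = tf.string_split(inputs['text'])          # SparseTensor of string tokens
    # Emit every unigram and bigram, joined with a space.
    return {'ngrams': tft.ngrams(tokens, ngram_range=(1, 2), separator=' ')}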
Note: as of 22 January 2019, tensorflow-transform only supports Python 2.
Answered By - Amir