Issue
I want to build a word_dict dictionary from the more_clean column of my dataframe, but I get the error "expected string or bytes-like object".
This is my dataframe:
And this is my code:
word_dict = {}
for i in range(0,len(df['more_clean'])):
    sentence = df['more_clean'][i]
    word_token = word_tokenize(sentence)
    for j in word_token:
        if j not in word_dict:
            word_dict[j] = 1
        else:
            word_dict[j] += 1
And this error message appears:
TypeError: expected string or bytes-like object
Solution
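The error means word_tokenize received a value that is not a string (for example, a missing cell stored as NaN). You need to make sure the sentence variable is of type str: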
word_token = word_tokenize(str(sentence))
See the nltk.tokenize.word_tokenize documentation:
nltk.tokenize.word_tokenize(text, language='english', preserve_line=False)
Parameters
text (str) – text to split into words
language (str) – the model name in the Punkt corpus
preserve_line (bool) – A flag to decide whether to sentence tokenize the text or not.
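One thing to watch with the str() conversion: str(float('nan')) is the string 'nan', so a missing cell would be counted as a word. The following is a minimal sketch of an alternative that skips non-string values instead. It assumes the non-string entries are missing values that should be ignored. The count_words helper, its tokenize parameter, and the sample list are hypothetical names made up for illustration, and str.split stands in for word_tokenize so the sketch runs without NLTK.

```python
from collections import Counter


def count_words(values, tokenize=str.split):
    """Count tokens across an iterable of cell values, skipping non-strings."""
    counts = Counter()
    for value in values:
        # Missing cells typically show up as float NaN or None.
        # A tokenizer that expects text raises TypeError on them, so skip them.
        if not isinstance(value, str):
            continue
        counts.update(tokenize(value))
    return counts


# Stand-in for df['more_clean']; the NaN and None entries are invented for illustration.
sample = ["good movie", float("nan"), "good plot", None]
word_dict = dict(count_words(sample))
print(word_dict)  # {'good': 2, 'movie': 1, 'plot': 1}
```

With NLTK and its tokenizer data installed, the same helper could presumably be called as count_words(df['more_clean'], tokenize=word_tokenize). Whether to skip non-string cells or convert them with str() depends on what those cells actually contain, so it is worth inspecting them first.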
Answered By - Wiktor Stribiżew