Issue
I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'column1': ['Severe weather Not Severe weather kind of severe weather']})
I tokenized this dataframe:
from nltk.tokenize import word_tokenize
df['column1'] = df['column1'].apply(lambda x: word_tokenize(x))
The output is enclosed inside brackets:
column1
0 [Severe, weather, Not, Severe, weather, kind, of, severe, weather]
I want to have the output without brackets:
column1
0 Severe, weather, Not, Severe, weather, kind, of, severe, weather
What I have tried:
def delete_brackets(x):
    for i in x:
        if i == '[' or i == ']':
            x.remove(i)
    return x
df = delete_brackets(df)
and
def remove_brackets(x):
    return x.replace('[', '').replace(']', '')
df = remove_brackets(df)
I am still getting the output inside brackets.
Any ideas? Thanks
Solution
You can use
df['column1'] = df['column1'].apply(lambda x: ", ".join(map(str, word_tokenize(x))))
Output:
>>> print(df.to_string())
column1
0 Severe, weather, Not, Severe, weather, kind, of, severe, weather
The word_tokenize() function returns a list of tokens. map(str, word_tokenize(x)) makes sure every token is a str (the tokens are already strings, so this is only a safeguard), and ", ".join(...) then joins them with a comma and a space into a single string.
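To see why the two attempts in the question have no effect, it may help to look at a single cell outside of pandas. Iterating over a DataFrame yields its column labels, and DataFrame.replace matches whole cell values, so neither attempt ever touches the tokens. The sketch below is only an illustration: the tokens list is typed by hand as a stand-in for the output of word_tokenize(), so NLTK is not needed to run it.

```python
# Hand-written stand-in for the list that word_tokenize() would return
# for the sentence in the question (an assumption made for illustration).
tokens = ['Severe', 'weather', 'Not', 'Severe', 'weather',
          'kind', 'of', 'severe', 'weather']

# The brackets only show up when the list is converted to text for display.
as_displayed = str(tokens)
print(as_displayed)

# No element of the list is '[' or ']', so there is nothing to remove.
has_bracket_token = '[' in tokens or ']' in tokens
print(has_bracket_token)

# Joining the tokens builds a plain string, which has no brackets.
joined = ", ".join(tokens)
print(joined)
```

If the column already holds lists, as it does after the tokenizing step in the question, the same join can be applied to each row, for example with df['column1'].apply(", ".join).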
Answered By - Wiktor Stribiżew