Issue
I have a dataframe with ID and TEXT fields. I want to create another dataframe by splitting the sentences in the TEXT field on the dot while keeping the original ID.
So the phrase "I love cats. I hate snakes" becomes two sentences in 2 rows of the new dataframe, each carrying the original ID:
0 `I love cats`
0 `I hate snakes`
Original Dataframe:
ID TEXT
1 This is a msg. Another msg
2 The weather is hot, the water is cold. My hands are freezing
Transformed Dataframe:
ID TEXT
1 This is a msg
1 Another msg
2 The weather is hot, the water is cold
2 My hands are freezing
The code to build the dataframe:
import pandas as pd
df = pd.DataFrame({'ID':[1,2], 'TEXT':['This is a msg. Another msg', 'The weather is hot, the water is cold. My hands are freezing']})
I am trying to use split:
df['TEXT'].astype(str).split('.')
but I keep getting an error because a Series object has no split method (AttributeError: 'Series' object has no attribute 'split').
Solution
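A Series has no split method of its own; string methods are reached through the .str accessor, and explode() then turns each list element into its own row. You also need to set ID as the index beforehand so that the exploded rows keep their respective IDs:
# make ID the index so each exploded row keeps its ID
df.set_index('ID', inplace=True)
# split each TEXT on '.', then put each piece on its own row
split = df['TEXT'].str.split('.').explode()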
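Splitting on '.' alone leaves a leading space on every sentence after the first, and a trailing dot would produce an empty piece. A minimal end-to-end sketch follows, assuming pandas 0.25 or later (where explode is available). The names sentences and result, and the strip/filter/reset_index steps, are additions for illustration and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'TEXT': ['This is a msg. Another msg',
             'The weather is hot, the water is cold. My hands are freezing'],
})

# Index by ID so every exploded row keeps the ID of its source text.
sentences = (
    df.set_index('ID')['TEXT']
      .str.split('.')   # each TEXT becomes a list of pieces
      .explode()        # one row per piece, index (ID) repeated
      .str.strip()      # drop the space left after each dot
)

# Drop empty pieces, which appear when a text ends with a dot.
sentences = sentences[sentences != '']

# Turn the ID index back into a regular column.
result = sentences.reset_index()
print(result)
```

Unlike the inplace version above, this chain leaves df itself unchanged, so it can be rerun without rebuilding the dataframe.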
Answered By - Nuri Taş