Issue
I am trying to create a subset of my data according to certain terms in the text column of my DataFrame.
import pandas as pd

df = pd.DataFrame({'id': [123, 456, 789, 101, 402],
                   'text': [[{'the meeting was amazing'}, {'we should do it more often'}],
                            [{'start': '15', 'tag': 'Meeting'}],
                            [],
                            [{'Let this be the end of it'}],
                            [{'end': '164', 'tag': 'meetingno2'}]
                            ]
                   })
I want to get a subset with rows 1, 2, and 5 where the term 'meeting' appears in some form.
I have tried the following code:
df_sub = df[df['text'].isin(df['text'].str.findall(r'[Mm]eeting+'))]
But the resulting subset I get with this code only contains the rows where the text column is empty. However, when I try doing
df['text_2'] = df['text'].str.findall(r'[Mm]eeting+')
--it produces a new column in the df with the value 'meeting' for rows 1, 2, and 5. Therefore, I think it is picking up the text but not splitting it correctly. How can I get the desired output?
Solution
The "in some form" is ambiguous, but one quick hack is to convert each value to a string and test whether it contains the term. This will match anything (dictionary keys, values, set values, etc.) as long as it is present in the string representation of the Python object:
df[df['text'].astype(str).str.contains('meeting', case=False)]
output:
    id                                               text
0  123  [{the meeting was amazing}, {we should do it m...
1  456                [{'start': '15', 'tag': 'Meeting'}]
4  402              [{'end': '164', 'tag': 'meetingno2'}]
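If matching on the string representation is too loose (for example, it would also match a dictionary key named 'meeting'), a stricter variant is to walk the nested objects and test only the actual strings. The following is a minimal sketch, not part of the original answer: contains_term is a hypothetical helper, and it assumes that only dictionary values (not keys) should be searched.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 456, 789, 101, 402],
    'text': [
        [{'the meeting was amazing'}, {'we should do it more often'}],
        [{'start': '15', 'tag': 'Meeting'}],
        [],
        [{'Let this be the end of it'}],
        [{'end': '164', 'tag': 'meetingno2'}],
    ],
})


def contains_term(obj, term='meeting'):
    """Hypothetical helper: True if any string nested in obj contains term."""
    if isinstance(obj, str):
        # Case-insensitive substring test on the string itself
        return term.lower() in obj.lower()
    if isinstance(obj, dict):
        # Assumption: search dictionary values only, ignore keys
        return any(contains_term(v, term) for v in obj.values())
    if isinstance(obj, (list, tuple, set)):
        # Recurse into each element of the container
        return any(contains_term(v, term) for v in obj)
    # Anything else (numbers, None, ...) is treated as no match
    return False


# Build a boolean mask row by row, then use it to select the subset
mask = df['text'].apply(contains_term)
df_sub = df[mask]
```

On this sample data the mask selects the same rows as the astype(str) approach; the two would differ only when the term shows up somewhere other than a string value, such as in a dictionary key.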
Answered By - mozway