Issue
I have a list containing roughly 10,000 strings, and I want to apply a regex pattern to every entry in it. When I use re.compile it takes a lot of time to apply even a single regex pattern. Is there any way to make this faster in Python?
Here is my code:
import re
list_of_strings = ["I like to eat meat", "I don't like to eat meat", "I like to eat fish", "I don't like to eat fish"]
outcome = [x for x in list_of_strings if len(re.compile(r"I like to eat (.*?)").findall(x)) != 0]
Out[6]: ['I like to eat meat', 'I like to eat fish']
Here I use just 4 strings to demonstrate the case; in reality the code has to handle 10,000 strings.
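For reference, the same filter with the pattern compiled once outside the comprehension is sketched below, using the names from the snippet above and search instead of findall, since only the presence of a match matters here (the re module caches compiled patterns, but hoisting the compile call still removes the per-item overhead):
import re

# compile the pattern once and reuse it for every string
pattern = re.compile(r"I like to eat (.*?)")
outcome = [x for x in list_of_strings if pattern.search(x) is not None]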
I could also use multiprocessing to solve this, but maybe there is another solution using PyTorch, PySpark, or other frameworks.
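A minimal multiprocessing sketch along those lines follows; the helper name contains_pattern is made up for illustration, and whether this pays off depends on how large the articles are, since every string has to be pickled over to the worker processes:
import re
from multiprocessing import Pool

PATTERN = re.compile(r"I like to eat (.*?)")

def contains_pattern(text):
    # True if the pattern occurs anywhere in the article
    return PATTERN.search(text) is not None

if __name__ == "__main__":
    list_of_strings = ["I like to eat meat", "I don't like to eat meat",
                       "I like to eat fish", "I don't like to eat fish"]
    with Pool() as pool:
        keep = pool.map(contains_pattern, list_of_strings)
    outcome = [s for s, k in zip(list_of_strings, keep) if k]
    print(outcome)  # ['I like to eat meat', 'I like to eat fish']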
[Edit] Thanks for all the answers. I should have mentioned that every string is an article, so the regex has to handle more than a single sentence.
I also want to say that the regex itself is not the problem here, so that is not up for discussion.
Solution
You may also consider simply looping over the list and using a plain substring check instead of a regex; for a fixed literal like this, that avoids the regex engine entirely.
new_list = []
for item in list_of_strings:
    # a plain substring check is enough for a literal pattern
    if 'I like to eat' in item:
        new_list.append(item)
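The same filter written as a list comprehension, as a sketch using the names from the question:
new_list = [item for item in list_of_strings if 'I like to eat' in item]
A substring check like this is typically much faster than a regex for a fixed literal, which matters when filtering thousands of articles.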
Answered By - user99999