Issue
I want to extract and locate the text inside every pair of brackets, parentheses, or braces in a sentence, but I am having trouble with nested (overlapping) pairs. For example:
[in]: sentence = '{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'
[in]: pattern = r"\[([^\[\]()]+?)\]|\(([^\[\]()]+?)\)|\{([^\[\]()]+?)\}"
[in]: [(m.start(0), m.end(0), sentence[m.start(0) : m.end(0)]) for m in re.finditer(pattern, sentence)]
[out]: [(0, 4, '{ia}'), (5, 27, '({fascia} antebrachii)')]
It should identify three instances with the correct indices, but the inner '{fascia}' is missing. Any advice, please?
Solution
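Try the third-party regex module (pip install regex). Its finditer accepts overlapped=True, which also returns matches that overlap an earlier match:
import regex as re
sentence = '{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'
# Raw string, so backslashes such as \[ reach the regex engine unchanged
pattern = r'{[^{}]+}|\[[^\[\]]+\]|\([^\(\)]+\)'
# overlapped=True resumes the search one character after the start of the previous match
[(m.start(0), m.end(0), sentence[m.start(0) : m.end(0)]) for m in re.finditer(pattern, sentence, overlapped=True)]
# [(0, 4, '{ia}'), (5, 27, '({fascia} antebrachii)'), (6, 14, '{fascia}')]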
The pattern is also simpler than the one in the question. Each alternative matches:
- one or more non-brace characters enclosed in braces:
{[^{}]+}
- one or more non-bracket characters enclosed in brackets:
\[[^\[\]]+\]
- one or more non-parenthesis characters enclosed in parentheses:
\([^\(\)]+\)
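If installing a third-party package is not an option, a similar result can be had with the standard-library re module by wrapping the same three alternatives in a zero-width lookahead. This is a different technique from the overlapped=True flag above, not part of the original answer. The helper name find_bracketed is hypothetical, and the braces are escaped here purely as a precaution.

```python
import re

# The whole alternation sits inside a lookahead (?=...), so each match is
# zero-width and the scan moves forward one character at a time. The actual
# text is captured in group 1, which lets nested spans be reported too.
BRACKETED = re.compile(r'(?=(\{[^{}]+\}|\[[^\[\]]+\]|\([^()]+\)))')

def find_bracketed(text):
    """Return (start, end, substring) for every bracketed span in text."""
    return [(m.start(1), m.end(1), m.group(1)) for m in BRACKETED.finditer(text)]

sentence = '{ia} ({fascia} antebrachii). Genom att aponeurosen fäster i armb'
spans = find_bracketed(sentence)
print(spans)
# [(0, 4, '{ia}'), (5, 27, '({fascia} antebrachii)'), (6, 14, '{fascia}')]
```

As with the pattern above, a span is only reported if it contains no delimiter of its own kind, so the outer pair in '(a (b) c)' would not be matched.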
Answered By - lemon