Issue
I am trying to find all occurrences of a substring using a regular expression. The substring has three parts: it starts with one or more 'A', followed by one or more 'N', and ends with one or more 'A'. Given the string 'AAANAANABNA', parsing it should yield the two substrings 'AAANAA' and 'AANA'. I have tried the code below.
import regex as re
reg_a='A+N+A+'
s='AAANAANABNA'
sub_str=re.findall(reg_a,s,overlapped=True)
print(sub_str)
And, I am getting the below output,
['AAANAA', 'AANAA', 'ANAA', 'AANA', 'ANA']
But, I want the output as,
['AAANAA', 'AANA']
That is, the trailing A's of one match should also serve as the leading A's of the next match. How can I get that?
Solution
Make sure there is no A on the left:
>>> reg_a='(?<!A)A+N+A+'
>>> print( re.findall(reg_a,s,overlapped=True) )
['AAANAA', 'AANA']
The (?<!A)A+N+A+ pattern matches:
(?<!A) - a negative lookbehind that matches a location that is not immediately preceded by A
A+ - one or more As
N+ - one or more Ns
A+ - one or more As
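To see what the lookbehind contributes on its own, here is a minimal sketch that lists the positions where a run of A's begins in the sample string. It uses only the standard re module and the bare (?<!A)A fragment; the variable name run_starts is just for illustration.

```python
import re

s = 'AAANAANABNA'

# An 'A' that is not immediately preceded by another 'A'
# marks the first character of a run of A's.
run_starts = [m.start() for m in re.finditer(r'(?<!A)A', s)]
print(run_starts)  # [0, 4, 7, 10]
```

A match can only begin at one of these positions, so the shorter overlapping results that start in the middle of a run (such as 'AANAA' or 'ANA') are ruled out. The runs starting at 7 and 10 yield nothing because no N followed by an A comes right after them.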
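Note you may use the built-in re module to get the matches, too. It has no overlapped option, so wrap the pattern in a capturing group inside a positive lookahead: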
>>> import re
>>> re_a = r'(?=(?<!A)(A+N+A+))'
>>> print( re.findall(re_a, s) )
['AAANAA', 'AANA']
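If the match positions are needed as well, the same lookahead pattern can be used with re.finditer. The sketch below is one possible way to wrap it; the helper name find_ana_runs is hypothetical and not part of the original answer.

```python
import re

def find_ana_runs(text):
    """Return (start, substring) pairs for each A+N+A+ run that begins
    at the start of a run of A's. Consecutive results may share A's."""
    # The lookahead consumes no characters, so the A's that end one
    # match can still begin the next; group 1 holds the actual text.
    pattern = re.compile(r'(?=(?<!A)(A+N+A+))')
    return [(m.start(1), m.group(1)) for m in pattern.finditer(text)]

matches = find_ana_runs('AAANAANABNA')
print(matches)  # [(0, 'AAANAA'), (4, 'AANA')]
```

The first match spans indices 0 to 5 and the second starts at index 4, so the two A's at indices 4 and 5 belong to both results.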
Answered By - Wiktor Stribiżew