Issue
I'm struggling to predict how many fuzzy matches findall()
will return when using regex
in Python with BESTMATCH
enabled:
>>> regex.findall(r'(?b)(North\ West){i<=0,s<=2,d<=1}', "South west South West North West", regex.V1)
['North West']
Does not match South West
at all
>>> regex.findall(r'(?b)(North\ West){i<=0,s<=2,d<=1}', "South west South West North West North west South West", regex.V1)
['North West', 'North west', 'South West']
Matches South West
I'm not clear if this is a bug or as intended?
Solution
I think I have a partial explanation:
The behaviour appears to be:
- Return all perfect matches (with no substitutions, insertions or deletions)
- Return all imperfect matches AFTER the last perfect match
It appears (through testing), that BESTMATCH
behaves like a normal search, except if it finds a perfect match, it throws away any preceding imperfect matches. Hence the behaviour that it returns a series of perfect matches, followed by zero or more imperfect matches.
Some examples:
>>> regex.findall(r'(?b)(abc){i<=0,s<=2,d<=1}', "abc abd abd aaa abc", regex.V1)
['abc', 'abc']
>>> regex.findall(r'(?b)(abc){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb", regex.V1)
['abc', 'abc', 'abb']
>>> regex.findall(r'(?b)(ab[cd]){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb", regex.V1)
['abc', 'abd', 'abd', 'abc', 'abb']
>>> regex.findall(r'(?b)(ab[cd]){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb abc", regex.V1)
['abc', 'abd', 'abd', 'abc', 'abc']
Answered By - Stefan
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.