Issue
I'm using https://github.com/mrabarnett/mrab-regex (via pip install regex), but experiencing a failure here:
pattern_string = r'''
(?&N)
^ \W*? ENTRY \W* (?P<entries> (?&Range) ) (?&N)
(?(DEFINE)
(?P<Decimal>
[ ]*? \d+ (?:[.,] \d+)? [ ]*?
)
(?P<Range>
(?&Decimal) - (?&Decimal) | (?&Decimal)
#(?&d) (?: - (?&d))?
)
(?P<N>
[\s\S]*?
)
)
'''
flags = regex.MULTILINE | regex.VERBOSE #| regex.DOTALL | regex.V1 #| regex.IGNORECASE | regex.UNICODE
pattern = regex.compile(pattern_string, flags=flags)
bk2 = f'''
ENTRY: 0.0975 - 0.101
'''.strip()
match = pattern.match('ENTRY: 0.0975 - 0.101')
match.groupdict()
gives:
{'entries': '0.0975', 'Decimal': None, 'Range': None, 'N': None}
It misses the second value.
> pip show regex
Name: regex
Version: 2022.1.18
Summary: Alternative regular expression module, to replace re.
Home-page: https://github.com/mrabarnett/mrab-regex
Author: Matthew Barnett
Author-email: [email protected]
License: Apache Software License
Location: ...
Requires:
Required-by:
> python --version
Python 3.10.0
Solution
The problem is that the spaces you defined in the Decimal group pattern are consumed, and the DEFINE patterns are atomic, so although the last [ ]*? part is lazy and can match zero times, once it has matched, there is no going back. You can check this by putting the Decimal pattern into an atomic group and comparing two patterns:
(?mx)^\W*?ENTRY\W*(?P<entries>(?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?) - (?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?) | (?>[ ]*? \d+ (?:[.,] \d+)? [ ]*?))
exhibits the same behavior as your regex with the DEFINE block, while
(?mx)^\W*?ENTRY\W*(?P<entries>[ ]*? \d+ (?:[.,] \d+)? [ ]*? - [ ]*? \d+ (?:[.,] \d+)? [ ]*? | [ ]*? \d+ (?:[.,] \d+)? [ ]*?)
finds the match correctly.
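The comparison above can be reproduced with a minimal sketch. To keep it runnable with only the standard library, it uses re instead of regex and emulates the atomic group with a lookahead plus a backreference; that emulation, and the names DECIMAL, atomic, backtracking, locked and entries, are assumptions made for illustration and are not part of the original patterns.

```python
import re

# The Decimal sub-pattern from the question, including the lazy optional spaces.
DECIMAL = r"[ ]*? \d+ (?:[.,] \d+)? [ ]*?"

def atomic(name, body):
    # Emulates (?>body): a lookahead is not re-entered once it has succeeded,
    # and the backreference then consumes exactly the text it captured.
    return rf"(?=(?P<{name}>{body}))(?P={name})"

# Plain version: the trailing [ ]*? may still expand when the "-" fails to match.
backtracking = rf"^\W*? ENTRY \W* (?P<entries> {DECIMAL} - {DECIMAL} | {DECIMAL} )"

# Atomic version: each Decimal is locked in as soon as it has matched.
locked = (
    r"^\W*? ENTRY \W* (?P<entries> "
    rf"{atomic('a', DECIMAL)} - {atomic('b', DECIMAL)} | {atomic('c', DECIMAL)} )"
)

def entries(pattern, text):
    # Returns the text captured by the "entries" group, or None if nothing matches.
    m = re.search(pattern, text, re.MULTILINE | re.VERBOSE)
    return m.group("entries") if m else None

text = "ENTRY: 0.0975 - 0.101"
print(entries(backtracking, text))  # 0.0975 - 0.101
print(entries(locked, text))        # 0.0975
```

In the locked variant, the first Decimal stops right after 0.0975, the following "-" cannot match the space, and the engine falls through to the single-Decimal alternative.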
The easiest fix is to move the optional space patterns into the Range
group pattern.
There are other minor enhancements you might want to introduce here:
- As you are only interested in the captured substring, you do not need to use regex.match with the N group pattern ([\s\S]*?); you may use regex.search and remove the N pattern from the regex.
- You do not need to use an alternation for a|a-b like patterns; you can use a more efficient optional non-capturing group approach, a(?:-b)?.
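Both suggestions can be sketched with the standard library alone. Since re has no DEFINE blocks, the sub-patterns are composed as plain strings here; that substitution, and the names DECIMAL, RANGE, ENTRY and find_entries, are assumptions made for illustration only.

```python
import re

# A decimal without surrounding spaces; the spaces belong to the range instead.
DECIMAL = r"\d+(?:[.,]\d+)?"

# a(?:-b)? form: one decimal, optionally followed by "-" and a second decimal.
RANGE = rf"{DECIMAL}(?:[ ]*-[ ]*{DECIMAL})?"

# search() scans for the line itself, so no leading or trailing filler pattern is needed.
ENTRY = re.compile(rf"^\W*ENTRY\W*(?P<entries>{RANGE})", re.MULTILINE)

def find_entries(text):
    # Returns the captured value or range, or None when no ENTRY line is found.
    m = ENTRY.search(text)
    return m.group("entries") if m else None

print(find_entries("ENTRY: 0.0975 - 0.101"))  # 0.0975 - 0.101
print(find_entries("ENTRY: 42"))              # 42
```

Because the spaces around the hyphen sit inside the optional group, a single value is captured without any trailing whitespace.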
So, the regex can look like
^ \W* ENTRY \W* (?P<entries> (?&Range) )
(?(DEFINE)
(?P<Decimal>
\d+ (?:[.,] \d+)?
)
(?P<Range>
(?&Decimal)(?:\ *-\ *(?&Decimal))?
)
)
See the Python demo:
import regex
pattern_string = r'''
^ \W* ENTRY \W* (?P<entries> (?&Range) )
(?(DEFINE)
(?P<Decimal>
\d+ (?:[.,] \d+)?
)
(?P<Range>
(?&Decimal)(?:\ *-\ *(?&Decimal))?
)
)
'''
flags = regex.MULTILINE | regex.VERBOSE
pattern = regex.compile(pattern_string, flags=flags)
bk2 = '''
ENTRY: 0.0975 - 0.101
'''.strip()
match = pattern.search(bk2)
print(match.groupdict())
Output:
{'entries': '0.0975 - 0.101', 'Decimal': None, 'Range': None}
Answered By - Wiktor Stribiżew