Issue
I need to match log lines like the ones below.
Case 1:
[01:32:12.036,000] <tag> label: val3. STATUS = 0x1
[01:32:12.036,001] <tag> label: val3. MISC = 0x8
[02:58:34.971,000] <tag> label: val2. STATUS = 0x2
Case 2:
[01:32:12.036,000] <tag> label: val3. STATUS = 0x1
[02:58:34.971,000] <tag> label: val2. STATUS = 0x2
[01:32:12.036,001] <tag> label: val2. MISC = 0x6
The line containing the MISC value is optional and may be missing. The line containing STATUS is always present and always precedes the MISC line.
To match this I am using regex like this: "label: val(\d+). STATUS = (0x[0-9a-fA-F]+)(.*?(label: val(\d+). MISC = (0x[0-9a-fA-F]+)))?"
This works for Case 1 and reports the values correctly. The output for the matched groups is as below:
MATCH 1
[0] 3
[1] 0x1
[2]
[01:32:12.036,001] <tag> label: val3. MISC = 0x8
[3] label: val3. MISC = 0x8
[4] 3
[5] 0x8
MATCH 2
[0] 2
[1] 0x2
[2]
[3]
[4]
[5]
But for Case 2, it skips the second STATUS (on line 2), as shown below:
Match 1
[0] 3
[1] 0x1
[2]
[02:58:34.971,000] <tag> label: val2. STATUS = 0x2
[01:32:12.036,001] <tag> label: val2. MISC = 0x6
[3] label: val2. MISC = 0x6
[4] 2
[5] 0x6
I need 2 matches here as well, with the first match not reporting MISC.
What am I doing wrong here?
Solution
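If the dot can match a newline (as with re.DOTALL), the lazy .*? in your pattern keeps expanding until the MISC part matches, so it runs across the second STATUS line and consumes it.
Without the re.DOTALL flag (dot matches a newline) or the re.MULTILINE flag, you can match STATUS and then optionally match MISC on the next line, by starting an optional non-capturing group with a literal newline.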
Note that the dot should be escaped to match it literally. If the label is at the end of the line and occurs only once, you don't need a non-greedy quantifier on the dot.
label: val(\d+)\. STATUS = (0x[0-9a-fA-F]+)(?:\n.*(label: val(\d+)\. MISC = (0x[0-9a-fA-F]+)))?
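For comparison, here is a small sketch that runs the pattern from the question against the Case 2 input, once with re.DOTALL and once without. It assumes re.DOTALL was in effect in the question (the reported group containing newlines suggests so); the variable names are only for illustration.

```python
import re

# The pattern from the question, unchanged (unescaped dots, lazy .*?).
question_pattern = (
    r"label: val(\d+). STATUS = (0x[0-9a-fA-F]+)"
    r"(.*?(label: val(\d+). MISC = (0x[0-9a-fA-F]+)))?"
)

case2 = (
    "[01:32:12.036,000] <tag> label: val3. STATUS = 0x1\n"
    "[02:58:34.971,000] <tag> label: val2. STATUS = 0x2\n"
    "[01:32:12.036,001] <tag> label: val2. MISC = 0x6"
)

# Assumption: the question ran with re.DOTALL, so .*? may cross newlines.
with_dotall = re.findall(question_pattern, case2, re.DOTALL)

# Without the flag, .*? stops at the end of the line and never reaches MISC.
without_dotall = re.findall(question_pattern, case2)

print(len(with_dotall))     # number of matches when the dot matches newlines
print(len(without_dotall))  # number of matches when it does not
```

With the flag there is a single match that swallows the second STATUS line; without it there are two matches, but neither captures MISC. That is why the newline has to be matched explicitly.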
Example
import re
pattern = r"label: val(\d+)\. STATUS = (0x[0-9a-fA-F]+)(?:\n.*(label: val(\d+)\. MISC = (0x[0-9a-fA-F]+)))?"
s = ("[01:32:12.036,000] <tag> label: val3. STATUS = 0x1\n"
"[02:58:34.971,000] <tag> label: val2. STATUS = 0x2\n"
"[01:32:12.036,001] <tag> label: val2. MISC = 0x6")
matches = re.findall(pattern, s)
print(matches)
Output
[
('3', '0x1', '', '', ''),
('2', '0x2', 'label: val2. MISC = 0x6', '2', '0x6')
]
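If you need the values rather than raw tuples, something along these lines might help. It is only a sketch: parse_records is a hypothetical helper name, and it relies on re.finditer returning None for groups in an optional part that did not take part in the match.

```python
import re

# Pattern from the answer above, unchanged.
pattern = re.compile(
    r"label: val(\d+)\. STATUS = (0x[0-9a-fA-F]+)"
    r"(?:\n.*(label: val(\d+)\. MISC = (0x[0-9a-fA-F]+)))?"
)

def parse_records(text):
    """Hypothetical helper: one dict per STATUS line, misc is None if absent."""
    records = []
    for m in pattern.finditer(text):
        misc = m.group(5)  # None when the optional MISC group did not match
        records.append({
            "val": int(m.group(1)),
            "status": int(m.group(2), 16),
            "misc": int(misc, 16) if misc is not None else None,
        })
    return records

case1 = (
    "[01:32:12.036,000] <tag> label: val3. STATUS = 0x1\n"
    "[01:32:12.036,001] <tag> label: val3. MISC = 0x8\n"
    "[02:58:34.971,000] <tag> label: val2. STATUS = 0x2"
)
case2 = (
    "[01:32:12.036,000] <tag> label: val3. STATUS = 0x1\n"
    "[02:58:34.971,000] <tag> label: val2. STATUS = 0x2\n"
    "[01:32:12.036,001] <tag> label: val2. MISC = 0x6"
)

print(parse_records(case1))
print(parse_records(case2))
```

Both cases give two records. Note that the pattern only pairs a STATUS with a MISC on the line directly after it and does not check that the two val numbers agree; if that matters for your logs, compare group 1 and group 4 yourself.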
Answered By - The fourth bird