Issue
I am trying to write a single one-line regex that handles the following cases.
Examples:
Table 1-2: this is a sample text 2 and some hyphen - (abbreviation)
Table 1: this is a sample text 2 and some hyphen - (abbreviation)
Table 1 this is a sample text 2 and some hyphen - (abbreviation)
Table 1-2-1: this is a sample text 2 and some hyphen - (abbreviation)
Similarly:
Figure 1-2: this is a sample text 2 and some hyphen - (abbreviation)
Figure 1: this is a sample text 2 and some hyphen - (abbreviation)
Figure 1 this is a sample text 2 and some hyphen - (abbreviation)
Figure 1-2-1: this is a sample text 2 and some hyphen - (abbreviation)
I tried the following approach:
import re
re.sub(r'^Table ()|([0-9]+[-][0-9]+|[0-9]+|[0-9 ]+)', " ", text_to_search)
re.sub(r'^Figure ()|([0-9]+[-][0-9]+|[0-9]+|[0-9 ]+)', " ", text_to_search)
This is not a good approach, and I would also like to remove the dependency on the words "Table" and "Figure". Any suggestions are welcome. Thanks in advance for your time.
Expected Output:
['Table', '1-2:', 'this is a sample text 2 and some hyphen - (abbreviation)']
['Table', '1:', 'this is a sample text 2 and some hyphen - (abbreviation)']
['Table', '1', 'this is a sample text 2 and some hyphen - (abbreviation)']
['Table', '1-2-1:', 'this is a sample text 2 and some hyphen - (abbreviation)']
['Figure', '1-2:', 'this is a sample text 2 and some hyphen - (abbreviation)']
['Figure', '1:', 'this is a sample text 2 and some hyphen - (abbreviation)']
['Figure', '1', 'this is a sample text 2 and some hyphen - (abbreviation)']
['Figure', '1-2-1:', 'this is a sample text 2 and some hyphen - (abbreviation)']
I am looking for the value at index 2 of each list, i.e. the text that follows the number.
Solution
This pattern matches every line listed in your "Expected Output". Note that it requires each line to end with a closing parenthesis:
import re
# re.MULTILINE lets ^ and $ anchor at every line of a multi-line string
pattern = re.compile(r'^(\w+)\s([-0-9]+:?)\s(.*\))$', re.MULTILINE)
matches = pattern.findall(text_to_search)
print(matches)
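To check the pattern against a few of the sample lines, here is a minimal sketch. It assumes the captions are held in one multi-line string (the name `sample` is made up for illustration), which is why `re.MULTILINE` is passed so that `^` and `$` anchor at each line.

```python
import re

# Hypothetical sample input: three of the captions from the question, one per line.
sample = """Table 1-2: this is a sample text 2 and some hyphen - (abbreviation)
Table 1 this is a sample text 2 and some hyphen - (abbreviation)
Figure 1-2-1: this is a sample text 2 and some hyphen - (abbreviation)"""

pattern = re.compile(r'^(\w+)\s([-0-9]+:?)\s(.*\))$', re.MULTILINE)

# findall returns one tuple of captured groups per matching line.
matches = pattern.findall(sample)
for label, number, description in matches:
    print([label, number, description])
```

Each tuple holds the label, the number (with its colon, if present), and the description, so the description is at index 2.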
However, if you only want to match lines where the number is a plain integer with no hyphens or colon, such as ['Table', '1', 'this is a sample text 2 and some hyphen - (abbreviation)']
or ['Figure', '1', 'this is a sample text 2 and some hyphen - (abbreviation)']
(I'm guessing this is what "I am looking for the value available at list[2]" means),
then this pattern should work:
pattern = re.compile(r'^(\w+)\s(\d+)\s(.*\))$')
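Since the question asks only for the value at index 2 and wants to avoid depending on the words "Table" and "Figure", one possible alternative is to capture just the description. This is a sketch rather than the pattern given above: the helper name `caption_text` is made up, and it assumes each caption is a single word, then a hyphen-separated number with an optional colon, then the text.

```python
import re

# Assumed caption shape: <word> <number[-number...]>[:] <description>
CAPTION = re.compile(r'^\w+\s+\d+(?:-\d+)*:?\s+(.*)$')

def caption_text(line):
    """Return the description part of a caption line, or None if it does not match."""
    match = CAPTION.match(line.strip())
    return match.group(1) if match else None

print(caption_text("Table 1-2-1: this is a sample text 2 and some hyphen - (abbreviation)"))
print(caption_text("Figure 1 this is a sample text 2 and some hyphen - (abbreviation)"))
```

Unlike the patterns above, this version does not require the line to end with a closing parenthesis, and it returns None for lines that do not look like a caption.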
Answered By - JRiggles