Issue
I'm reading a text file and I need to split each line on a specific word.
I want to split on NUMBER(x, x).
Here's my code:
import re
str1 = "PERSON_LEAVE_BAL_NUMBER NUMBER(5, 2),"
str2 = "CURRENT_BILL NUMBER(*, 2),"
# desired output
#['PERSON_LEAVE_BAL_NUMBER', 'NUMBER(5, 2)']
#['CURRENT_BILL', 'NUMBER(*, 2)']
if re.search(r'\bNUMBER\b', str1):
    str1_items = re.split("NUMBER[^a-zA-Z ]+", str1)
    print(str1_items)
if re.search(r'\bNUMBER\b', str2):
    str2_items = re.split("NUMBER[^a-zA-Z ]+", str2)
    print(str2_items)
I am searching on the word boundary and then trying to split on the NUMBER part, but I am not able to get it to parse properly. Any suggestions?
Solution
Splitting on a string returns the parts of the text on either side of that string; the string itself is discarded.
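To see this with the pattern from the question, here is a small sketch that runs it on str1. The pattern matches NUMBER(5, and that matched text is thrown away, so only the text around it survives:

```python
import re

str1 = "PERSON_LEAVE_BAL_NUMBER NUMBER(5, 2),"

# The pattern consumes "NUMBER(5," (NUMBER plus every following character
# that is neither a letter nor a space), so that text is dropped from the result.
parts = re.split("NUMBER[^a-zA-Z ]+", str1)
print(parts)  # ['PERSON_LEAVE_BAL_NUMBER ', ' 2),']
```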
You can put back the string:
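# rsplit with maxsplit=1 splits only on the last NUMBER, not the one inside PERSON_LEAVE_BAL_NUMBER
split1 = "PERSON_LEAVE_BAL_NUMBER NUMBER(5, 2),".rsplit("NUMBER", 1)
print([split1[0], *map(lambda x: "NUMBER" + x, split1[1:])])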
or split on a zero-width lookahead assertion, which requires the space you split on to be followed by the token you need:
split2 = re.split(r" (?=NUMBER[^a-zA-Z ])", "CURRENT_BILL NUMBER(*, 2),")
print(split2)
Notice that the first example keeps the space before NUMBER in the first extracted string, whereas the second example actually splits on, and discards, the space. It's not entirely clear which of the two you prefer, though I would expect the latter.
With re.split, you can also create a list element by using a capturing group; though then, you also end up with an empty element at the end of the result.
split3 = re.split(" (NUMBER.*)", "PERSON_LEAVE_BAL_NUMBER NUMBER(5, 2),")
print(split3[:-1])
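All of the variants above leave the trailing comma attached to the NUMBER(...) part, which differs slightly from the desired output in the question. One option, if that matters, is to match the whole line instead of splitting it. This is only a sketch: it assumes every line has the shape name, whitespace, NUMBER(...), optional trailing comma, and the names COLUMN_RE and split_column are made up for illustration.

```python
import re

# Assumed line shape: <name> <NUMBER(...)> with an optional trailing comma.
COLUMN_RE = re.compile(r"(\w+)\s+(NUMBER\([^)]*\)),?")

def split_column(line):
    """Return [name, type] for a NUMBER column line, or None if it does not match."""
    match = COLUMN_RE.fullmatch(line.strip())
    return list(match.groups()) if match else None

print(split_column("PERSON_LEAVE_BAL_NUMBER NUMBER(5, 2),"))
# ['PERSON_LEAVE_BAL_NUMBER', 'NUMBER(5, 2)']
print(split_column("CURRENT_BILL NUMBER(*, 2),"))
# ['CURRENT_BILL', 'NUMBER(*, 2)']
```

Lines that do not fit the assumed shape return None rather than a partial split, which makes unexpected input easier to spot.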
Answered By - tripleee