Issue
I'm retrieving the PAN number from a PAN card. When I scrape the PAN number, it sometimes contains spaces between some of the characters, like DWKP K3344E, whereas the actual PAN number format is ABCDE1234F. I want the regex to tolerate such spaces anywhere in the scraped number.
import re
if re.search(r'^([a-z?A-Z?0-9]){5}([a-z?A-Z?0-9]){4}([a-z?A-Z?0-9]){1}?$', 'DWKP K3344E'):
    print("True")
else:
    print("False")
The regex should return True for the input above too. To do that, I just want to modify the r'^([a-z?A-Z?0-9]){5}([a-z?A-Z?0-9]){4}([a-z?A-Z?0-9]){1}?$' part of the code.
Thanks in advance.
Solution
I suggest removing the whitespace from the string (say, using re.sub(r'\s+', '', text), which ensures all Unicode whitespace is gone) before checking it with a regex.
Besides, your regex contains question marks inside the character classes. There, ? is a literal character rather than a quantifier, so question marks are allowed in the input. You must remove them.
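To illustrate the point about question marks, here is a minimal sketch comparing one character class from the question with the same class stripped of the question marks. The variable names are made up for this example.

```python
import re

# One character class from the question, and the same class without "?".
# Both variable names are illustrative only.
with_qmarks = re.compile(r'^[a-z?A-Z?0-9]$')
without_qmarks = re.compile(r'^[a-zA-Z0-9]$')

# Inside [...], "?" is an ordinary character, so the first pattern accepts it.
print(with_qmarks.search('?') is not None)     # True
print(without_qmarks.search('?') is not None)  # False
```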
Try
if re.search(r'^[a-zA-Z]{5}[0-9]{4}[a-zA-Z]$', re.sub(r'\s+', '', text)):
    pass  # do something
Here, re.sub(r'\s+', '', text) removes all whitespace from the text first, then ^[a-zA-Z]{5}[0-9]{4}[a-zA-Z]$ ensures the result matches:
^ - start of string
[a-zA-Z]{5} - five letters
[0-9]{4} - four digits
[a-zA-Z] - a letter
$ - end of string.
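Putting the two steps together, the check could be wrapped in a small helper along these lines. This is only a sketch: the names is_valid_pan and PAN_PATTERN are hypothetical, and it assumes the five letters, four digits, one letter layout described above is all you need to verify.

```python
import re

# Hypothetical names, used only for this sketch.
PAN_PATTERN = re.compile(r'^[a-zA-Z]{5}[0-9]{4}[a-zA-Z]$')

def is_valid_pan(text: str) -> bool:
    """Return True if text, with all whitespace removed, has the PAN layout."""
    compact = re.sub(r'\s+', '', text)  # drop spaces, tabs, newlines, etc.
    return PAN_PATTERN.search(compact) is not None

print(is_valid_pan('DWKP K3344E'))  # True: spaces are stripped before matching
print(is_valid_pan('ABCDE1234F'))   # True
print(is_valid_pan('ABCDE 12345'))  # False: the last character must be a letter
```

Precompiling the pattern is optional; it just avoids repeating the regex string if the check is called in several places.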
See the regex demo.
Answered By - Wiktor Stribiżew