Issue
I am trying to extract a few fields from an image using OCR. I am using pytesseract to read the image file, and this is working as expected.
Code :
import pytesseract
from PIL import Image
import re
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
value = Image.open("ocr.JPG")
text = pytesseract.image_to_string(value)
print(text)
Output :
ALS 1 Emergency Base Rate
Y A0427 RE ABC
Anbulance Mileage Charge
Y A0425 RE ABC
Disposable Supplies
Y A0398 RH ABC
184800230, x
Next, I have to extract A0427 and A0425 from the text, but the problem is that I am not looping over whole lines. The loop takes one character at a time, and that's why my regular expression isn't working.
Code:
for line in text:
    print(line)
    x = re.findall(r'^A[0-9][0-9][0-9][0-9]', text)
    print(x)
Solution
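Get rid of the for loop. Iterating over a string yields one character at a time, not one line. Also remove the ^: without re.MULTILINE it only matches at the very start of the text, and the codes do not start their lines anyway (each one is preceded by "Y "). Use only
x = re.findall(r'A[0-9][0-9][0-9][0-9]', text)
without any loop.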
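As a quick check of the suggestion above, here is a minimal sketch that pastes the OCR output from the question in as a string literal, so it runs without Tesseract or the image. The \b word boundaries, the \d{4} shorthand, the splitlines() variant, and the wanted set are my own additions for illustration, not part of the original answer.

```python
import re

# OCR output copied from the question (typos included), used here
# in place of pytesseract.image_to_string(...) so the sketch is standalone.
text = """ALS 1 Emergency Base Rate
Y A0427 RE ABC
Anbulance Mileage Charge
Y A0425 RE ABC
Disposable Supplies
Y A0398 RH ABC
184800230, x"""

# Search the whole string once: no loop and no ^ anchor.
# \b keeps the match from starting or ending inside a longer token.
codes = re.findall(r"\bA\d{4}\b", text)
print(codes)  # ['A0427', 'A0425', 'A0398']

# If a line-by-line pass is really wanted, split the string into lines first;
# "for line in text" would walk over single characters instead.
per_line = []
for line in text.splitlines():
    match = re.search(r"\bA\d{4}\b", line)
    if match:
        per_line.append(match.group())

# The pattern also finds A0398, so keep only the codes of interest
# (the set below is a hypothetical filter based on the question).
wanted = {"A0427", "A0425"}
selected = [code for code in codes if code in wanted]
print(selected)  # ['A0427', 'A0425']
```

Note that the pattern matches every code of this shape, including A0398, so some filtering step like the last one is needed if only A0427 and A0425 are required.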
Answered By - Patel