Issue
I wrote a simple Python script to extract text from an image. Most of the text is in Hindi, but the only text I care about is the 12-digit number in the image, "5485 5000 8000". Here is the code I wrote:
import cv2
import pytesseract
img = cv2.imread('Aadhar-Card.jpg',0)
text = pytesseract.image_to_data(img,lang='eng', config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
print(text)
h, w = img.shape
boxes = pytesseract.image_to_boxes(img)
for b in boxes.splitlines():
    b = b.split(' ')
    img1 = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
And here is the output. The number is the only thing that gets skipped. Is there any way to fix this?
Solution
Crop the region of interest (the area containing the number) first, then run OCR on the cropped image.
import cv2
import pytesseract
img = cv2.imread('Aadhar-Card.jpg',0)
# Rows (y) come first, then columns (x); these coordinates are specific to this image
crop_img = img[173:173+30, 117:117+150]
strNum = pytesseract.image_to_string(crop_img, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
print(strNum)
cv2.imshow("cropped", crop_img)
cv2.waitKey(0)
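The crop above is written as raw slice arithmetic, and the OCR result usually carries trailing whitespace. As a minimal sketch (not part of the original answer), the two helpers below make both steps explicit. The names `roi_slices` and `extract_12_digit_number` are hypothetical, and the sketch assumes the number is printed as three groups of four digits, as in "5485 5000 8000".

```python
import re


def roi_slices(x, y, w, h):
    """Return (row_slice, col_slice) for a box with top-left corner (x, y),
    width w and height h. Hypothetical helper, for illustration only."""
    # Image arrays are indexed rows first (y), then columns (x).
    return slice(y, y + h), slice(x, x + w)


def extract_12_digit_number(ocr_text):
    """Return the first standalone 12-digit number in ocr_text formatted as
    'DDDD DDDD DDDD', or None if there is none. Hypothetical helper."""
    # Three groups of four digits with optional whitespace between them;
    # the lookarounds reject runs that are part of a longer digit string.
    match = re.search(r"(?<!\d)(\d{4})\s*(\d{4})\s*(\d{4})(?!\d)", ocr_text)
    if match is None:
        return None
    return " ".join(match.groups())
```

With the names from the code above, `img[roi_slices(117, 173, 150, 30)]` should select the same region as `img[173:173+30, 117:117+150]`, and `extract_12_digit_number(strNum)` returns the cleaned number, or `None` if the OCR did not produce twelve digits.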
Answered By - zhugen