Issue
I am trying to do number plate recognition using Tesseract 4.0.0-beta.1. The Tesseract documentation says to create box files in the form . I tried the "makebox" function, but it does not detect every character properly. I then read somewhere that this function is for version 3.x.
I later tried the "wordstrbox" function, but the box file created this way is empty. Can someone tell me how to create box files for Tesseract 4.0.0-beta.1?
Solution
Use pytesseract.image_to_data():
    import cv2
    import pytesseract
    from pytesseract import Output

    img = cv2.imread('image.jpg')
    # One entry per detected element, returned as a dict of parallel lists
    d = pytesseract.image_to_data(img, output_type=Output.DICT)
    n_boxes = len(d['level'])
    for i in range(n_boxes):
        (text, x, y, w, h) = (d['text'][i], d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        # Draw the bounding box in green, 2 px thick
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('img', img)
    cv2.waitKey(0)  # note the capital K; wait for a key press before closing
Among the data returned by pytesseract.image_to_data():
left is the distance from the upper-left corner of the bounding box to the left border of the image.
top is the distance from the upper-left corner of the bounding box to the top border of the image.
width and height are the width and height of the bounding box.
conf is the model's confidence for the prediction for the word within that bounding box. If conf is -1, the corresponding bounding box contains a block of text rather than just a single word.
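As a rough sketch of how the conf field can be used, the helper below keeps only entries that look like recognized words and drops the -1 entries. The function name word_boxes, the min_conf threshold, and the sample values are my own for illustration and are not part of pytesseract. conf is passed through float() because, as far as I know, its type (int, float, or string) can differ between pytesseract versions.

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, x, y, w, h) tuples for word-level entries.

    `data` is assumed to have the shape of
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    boxes = []
    for i in range(len(data["text"])):
        conf = float(data["conf"][i])   # may arrive as int, float, or str
        text = str(data["text"][i]).strip()
        # conf == -1 marks a non-word entry; also skip blank text
        if conf < 0 or conf < min_conf or not text:
            continue
        boxes.append((text, data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i]))
    return boxes


# Hand-made sample in the same shape as Output.DICT (values are invented)
sample = {
    "text":   ["", "KA01", "AB1234"],
    "conf":   ["-1", "91", "42"],
    "left":   [0, 10, 80],
    "top":    [0, 12, 12],
    "width":  [200, 60, 90],
    "height": [50, 30, 30],
}
result = word_boxes(sample, min_conf=60)
```

With real data you would pass the dict d from the snippet above instead of sample, and draw rectangles only for the tuples that come back.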
The bounding boxes returned by pytesseract.image_to_boxes() enclose letters, so I believe pytesseract.image_to_data() is what you're looking for.
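If you do want the per-letter boxes, here is a minimal sketch for reading them. It assumes each line of the image_to_boxes() string follows the usual box-file layout "symbol left bottom right top page", with the origin at the bottom-left of the image, so the y values are flipped for OpenCV drawing. The helper name parse_boxes and the sample string are made up for illustration.

```python
def parse_boxes(box_text, image_height):
    """Convert box-file lines into (char, x1, y1, x2, y2) with a top-left origin.

    Assumes each line is "symbol left bottom right top page" and that
    coordinates are measured from the bottom-left corner of the image.
    """
    rects = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or unexpected lines
        ch = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        # Flip the y axis: OpenCV measures y from the top of the image
        rects.append((ch, left, image_height - top, right, image_height - bottom))
    return rects


# Invented sample in box-file layout, for a 50-pixel-high image
sample_boxes = "K 10 8 30 38 0\nA 32 8 50 38 0\n"
rects = parse_boxes(sample_boxes, image_height=50)
```

With a real image you would call parse_boxes(pytesseract.image_to_boxes(img), img.shape[0]) and pass (x1, y1) and (x2, y2) to cv2.rectangle.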
Answered By - AlfyFaisy