Issue
I know this has been asked before, and I have tried several different methods, but I cannot figure out how to get this to work. I have a bunch of pages where this works perfectly, and the text on this one is just as clear and well laid out. But for some reason, on one of the sheets Tesseract reads completely wrong information. Below I have attached my code, output, and the image.
import pytesseract
import cv2
import numpy as np
img = cv2.imread('page_3.jpg')
img = cv2.resize(img, None, fx=2, fy=2)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernel = np.ones((1, 1), np.uint8)
cv2.imwrite('thresh.png', img)
for psm in range(6, 13 + 1):
    config = '--oem 3 --psm %d' % psm
    txt = pytesseract.image_to_string(img, config=config, lang='eng')
    print('psm ', psm, ':', txt)
And then here is the output. It works perfectly until the end for some reason. All of the outputs (psm 6, 11, and 12) read exactly the same. Any help is appreciated.
1885-1015
1886-1280
1956-0044
2087-0047
2087-0155
2087-1433
2221-0093L
2221-0093R
2331-4628R
2992-/114R
29593-0007R
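A quick way to spot which lines were misread is to check each one against the pattern the correctly read lines seem to follow. The sketch below assumes every entry is four digits, a hyphen, four digits, and an optional L or R suffix; that format is only inferred from the output above, and the names PART_NUMBER and suspicious_lines are hypothetical.

```python
import re

# Assumed format, inferred from the lines that look correct:
# four digits, a hyphen, four digits, optional L or R suffix.
PART_NUMBER = re.compile(r'\d{4}-\d{4}[LR]?')


def suspicious_lines(text):
    """Return the non-empty OCR lines that do not match the assumed format."""
    flagged = []
    for line in text.splitlines():
        line = line.strip()
        if line and not PART_NUMBER.fullmatch(line):
            flagged.append(line)
    return flagged


if __name__ == '__main__':
    sample = "2221-0093R\n2331-4628R\n2992-/114R\n29593-0007R\n"
    for line in suspicious_lines(sample):
        print('check manually:', line)
```

A check like this does not fix the OCR. It only makes it easier to compare different settings by counting how many lines each one gets wrong.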
Solution
Your image does not require any pre-processing at all. It is already clean and well structured, so do not resize it before passing it to tesseract. Resizing is not needed in your case.
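As a rough sketch of that suggestion, the snippet below reads the file from the question and hands it to pytesseract at its original size, with no cv2.resize call. The only conversion is BGR to RGB, because OpenCV loads images in BGR order while pytesseract assumes RGB. The helper names psm_config and ocr_original are hypothetical, and psm 6 is just one of the modes the question already tried.

```python
def psm_config(psm, oem=3):
    """Build the same Tesseract config string used in the question."""
    return '--oem %d --psm %d' % (oem, psm)


def ocr_original(path, psm=6):
    """OCR the image at its original size, without resizing it."""
    # Imported here so psm_config can be used without OpenCV installed.
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        # cv2.imread returns None instead of raising when the file is unreadable.
        raise FileNotFoundError(path)
    # Only reorder the channels; the pixel data is otherwise untouched.
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(rgb, config=psm_config(psm), lang='eng')


if __name__ == '__main__':
    print(ocr_original('page_3.jpg'))
```

If the result still differs from the page, comparing this output with the resized version line by line should show whether the resize step was the cause.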
Hope this helps.
Answered By - Tarun Chakitha