Issue
I have this image, and I'm trying to read it with Tesseract:
My code is like that:
pytesseract.image_to_string(im)
But, what I get is only LOW: 56
. So, Tesseract is unable to read the 1
in the first line. I've tried to specify also a whitelist of only digits like
pytesseract.image_to_string(im, config="tessedit_char_whitelist=0123456789.")
and to process the image with an erosion but nothing works. Any suggestions?
Solution
Improving the quality of the output is your "holy scripture" when working with Tesseract. Especially, the page segmentation method should always be explicitly set. Here (as most of the times), I'd opt for --psm 6
:
Assume a single uniform block of text.
Even without further preprocessing of your image, you already get the desired result:
import cv2
import pytesseract
image = cv2.imread('gBrcd.png')
text = pytesseract.image_to_string(image, config='--psm 6')
print(text.replace('\f', ''))
# 1
# LOW: 56
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.1
OpenCV: 4.5.2
pytesseract: 5.0.0-alpha.20201127
----------------------------------------
Answered By - HansHirse
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.