Issue
I'm trying to make a script that can identify the number in a picture, more precisely pictures REALLY similar to this one:
This goes from 50 to 1, but I'm having some problems reading the number present in there using pytesseract. Here's the code I'm using to read it:
from PIL import Image
from pytesseract import image_to_string
im = Image.open(filename)
text = image_to_string(im)
All results I get are like this:
What can I do to improve the readings?
Solution
The Tesseract documentation page "Improving the quality of the output" is your "holy scripture" when working with Tesseract. Before reaching for binarization, you could first try simply converting your image to grayscale:
from PIL import Image
import pytesseract
im = Image.open('G9hvi.png').convert('L')
text = pytesseract.image_to_string(im)
print(text.replace('\f', ''))
# 50
Boom! – without any further pre-processing you already get the correct result.
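If grayscale alone isn't enough for some of the other pictures, the usual next steps are upscaling, binarization, and telling Tesseract to expect only digits. The following is just a minimal sketch under assumptions, not something tested on your images: the helper names preprocess and read_number, the scale factor of 2, and the fixed threshold of 128 are all made up for illustration and would need tuning. The digit whitelist is also known to be ignored by the LSTM engine in Tesseract 4.0, so it assumes a newer version.

```python
from PIL import Image


def preprocess(im, scale=2, threshold=128):
    """Grayscale, upscale, and binarize an image before OCR (illustrative helper)."""
    gray = im.convert('L')
    # Upscaling can help when the digits are only a few pixels tall.
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Fixed global threshold (an assumed value): pixels >= threshold become
    # white, everything else becomes black.
    return big.point(lambda p: 255 if p >= threshold else 0)


def read_number(path):
    """Run Tesseract on the preprocessed image, expecting a single line of digits."""
    # Imported here so that preprocess() can be used without Tesseract installed.
    import pytesseract
    im = preprocess(Image.open(path))
    # --psm 7 treats the image as a single text line; the whitelist restricts
    # recognition to digits.
    config = '--psm 7 -c tessedit_char_whitelist=0123456789'
    return pytesseract.image_to_string(im, config=config).strip()
```

You would call it as read_number('G9hvi.png'). Whether any of this beats plain grayscale depends on the images, so check the intermediate result of preprocess by eye before trusting the OCR output.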
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.2
Pillow: 8.2.0
Tesseract: 5.0.0-alpha.20201127
----------------------------------------
Answered By - HansHirse