Issue
I've been working with pytesseract for the past few days, and I've noticed that the library is quite bad at identifying numbers. I don't know if I am doing something wrong, but I keep getting ♀ as output.
import cv2
import pytesseract
from PIL import ImageGrab

class Image_Recognition():
    def digit_identification(self):
        # save normal screenshot
        screen = ImageGrab.grab(bbox=(706,226,1200,726))
        screen.save(r'tmp\tmp.png')
        # read the image file
        img = cv2.imread(r'tmp\tmp.png', 2)
        # convert to binary image
        [ret, bw_img] = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
        # use OCR library to identify numbers in screenshot
        text = pytesseract.image_to_string(bw_img)
        print(text)
INPUT:
(Converted to a binary image in order to make numbers more intelligible.)
OUTPUT:
♀
Tell me if there is something off, or just suggest other approaches for handling text recognition.
Solution
First of all, please read the article Improving the quality of the output, especially the section regarding the page segmentation method. Also, you can limit the characters to be found to the digits 0-9.
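As a rough sketch of those two hints, the configuration string can be built and the raw OCR output cleaned up as shown here. The helper names build_tesseract_config and clean_ocr_text are hypothetical and not part of pytesseract; only the --psm option and the tessedit_char_whitelist variable come from Tesseract itself. The ♀ from the question is most likely the form feed character ('\f', 0x0C) that Tesseract appends to its output, shown as ♀ by a console using code page 437, so it is stripped here as well.

```python
def build_tesseract_config(psm=6, whitelist="0123456789"):
    """Build a Tesseract config string (hypothetical helper, not a pytesseract API)."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def clean_ocr_text(raw):
    """Remove line breaks and the trailing form feed from raw OCR output."""
    return raw.replace("\n", "").replace("\f", "").strip()


config = build_tesseract_config()
print(config)

# Simulated raw output; a real call would be
# pytesseract.image_to_string(image, config=config)
print(repr(clean_ocr_text("16\n\x0c")))
```

An empty string after cleaning means that nothing was recognized at all, which is what the question's output amounts to.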
You have a tiny image, which makes extracting all numbers at once quite challenging, especially with the mixture of bright text on dark background and vice versa. However, you can quite easily crop the single tiles and extract the numbers one by one, so no distinction between these two types of tiles needs to be made.
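To illustrate the cropping idea without needing an image file, here is a minimal sketch that only computes the pixel bounds of each tile. The function name tile_boxes is hypothetical, the 4 x 4 grid is assumed from the 2048 board, and the 500 x 494 size is taken from the screenshot bounding box in the question, not from the actual image used in this answer.

```python
def tile_boxes(height, width, rows=4, cols=4):
    """Yield (row, col, y1, y2, x1, x2) for every tile of a rows x cols grid."""
    tile_h = height // rows  # integer division: leftover pixels at the edge are ignored
    tile_w = width // cols
    for row in range(rows):
        for col in range(cols):
            y1 = row * tile_h
            x1 = col * tile_w
            yield row, col, y1, y1 + tile_h, x1, x1 + tile_w


# Assumed size: bbox=(706, 226, 1200, 726) gives height 500 and width 494
boxes = list(tile_boxes(500, 494))
print(len(boxes))
print(boxes[0])
print(boxes[-1])
```

With an image loaded by OpenCV, each tile would then be cropped as img[y1:y2, x1:x2].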
Also, you know that the numbers must be powers of two (I guess most people will know 2048). So, if no such number is found, try upscaling the cropped tile, and repeat. (Eventually, give up after a few attempts.)
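The check-and-retry idea can be sketched independently of any OCR library. This is only an illustration under assumptions: read_tile, recognize and upscale are hypothetical names, the power-of-two test uses a bitwise trick in place of logarithms, and the OCR step is simulated with a fake function so that the example runs on its own.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for 0, negatives and numbers like 12."""
    return n > 0 and (n & (n - 1)) == 0


def read_tile(recognize, tile, upscale, max_attempts=4):
    """Run recognize(tile) up to max_attempts times, upscaling after each failure.

    recognize(tile) -> str and upscale(tile) -> tile are supplied by the caller.
    Returns the recognized number, or -1 if no power of two was found.
    """
    for _ in range(max_attempts):
        text = recognize(tile).strip()
        if text.isdecimal() and is_power_of_two(int(text)):
            return int(text)
        tile = upscale(tile)
    return -1


# Stand-ins for the real OCR and resize calls: the "tile" is just a scale factor,
# and recognition only succeeds once the tile has been upscaled twice.
def fake_recognize(scale):
    return "16\n" if scale >= 4 else ""


def fake_upscale(scale):
    return scale * 2


print(read_tile(fake_recognize, 1, fake_upscale))
print(read_tile(lambda tile: "12", 1, fake_upscale))
```

In practice, recognize would wrap pytesseract.image_to_string and upscale would wrap cv2.resize with fx=2, fy=2. Note that 12 is a multiple of two but not a power of two, so it is rejected.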
That'd be my full code:
import cv2
import math
import pytesseract

# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
def log2(x):
    return math.log10(x) / math.log10(2)

# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
def is_power_of_2(n):
    # Guard against n <= 0 (e.g. an OCR result of "0"), where the logarithm is undefined
    if n <= 0:
        return False
    return math.ceil(log2(n)) == math.floor(log2(n))
# Load image, get dimensions of a single tile
img = cv2.imread('T72q4s.png')
h, w = [x // 4 for x in img.shape[:2]]
# Initialize result array (too lazy to import NumPy for that...)
a = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (4, 4)).astype(int)
# https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method
# https://stackoverflow.com/q/4944830/11089932
config = '--psm 6 -c tessedit_char_whitelist=0123456789'
# Iterate tiles, and extract texts
for i in range(4):
    for j in range(4):
        # Crop tile
        x1 = i * w
        x2 = (i + 1) * w
        y1 = j * h
        y2 = (j + 1) * h
        roi = img[y1:y2, x1:x2]
        # If no proper power of 2 is found, upscale image and repeat
        while True:
            text = pytesseract.image_to_string(roi, config=config)
            text = text.replace('\n', '').replace('\f', '')
            if (text == '') or (not is_power_of_2(int(text))):
                roi = cv2.resize(roi, (0, 0), fx=2, fy=2)
                if roi.shape[0] > 1000:
                    a[j, i] = -1
                    break
            else:
                a[j, i] = int(text)
                break
print(a)
For the given image, I get the following output:
[[ 8 16  4  2]
 [ 2  8 32  8]
 [ 2  4 16  4]
 [ 4  2  4  2]]
For another similar image, I get:
[[ 4 -1 -1 -1]
[ 2 2 -1 -1]
[-1 -1 -1 -1]
[ 2 -1 -1 -1]]
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.3
OpenCV: 4.5.3
pytesseract: 5.0.0-alpha.20201127
----------------------------------------
Answered By - HansHirse