Issue
I have written an algorithm that solves the Pluszle game matrix. The input is a numpy array.
Now I want to recognize the digits of the matrix from a screenshot.
There are different levels; this is a hard one:
And this is an easy one:
The output of the recognition should be a numpy array:
array([[6, 2, 4, 2],
       [7, 8, 9, 7],
       [1, 2, 4, 4],
       [7, 2, 4, 0]])
I have tried to feed the last image to tesseract:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
print(pytesseract.image_to_string(Image.open('C:/Users/79017/screen_plus.jpg')))
The output is unacceptable:
LEVEL 4
(}00:03 M0
J] —.°—@—@©
I think that I should use contours from OpenCV, because the font is always the same. Maybe I should save the contours for every digit, then find every contour that exists on the screenshot, then somehow build the matrix from the coordinates of every digit contour. But I have no idea how to do it.
Solution
1- Binarize
Tesseract needs you to binarize the image first. No need for contours or any convolution here. Just a threshold should do. Especially considering that you are trying to che... I mean win intelligently at a specific game. So I guess you are open to some ad-hoc adjustments.
For example,
(hard<240).any(axis=2)
puts in white (True) everything that is not white in the original image, and in black the white parts.
Note that you don't get the sums (or whatever they are, I don't know what this game is) here. Those are, on the contrary, in almost black areas.
But you can get them with another filter:
(hard>120).any(axis=2)
You could merge those filters, obviously
(hard<240).any(axis=2) & (hard>120).any(axis=2)
But that may not be a good idea: after all, keeping them separate gives you an opportunity to distinguish two different kinds of data, which you may want to do.
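To make the two thresholds concrete, here is a minimal sketch on a tiny synthetic image. It assumes `hard` is an H×W×3 uint8 array such as the one returned by `cv2.imread`; the pixel values are invented for illustration and do not come from the real screenshot. The final conversion to uint8 is an optional precaution, not something the answer requires.

```python
import numpy as np

# Hypothetical 1x4 BGR "screenshot": a white pixel, a coloured pixel,
# a near-black pixel and a mid-grey pixel (values invented for illustration).
hard = np.array([[[255, 255, 255],
                  [200, 80, 40],
                  [10, 10, 10],
                  [130, 130, 130]]], dtype=np.uint8)

# True wherever at least one channel is below 240, i.e. the pixel is not white.
not_white = (hard < 240).any(axis=2)

# True wherever at least one channel is above 120, i.e. the pixel is not dark.
not_dark = (hard > 120).any(axis=2)

# Merged filter: neither white nor dark.
both = not_white & not_dark

# Optional: turn the boolean mask into a 0/255 uint8 image before handing it on.
as_uint8 = both.astype(np.uint8) * 255
```

Keeping `not_white` and `not_dark` as separate masks is what lets you treat the two kinds of data differently later on.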
2- Restrict
Secondly, you know you are looking for digits, so restrict to digits, by adding config='digits' to your pytesseract args.
pytesseract.image_to_string((hard>240).all(axis=2))
# 'LEVEL10\nNOVEMBER 2022\n\n™\noe\nOs\nfoo)\nso\n‘|\noO\n\n9949 6 2 2 8\n\nN W\nN ©\nOo w\nVon\n+? ah ®)\nas\noOo\n©\n\n \n\x0c'
pytesseract.image_to_string((hard>240).all(axis=2), config='digits')
# '10\n2022\n\n99496228\n\n17\n-\n\n \n\x0c'
3- Don't use image_to_string
Use image_to_data preferably. It gives you bounding boxes of text. Or even image_to_boxes, which gives you the digits one by one, with coordinates.
That is because image_to_string is meant for good old linear text in the image. image_to_data and image_to_boxes assume that text is distributed all around, and give you pieces of text with their positions.
image_to_string on such an image may swap what you would consider the logical order.
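As a rough sketch of how that output can be consumed: image_to_boxes returns one line per character in the form "char left bottom right top page", with the y axis starting at the bottom of the image. The helper name `parse_boxes` and the sample string below are hypothetical; the sample is hand-written, not real tesseract output.

```python
def parse_boxes(raw, image_height):
    """Turn image_to_boxes text into (char, center_x, center_y) tuples.

    center_y is converted to the usual image convention (origin at the top).
    """
    result = []
    for line in raw.splitlines():
        parts = line.split(' ')
        if len(parts) < 5:
            continue  # skip empty or malformed lines
        char = parts[0]
        left, bottom, right, top = map(int, parts[1:5])
        center_x = (left + right) / 2
        center_y = image_height - (bottom + top) / 2  # flip the y axis
        result.append((char, center_x, center_y))
    return result


# Hand-written sample in the image_to_boxes format (not real OCR output).
sample = "6 10 80 30 110 0\n2 60 80 80 110 0\n"
parsed = parse_boxes(sample, image_height=120)
```

With positions in hand, the reading order is yours to decide instead of being whatever image_to_string guesses.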
4- Select areas yourself
Since it is an ad-hoc usage for a specific application, you know where the data are.
For example, your main matrix seems to be in area
hard[740:1512, 132:910]
See
print(pytesseract.image_to_boxes((hard[740:1512, 132:910]<240).any(axis=2), config='digits'))
Not only does this avoid flooding you with irrelevant data, but tesseract also performs better when it is given an image containing nothing other than what you want to read.
It seems to find almost all your digits here.
5- Don't expect miracles
Tesseract is one of the best OCR engines. But OCR is not a sure thing...
See what I get with this code (summarizing what I've said so far), which prints in red the digits detected by tesseract, just next to where they were found in the real image.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract
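hard = cv2.imread("hard.jpg")
hard = hard[740:1512, 132:910]  # Crop to the main matrix area
# Binarize: True wherever the pixel is not white
binary = (hard < 240).any(axis=2)
# Each line of image_to_boxes is "char left bottom right top page"
boxes = [s.split(' ') for s in pytesseract.image_to_boxes(binary, config='digits').split('\n')[:-1]]
out = hard.copy()  # Just to avoid altering original image, in case we want to retry with other parameters
H = len(hard)
for b in boxes:
    # Tesseract's y axis starts at the bottom, so flip it; shift 30 px right so the label sits next to the digit
    cv2.putText(out, b[0], (30 + int(b[1]), H - int(b[2])), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
plt.imshow(cv2.cvtColor(out, cv2.COLOR_BGR2RGB))
plt.show()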
As you can see, the results are fairly good. But there are 5 missing numbers. And one 3 was read as "3.".
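One way to make such gaps visible is to drop the detections into a grid and mark empty cells. The sketch below assumes detections are (char, x, y) tuples with the origin at the top-left of the cropped image, and that the cells are evenly spaced; the function name `boxes_to_matrix`, the -1 marker and the sample detections are all hypothetical.

```python
import numpy as np


def boxes_to_matrix(detections, width, height, n_rows, n_cols):
    """Place detected digits into an n_rows x n_cols array; -1 marks a missing cell."""
    matrix = np.full((n_rows, n_cols), -1, dtype=int)
    for char, x, y in detections:
        if not char.isdigit():
            continue  # ignore stray characters such as the '.' in "3."
        # Evenly spaced cells: scale the position to a cell index and clamp it.
        row = min(int(y * n_rows / height), n_rows - 1)
        col = min(int(x * n_cols / width), n_cols - 1)
        matrix[row, col] = int(char)
    return matrix


# Invented detections on a 100x100 crop holding a 2x2 grid.
detections = [('6', 20, 25), ('2', 70, 25), ('.', 80, 40), ('7', 20, 75)]
m = boxes_to_matrix(detections, width=100, height=100, n_rows=2, n_cols=2)
```

Any -1 left in the result tells you which cells still need another pass, for example with a different threshold.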
For this kind of ad-hoc reading of an app, I wouldn't even use tesseract. I am pretty sure that, with trial and error, you can easily learn to extract each digit's box yourself (they are linearly spaced in both dimensions).
And then, inside each box, well, there are only 9 possible values. It should be quite easy, on a generated image, to find some simple criteria, such as the number of white pixels, the number of white pixels in the top area, ..., that permit a very simple classification.
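A minimal sketch of that idea, under stated assumptions: the binarized crop is split into evenly spaced cells, each cell is described by two counts (white pixels overall and in its top half), and the digit whose reference counts are closest wins. The function names, the tiny synthetic image and the reference values are all hypothetical; real reference values would have to be measured once on actual screenshots.

```python
import numpy as np


def split_cells(binary, n_rows, n_cols):
    """Cut a 2D boolean image into a grid of evenly spaced cells."""
    h, w = binary.shape
    ys = np.linspace(0, h, n_rows + 1).astype(int)
    xs = np.linspace(0, w, n_cols + 1).astype(int)
    return [[binary[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] for c in range(n_cols)]
            for r in range(n_rows)]


def features(cell):
    """Two simple criteria: white pixels in the cell and in its top half."""
    top = cell[: cell.shape[0] // 2]
    return np.array([cell.sum(), top.sum()])


def classify(cell, references):
    """Return the digit whose reference features are closest (L1 distance)."""
    f = features(cell)
    return min(references, key=lambda d: np.abs(references[d] - f).sum())


# Synthetic 4x4 image holding a 2x2 grid of 2x2 cells (invented data).
binary = np.zeros((4, 4), dtype=bool)
binary[0, 0] = True                                   # cell (0, 0): 1 white pixel
binary[0, 2] = binary[1, 2] = binary[1, 3] = True     # cell (0, 1): 3 white pixels

# Hypothetical reference features per digit.
references = {1: np.array([1, 1]), 7: np.array([3, 1])}

cells = split_cells(binary, 2, 2)
first = classify(cells[0][0], references)
second = classify(cells[0][1], references)
```

Because the game renders its digits in a fixed font, a handful of such counts may be enough to separate them, but that has to be checked on real screenshots.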
Answered By - chrslg