Issue
I am working on a program that uses a webcam to read constantly changing digits off a screen using pytesseract (long story). It takes an image of the whole screen, then crops out each of the 23 numbers to be recorded, using predetermined coordinates stored in a list called 'roi'. There are some other steps, but this is the most important part. Currently it adds, drops, and changes digits all the time, but not in any consistent way. Here are some examples:
It reads this incorrectly as '32.0'
It reads this correctly as '52.0'
It reads this incorrectly as '39.3'
It reads this incorrectly as '2499.1'
These images have already been processed using OpenCV, and this is what all the images in the roi set look like. Based on other answers, I have binarized them, tried to clean up the edges, and put a white border around each image (see code).
This program reads the screen every 30 seconds, sometimes getting it right, other times getting it wrong. It often changes 5s into 3s, 3s into 5s, and 5s into 9s. Sometimes it misses or adds digits altogether. Below is my code for processing the images.
import cv2
import pytesseract

# Path to the tesseract executable (elided in the original post)
pytesseract.pytesseract.tesseract_cmd = r"<tesseract file path>"

# Load the full-screen capture, convert to grayscale, and flip it upright
scale = 1.4
img = cv2.imread(r"<image file path>")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.rotate(img, cv2.ROTATE_180)

# Shrink the whole image by a factor of 1.4
width = int(img.shape[1] / scale)
height = int(img.shape[0] / scale)
dim = (width, height)
img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
cv2.destroyAllWindows()

myData = []
cong = r'--psm 6 -c tessedit_char_whitelist=+0123456789.-'

# roi is defined elsewhere: a list of ((x1, y1), (x2, y2)) corner pairs
for x, r in enumerate(roi):
    # Crop one number out of the full image
    imgCrop = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]

    # Dividing by 0.2 enlarges the crop 5x
    scalebig = 0.2
    wid = int(imgCrop.shape[1] / scalebig)
    hei = int(imgCrop.shape[0] / scalebig)
    newdims = (wid, hei)
    imgCrop = cv2.resize(imgCrop, newdims)

    # Binarize, then close small gaps in the strokes
    imgCrop = cv2.threshold(imgCrop, 155, 255, cv2.THRESH_BINARY)[1]
    kernel2 = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    imgCrop = cv2.morphologyEx(imgCrop, cv2.MORPH_CLOSE, kernel2, iterations=2)

    # Add a 10 px white border on every side
    value = [255, 255, 255]
    imgCrop = cv2.copyMakeBorder(imgCrop, 10, 10, 10, 10, cv2.BORDER_CONSTANT, None, value=value)

    datapoint = pytesseract.image_to_string(imgCrop, lang='eng', config=cong)
    myData.append(datapoint)
The output is the pictures I linked above.
I have looked into fine-tuning Tesseract, but I have a Windows machine and I can't seem to find a good tutorial. I am not a programmer by trade; I spent 2 months teaching myself Python to do this. The machine learning aspect of Tesseract has me spinning, and I don't know how else to fix such remarkably inconsistent readings. If you need any further info, please ask and I'll be happy to provide it.
Edit: Added some more incorrectly read images for reference
Solution
- Make sure you use the right image format. JPEG is the wrong format for OCR because its lossy compression adds artifacts around character edges; use a lossless format such as PNG.
- With the Tesseract LSTM engine, make sure the letter size is no bigger than 35 points.
With Tesseract and the tessdata_best models, I got these results:
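The two points above could be applied to each crop roughly as follows. This is only a sketch under stated assumptions: it treats the 35-point limit as a rough cap on glyph height in pixels (my simplification, not something the answer states), and `TARGET_TEXT_HEIGHT_PX`, `resize_factor`, `scaled_size`, and `prepare_crop` are hypothetical names. The digit height in the crop has to be measured or estimated by you.

```python
# Assumption: the "35 points" limit is treated as a cap on glyph height in pixels.
# TARGET_TEXT_HEIGHT_PX is an illustrative value, not a Tesseract constant.
TARGET_TEXT_HEIGHT_PX = 30


def resize_factor(text_height_px, target_px=TARGET_TEXT_HEIGHT_PX):
    """Return the factor that scales glyphs of text_height_px to target_px."""
    if text_height_px <= 0:
        raise ValueError("text_height_px must be positive")
    return target_px / text_height_px


def scaled_size(width, height, factor):
    """Return the (width, height) tuple cv2.resize expects, never below 1 px."""
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def prepare_crop(img_crop, text_height_px, out_path):
    """Resize a grayscale crop to the target glyph height and save it as PNG."""
    import cv2  # only needed for the image step; requires opencv-python

    factor = resize_factor(text_height_px)
    h, w = img_crop.shape[:2]
    # INTER_AREA suits shrinking, INTER_CUBIC suits enlarging.
    interp = cv2.INTER_AREA if factor < 1 else cv2.INTER_CUBIC
    resized = cv2.resize(img_crop, scaled_size(w, h, factor), interpolation=interp)
    # PNG is lossless, so saving adds no compression artifacts around the digits.
    cv2.imwrite(out_path, resized)
    return resized
```

As a hypothetical example, if the digits in a crop were about 150 px tall after the 5x enlargement in the question's code, `resize_factor(150)` gives 0.2, which undoes that enlargement. Whether the real crops are that large is something to check on your own images.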
tesseract 593_small.png -
59.3
tesseract 520_small.png -
52.0
tesseract 2491_small.png -
249.1
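The commands above use the Tesseract command line with its default options. A rough pytesseract equivalent that points at the best-quality traineddata models (tessdata_best) might look like the sketch below. The folder path is hypothetical, `build_config` and `read_number` are names I made up, and keeping the question's `--psm 6` and whitelist as defaults is my choice rather than what the answer ran.

```python
def build_config(tessdata_dir, psm=6, whitelist="+0123456789.-"):
    """Build a pytesseract config string that points at a tessdata folder."""
    return (
        f'--tessdata-dir "{tessdata_dir}" '
        f"--psm {psm} -c tessedit_char_whitelist={whitelist}"
    )


def read_number(image_path, tessdata_dir):
    """Run Tesseract on one PNG crop and return the stripped text."""
    import pytesseract  # requires pytesseract and a Tesseract installation

    # image_to_string accepts a file path as well as an image object.
    text = pytesseract.image_to_string(
        image_path, lang="eng", config=build_config(tessdata_dir)
    )
    return text.strip()
```

Usage would be something like `read_number("593_small.png", r"C:\path\to\tessdata_best")`, where the second argument is a placeholder for wherever you downloaded `eng.traineddata` from tessdata_best.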
Answered By - user898678