Thursday, March 31, 2022

[FIXED] pytesseract not picking up individual characters

Labels: cv2, python, python-tesseract

Issue

I'm currently struggling: pytesseract is failing to detect single digits. Below are the result I'm currently getting and the code I'm using on the image I'm trying to read. Any help would be much appreciated.

Current result = ['WLDOT', 'ROOTOO2', 'Boombastic', 'Loukan', 'ExpertAz', 'Stryzhh', 'Najm', 'JAMIN', ' ', '7157', '5618', '4864', '4762', '4294', '3287', '26', '34', '23', '32', '241', '240', '171', '137', '183', '200', '136', '181', '762', '689707', '733165', '698822', '724485', '647404', '566613', '580621', '566721', '189025']

    import cv2
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    
    
    image2 = r'C:\Reader\unknown.png'
    
    image = cv2.imread(image2, 0)
    # Edit for accuracy (Image read)
    thresh = cv2.threshold(image, 180, 255, cv2.THRESH_BINARY)[1]
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    result = 255 - close
    cv2.imshow('result', result)
    cv2.waitKey()
    textOffImage = str(pytesseract.image_to_string(result, config='--psm 3')).split("\n")
    textOffImage = list(filter(None, textOffImage))
    print(textOffImage)

Solution

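You can use inRange thresholding: convert the image to HSV and keep only the pixels whose saturation is at most 84, then invert the mask before passing it to Tesseract.

The result will be:

Now, if you read the thresholded image using --psm 6: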

WLDOT 17790 14 0 241 o 733165 :
ROOTOO2 17576 24 1 240 0 698822
Boombastic 17157 19 5 171 762 724485
Loukan 15618 26 4 137 0 647404 y
ExpertAz 14864 34 1 183 0 566613
Stryzhh 14762 23 3 200 0 580621 ,
Najm 14294 32 1 136 0 566721
JAMIN 13287 16 Q 181 689707 189025
k

As you can see, there are some flaws (stray characters such as ":" and "y", and a 0 read as "o" or "Q"), but most of the input is recognized correctly.
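If you need the values as a list, like in the question, the --psm 6 text can be split row by row. The following is only a sketch under a few assumptions: parse_rows is a hypothetical helper name, the first token of each row is taken to be the player name, and any later token that is not purely digits is treated as noise and dropped.

```python
def parse_rows(ocr_text):
    """Split OCR text into (name, numbers) tuples, one per usable row."""
    rows = []
    for line in ocr_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        # Assumption: the first token is the name, the rest are columns.
        name, rest = tokens[0], tokens[1:]
        # Keep only tokens made entirely of decimal digits; stray marks
        # such as ":" or "y" are dropped.
        numbers = [int(t) for t in rest if t.isdecimal()]
        if numbers:  # ignore rows with no numeric column, e.g. a lone "k"
            rows.append((name, numbers))
    return rows


sample = (
    "WLDOT 17790 14 0 241 o 733165 :\n"
    "Loukan 15618 26 4 137 0 647404 y\n"
    "k\n"
)
for name, numbers in parse_rows(sample):
    print(name, numbers)
```

Note that a misread digit (the "o" in the first row) is dropped rather than repaired, so that row ends up one column short. Whether to map "o" or "Q" back to 0 depends on how much you trust the layout.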

If you want only digits, you can use --psm 6 digits:

17790 14 0 241 733165
00002 17576 24 1 240 0 698822
17157 19 5 171 762 724485
15618 26 4 137 0 647404
14864 34 1 183 0 566613
14762 23 3 200 0 580621
14294 32 1 136 0 566721
13287 16 0 181 689707189025

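As you can see above, the digits themselves are read correctly, with a few leftovers: the name ROOTOO2 comes out as 00002, the first row is missing a 0, and the last two values of the final row are merged into 689707189025.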
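Since the digits-only output can gain or lose tokens, it may help to check each row against the number of columns you expect. This is a rough sketch, not part of the original answer: flag_suspect_rows and EXPECTED_COLUMNS are hypothetical names, and the value 6 is an assumption based on the rows shown above.

```python
EXPECTED_COLUMNS = 6  # assumption: six numeric columns per row in this image


def flag_suspect_rows(digits_text, expected=EXPECTED_COLUMNS):
    """Return (line_number, tokens) for rows whose token count is off."""
    suspects = []
    for number, line in enumerate(digits_text.splitlines(), start=1):
        tokens = line.split()
        # Empty lines are ignored; any other count mismatch is reported.
        if tokens and len(tokens) != expected:
            suspects.append((number, tokens))
    return suspects


sample = "\n".join([
    "17790 14 0 241 733165",
    "00002 17576 24 1 240 0 698822",
    "17157 19 5 171 762 724485",
    "13287 16 0 181 689707189025",
])
for number, tokens in flag_suspect_rows(sample):
    print(number, len(tokens), tokens)
```

A count check only catches rows that gained or lost a token. A row where one digit was misread but the count is still right would pass unnoticed.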

For more, you can read the Tesseract documentation page: Improving the quality of the output

Code:

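    import cv2
    import pytesseract
    from numpy import array

    img = cv2.imread("TI5Jc.png")  # Load the image (BGR)

    # Convert to HSV and keep the pixels with saturation <= 84 (any hue, any value)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    msk = cv2.inRange(hsv, array([0, 0, 0]), array([179, 84, 255]))

    # Dilate the mask, AND it with the original mask, then invert the result
    krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
    dlt = cv2.dilate(msk, krn, iterations=1)
    thr = 255 - cv2.bitwise_and(dlt, msk)

    # Read the text: page segmentation mode 6 with the digits config
    txt = pytesseract.image_to_string(thr, config='--psm 6 digits')
    print(txt)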

Answered By - Ahx

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
