Sunday, March 27, 2022

[FIXED] Reading numbers from image PyTesseract

Labels: cv2, python, python-tesseract, tesseract

Issue

So I'm trying to read the text in an image, and I'm experiencing some issues with it.

The image:

My code:

import cv2
import pytesseract


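def read_img():
    # Point pytesseract at the Tesseract binary (Windows install path).
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
    return cv2.imread('Images/Image2.png')


def process_text(img):
    names = []
    # image_to_data returns TSV text: one header row, then one row per detected element.
    data = pytesseract.image_to_data(img)
    for x, d in enumerate(data.splitlines()):
        if x != 0:  # skip the header row
            d = d.split()
            # A row has 12 fields only when its last column (the recognized text) is non-empty.
            if len(d) == 12:
                names.append(d[11])

    return names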


img = read_img()
print(process_text(img))

Result:

['-', '©', '-', 'AceeZ.Rogue', 'a', '5540', 't', '3', '8', '&', '©', 'LeonGids.Rogue', 'a', 'seas', '8', '3', '8', 'e', 'ﬂ', 'karzheka.Rogue', 'a', '5151', '8', '2', '7', '48', '7', 'Q', 'ripz.Rogue', 'a', '5105', '8', '[', '5s', '27', 'm', 'korey.Rogue', 'a', '5105', '7', '2', '6', '36', '-', '[ZH]', 'Shaiiko.BDS', 'C', '3520', 'a', 'B', 's', '22', 'Cps', 'a', '2012', '8', 'i', '8', '21', 'ypc', 'Chee', 'e', '8', '-_', '22', '3', '(2)', 'Flemzje.BDS', 'a', '2420', 'a', '3', '10', '26', '(SF)', 'Renshiro.BDS', 'C', '2410', '6', '1', '8', 'Fo']

As you can see, this is not the result I was hoping for. Here's what I've tried:

Splitting the image up

I've split the image up into two to have it more centered on the actual text:

The result of img1 is actually perfect:

['AceeZ.Rogue', 'LeonGids.Rogue', 'karzheka.Rogue', 'ripz.Rogue', 'korey.Rogue', 'Shaiiko.BDS', 'BriD.BDS', 'RaFaLe.BDS', 'Elemzje.BDS', 'Renshiro.BDS']

But with img2 issues arise again:

['5540', '5343', '5151', '5105', '5105', '3520', '29012', '2695', '2420', '2410', '11', '10']

It looks like Tesseract has trouble reading numbers, since img1, which contains only text, went fine. I've tried increasing the quality of the text (letsenhance.io) and also increasing the contrast:

Neither of these methods worked.

Using config options

I've tried using config options like '--psm 6' and 'outbase digits', which didn't fix the problem either.

I saw on this page that training with the specified font is a possibility (https://stackoverflow.com/a/53763425/10503012), but I sadly don't know the font, and https://www.myfonts.com/WhatTheFont/ didn't identify the exact font, so I'm assuming that's not an option either.

So my question is: is it even possible to extract the text/numbers from this image, or is this a lost cause? What more can I do to improve the result Tesseract gives me? I would expect the high-contrast image to work, but it clearly doesn't.

Thanks for any help.

Solution

Tesseract usually works best with black text on a white background, so you should invert your input image. You should also consider thresholding the image to make it pure black and white. Finally, Tesseract can be sensitive to the size of each character. I found that the user names were recognized OK at the provided scale, but I had to scale the image by 1.25 to get the numbers to come out.

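import cv2
import pytesseract

# Load as grayscale so the image can be thresholded directly.
img = cv2.imread('acerogue.png', cv2.IMREAD_GRAYSCALE)

# Invert and binarize: light text on a dark background becomes black on white.
# With THRESH_OTSU the threshold is chosen automatically, so the value passed (0) is ignored.
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
thresh = cv2.resize(thresh, (0, 0), fx=1.25, fy=1.25)  # scale image 1.25X

# --psm 6 tells Tesseract to assume a single uniform block of text.
detected_text = pytesseract.image_to_string(thresh, config='--psm 6')
print(detected_text)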

which gives

| ® AceeZ.Rogue 8 5540 11 2 8 -
© LeonGids.Rogue 8 5343 8 3 8 -

Ww karzheka.Rogue a 5151 8 2 7 48

7 tipz.Rogue a 5105 8 0 5 27
& korey.Rogue a 5105 7 2 6 36

| #4 Shaiiko.BDS B 3520 9 3 8 22
BriD.BDS mH 2912 8 1 8 21

S RaFaLe.BDS BH  —_2605 a 2 8 2

3 BS Elemzje.Bos H 2420 3 3 10 26
Se) Renshiro.BDS m 2410 6 1 8 45
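
The output above still contains stray symbols from the icons, so some post-processing is likely needed before the numbers are usable. Here is a minimal sketch of one way to do that, assuming every row contains a "name.team" token followed by whitespace-separated fields. The pattern and the helper names (ROW_PATTERN, parse_row, parse_output) are illustrative assumptions, not part of pytesseract.

```python
import re

# Assumed row shape: optional junk, a "name.team" token, then the remaining fields.
ROW_PATTERN = re.compile(r"(?P<name>[A-Za-z0-9_]+\.[A-Za-z0-9]+)\s+(?P<rest>.*)")


def parse_row(line):
    """Return (name, numbers) for one OCR line, or None if no name is found."""
    match = ROW_PATTERN.search(line)
    if match is None:
        return None
    # Keep only the tokens that consist entirely of decimal digits.
    numbers = [int(tok) for tok in match.group("rest").split() if tok.isdecimal()]
    return match.group("name"), numbers


def parse_output(text):
    """Parse every line of the OCR output, skipping lines without a name."""
    rows = (parse_row(line) for line in text.splitlines())
    return [row for row in rows if row is not None]


if __name__ == "__main__":
    sample = "| ® AceeZ.Rogue 8 5540 11 2 8 -\nWw karzheka.Rogue a 5151 8 2 7 48"
    for name, numbers in parse_output(sample):
        print(name, numbers)
```

Note that a misread icon can itself come out as a digit (the first 8 in the AceeZ.Rogue row), so this filter alone cannot tell real columns from noise. Cropping the icons away before OCR is the more robust fix.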

You should probably crop the image beforehand to get rid of the icons.
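
The cropping suggestion could be sketched roughly as follows: cut out a vertical strip that excludes the icon columns, then apply the same invert, threshold, and scale steps to that strip only. The helper names (strip_bounds, ocr_strip) and the fractions in the usage note are hypothetical; the real bounds have to be measured on the actual screenshot.

```python
def strip_bounds(width, left, right):
    """Convert fractional bounds (0.0 to 1.0) into pixel columns (x0, x1)."""
    if not 0.0 <= left < right <= 1.0:
        raise ValueError("expected 0 <= left < right <= 1")
    return round(width * left), round(width * right)


def ocr_strip(path, left, right, scale=1.25):
    """Run Tesseract on the vertical strip between two fractions of the width."""
    # Imported here so strip_bounds stays usable without OpenCV or pytesseract.
    import cv2
    import pytesseract

    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(path)

    x0, x1 = strip_bounds(img.shape[1], left, right)
    strip = img[:, x0:x1]  # keep all rows, only the chosen columns

    # Same preprocessing as above: invert, binarize with Otsu, then upscale.
    thresh = cv2.threshold(strip, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    thresh = cv2.resize(thresh, (0, 0), fx=scale, fy=scale)
    return pytesseract.image_to_string(thresh, config='--psm 6')
```

For example, ocr_strip('acerogue.png', 0.10, 0.45) would read only the strip between 10% and 45% of the image width. Those two fractions are placeholders, not measured values.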

Answered By - bfris

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
