Wednesday, June 22, 2022

[FIXED] how can I improve digit OCR accuracy with opencv and pytesseract

Tags: image-processing, ocr, opencv, python-tesseract

Issue

Thanks so much for your time in advance. I tried the following code to grab digits from the attached image, but the results were very poor. I would really appreciate some suggestions on how to preprocess the image so I can get better results. Does the red background in the image make it harder to get good results?

image with digits to OCR

#import needed modules

import cv2
import pytesseract
from PIL import Image
import numpy as np

def thin_font(pic):
    pic = cv2.bitwise_not(pic)
    kernel = np.ones((1,1),np.uint8)
    pic = cv2.erode(pic, kernel, iterations=1)
    pic = cv2.bitwise_not(pic)
    return (pic)

imgFile = "c:/test1.jpg"

img = cv2.imread(imgFile)

#img upscaling----------------------

width = int(img.shape[1]*1.4)
height = int(img.shape[0]*1.4)
dim = (width, height)

resized = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)

thinimg = thin_font(resized)
imggray = cv2.cvtColor(thinimg, cv2.COLOR_BGR2GRAY)

imginv = cv2.bitwise_not(imggray)

thresh, inputimg = cv2.threshold(imginv, 150, 230,cv2.THRESH_BINARY)



#-----------------------------------------------
text = pytesseract.image_to_string(inputimg, config="outputbase digits")

print(text)

Solution

With some trial and error I managed to get decent results, but not perfect...

The main idea is to present PyTesseract with one table cell at a time.

The answer doesn't include automatic table separation using image processing.
The solution assumes that the width and height of the cells are fixed and known in advance (some cropping and padding were needed).
(In case you want to do it automatically, here is a nice code sample).
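
As a rough sketch of the fixed-grid assumption, the crop boxes can be computed from the cell size alone. The helper name `cell_boxes` and the sizes passed to it are illustrative assumptions, not part of the original answer, and the sketch ignores the one-pixel border trimming that the real code performs.

```python
def cell_boxes(n_rows, n_cols, cell_w, cell_h):
    """Yield (row, col, x0, y0, x1, y1) for each cell of a uniform grid.

    Hypothetical helper: assumes every cell has the same size and that the
    grid starts at the top-left corner of the (already cropped) image.
    """
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = col * cell_w
            y0 = row * cell_h
            yield row, col, x0, y0, x0 + cell_w, y0 + cell_h


# Grid and cell sizes are assumed here for illustration only.
boxes = list(cell_boxes(n_rows=8, n_cols=7, cell_w=80, cell_h=19))
```

Each box could then be used to slice the image array, for example `img[y0:y1, x0:x1]`.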

Preprocessing that gave the best OCR results:

Convert the image (or each cell) to grayscale.
Invert polarity - make the text black on white (instead of white on black).
Resize the "cell" by a factor of x2 in each axis.

Tesseract configuration that gave the best results:

text = pytesseract.image_to_string(cell, config="-c tessedit"
                                   "_char_whitelist=' '0123456789-."
                                   " --psm 6")

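To see what the concatenated configuration string actually contains, it can be split with Python's `shlex`, which follows POSIX shell quoting rules. This is only an inspection aid; whether pytesseract tokenizes the string in exactly the same way on every platform and version is an assumption here.

```python
import shlex

# Adjacent string literals are joined by Python into a single string.
config = ("-c tessedit"
          "_char_whitelist=' '0123456789-."
          " --psm 6")

# Split the way a POSIX shell would; the quoted ' ' becomes a literal
# space inside the whitelist value.
args = shlex.split(config)
print(args)
```

The whitelist therefore allows a space, the ten digits, the minus sign and the decimal point. `--psm 6` asks Tesseract to treat the image as a single uniform block of text.
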
Code sample:

import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # May need when using Windows

imgFile = "test1.png"

img = cv2.imread(imgFile)

img = img[0:-10, 1:-4, :]  # Crop the relevant part
img = np.pad(img, ((3, 0), (0, 0), (0, 0)), 'edge')  # Add some padding to the top (making constant cell height).

for row in range(8):
    print()  # New line
    for col in range(7):
        x0 = col*80  # Assume cell width is 80 pixels
        y0 = row*19  # Assume cell height is 19 pixels
        x1 = x0 + 80
        y1 = y0 + 20
        cell = img[y0+1:y1-1, x0+1:x1, :]  # Crop the cell in position [col, row]
        cell = 255 - cv2.cvtColor(cell, cv2.COLOR_BGR2GRAY)  # Convert to grayscale and invert polarity
        cell = cv2.resize(cell, (cell.shape[1]*2, cell.shape[0]*2), interpolation=cv2.INTER_CUBIC)  # Resize up by a factor of x2 in each axis.
        text = pytesseract.image_to_string(cell, config="-c tessedit"
                                                        "_char_whitelist=' '0123456789-."
                                                        " --psm 6")
        print(text.strip().rjust(11), end='', flush=True)  # Strip trailing whitespace (some pytesseract versions return a trailing newline), then print right-aligned without a newline.
        cv2.imshow('cell', cell)  # Show the cell as image
        cv2.waitKey()  # Wait for key pressing

print()  # New line
cv2.destroyAllWindows()

Output:

  -2227    -410.59      11.11  -12673.94    -135.49    -106.01    -349.10
  -2629    -403.90       3.81  -15635.17    -243.68    -115.72     318.26
  -1791     404.17       8.60   -8068.60      44.42     -87.76   -1663.20
  -2920    -674.54       5.74  -11296.37    -146.38    -143.96     486.33
  -3110    -728.97       3.92  -11358.89    -173.37    -150.93     436.33
  -3283    -752.10     -12.20   -9683.32    -158.25    -151.55    -753.67
  -2412     498.37      10.56  -11971.43    -101.01    -119.15    -916.63
  -2583     446.77       7.01  -14523.37    -176.70    -120.52    -277.24

Input image (as reference):

Issues:
There is an issue with the minus sign, when the sign touches the digit.
Example:

(In that case the minus sign is not identified).

Suggested solution:
Check whether the cell's background color is red or green; if it is red, add a minus sign (if one is not already present).
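
A minimal sketch of that idea, assuming the mean color of the original (color, not grayscale) cell is available as a blue, green, red triple. The function names and the simple "red channel dominates" rule are assumptions made for illustration and may need tuning for the actual image.

```python
def is_red_background(mean_bgr):
    """Return True when the red channel dominates the mean cell color.

    mean_bgr is a (blue, green, red) triple, the channel order OpenCV uses.
    """
    blue, green, red = mean_bgr
    return red > green and red > blue


def fix_sign(text, mean_bgr):
    """Prepend '-' to OCR text from a red cell if the sign was missed."""
    text = text.strip()
    if text and is_red_background(mean_bgr) and not text.startswith("-"):
        return "-" + text
    return text
```

For a BGR cell stored as a NumPy array, the mean color could be computed with something like `cell.reshape(-1, 3).mean(axis=0)` before the grayscale conversion. With a reddish mean such as `(30, 40, 200)`, `fix_sign("404.17", ...)` returns `"-404.17"`, while text that already starts with a minus sign is left unchanged.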

Answered By - Rotem

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
