Monday, February 28, 2022

[FIXED] Improve Pytesseract reliability of reading text

Issue

I'm trying to read relatively clear numbers from a screenshot, but I am running into issues getting pytesseract to read the text correctly. I have the following screenshot:

I know the score (2-0) and the clock (1:42) will always be in exactly the same place.

This is the code I currently have for reading the clock time and the orange score:

import cv2
import imutils
import numpy as np
import pytesseract

# BGR bounds for the orange score digits
lower_orange = np.array([0, 90, 200], dtype="uint8")
upper_orange = np.array([70, 160, 255], dtype="uint8")

# `frame` is the 1080p screenshot as a BGR array (e.g. loaded with cv2.imread)

# Isolate the clock and scoreboard locations on a 1080p picture
clock = frame[70:120, 920:1000]
scoreboard = frame[70:150, 800:1120]

# Greyscale
roi_gray = cv2.cvtColor(clock, cv2.COLOR_BGR2GRAY)

config = "-l eng -c tessedit_char_whitelist=0123456789: --oem 1 --psm 8"
clock_text = pytesseract.image_to_string(roi_gray, config=config)
print("time is " + clock_text)

# Find the colors within the specified boundaries and apply the mask
mask_orange = cv2.inRange(scoreboard, lower_orange, upper_orange)

# Find contours in the thresholded image
cnts = cv2.findContours(mask_orange.copy(), cv2.RETR_EXTERNAL,
                        cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)

for c in cnts:
    # Compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)

    # Since the score will be a fixed size of about 25 x 35, we'll set the area at about 300 to be safe
    if w * h > 300:
        # Pad the crop by 5 px, clamping at 0 so the slice start never goes negative
        y0, x0 = max(y - 5, 0), max(x - 5, 0)
        orange_score_img = mask_orange[y0:y + h + 5, x0:x + w + 5]
        orange_score_img = cv2.GaussianBlur(orange_score_img, (5, 5), 0)

        config = "-l eng -c tessedit_char_whitelist=012345 --oem 1 --psm 10"
        orange_score = pytesseract.image_to_string(orange_score_img, config=config)
        print("orange_score is " + orange_score)

Here's the output:

time is 1:42
orange_score is

Here is orange_score_img after I kept only the pixels within my lower and upper orange bounds and applied a Gaussian blur.

Yet at this point, even with pytesseract configured to look for a single character and with a limited whitelist, I still can't get it to read the digit correctly. Is there some additional processing that I'm missing to help pytesseract read this number as 2?

Solution

As per @fmw42's suggestion, I tried playing with some morphology changes. Thickening the numbers seemed to do the trick!

kernel = np.ones((5, 5), np.uint8)
orange_score_img = cv2.dilate(orange_score_img, kernel, iterations=1)
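To see why a 5x5 kernel of ones thickens the digits, here is a NumPy-only sketch of binary dilation with a square kernel. The helper name dilate_square and the tiny test image are made up for illustration. It is not how OpenCV implements cv2.dilate, but for an odd kernel size it should give the same kind of result: each output pixel becomes the maximum of its neighbourhood, so every white stroke grows by two pixels in each direction.

```python
import numpy as np

def dilate_square(mask, ksize=5):
    """Binary dilation with a ksize x ksize kernel of ones (illustrative sketch)."""
    r = ksize // 2
    # Zero-pad so the window can slide over the border pixels.
    padded = np.pad(mask, r, mode="constant", constant_values=0)
    out = np.zeros_like(mask)
    h, w = mask.shape
    # Each output pixel is the maximum over its ksize x ksize neighbourhood.
    for dy in range(ksize):
        for dx in range(ksize):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

# A one-pixel-wide vertical stroke on a black background (hypothetical input).
stroke = np.zeros((9, 9), dtype=np.uint8)
stroke[2:7, 4] = 255

thick = dilate_square(stroke, 5)
print("white pixels before:", int((stroke > 0).sum()))
print("white pixels after: ", int((thick > 0).sum()))
```

In real code, cv2.dilate as used above is the practical choice; the loop here only makes the "take the neighbourhood maximum" idea visible.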

EDIT: The REAL answer, I realized, is that pytesseract does MUCH better with black text on a white background than with white text on a black background! It read the score perfectly once I inverted the colors:

orange_score_img = cv2.bitwise_not(orange_score_img)
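As a rough sketch of how the inversion could be wrapped up before calling pytesseract, the helper below flips a white-on-black mask to black-on-white and adds a white margin. The name prepare_for_ocr, the border parameter, and the 10-pixel default are assumptions for illustration, not part of the original answer. The margin is an extra step that sometimes helps with tightly cropped glyphs. For uint8 images, 255 - mask gives the same result as cv2.bitwise_not(mask).

```python
import numpy as np

def prepare_for_ocr(mask, border=10):
    """Turn a white-on-black uint8 mask into black-on-white with a white margin."""
    # For uint8 images this matches cv2.bitwise_not(mask).
    inverted = 255 - mask
    # Surround the glyph with white so it does not touch the image edge.
    return np.pad(inverted, border, mode="constant", constant_values=255)

# Hypothetical stand-in for orange_score_img: a white bar on a black background.
mask = np.zeros((35, 25), dtype=np.uint8)
mask[5:30, 10:15] = 255

ready = prepare_for_ocr(mask)
print(ready.shape, ready.dtype)
```

The returned array could then be passed to pytesseract.image_to_string with the same config string as in the question.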

I hope this helps people when they first start out using pytesseract! Trying to tune the image to fit all my cases was incredibly frustrating, and knowing that black text on white works much better would have saved me hours...

Answered By - JonathanW

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
