Issue
I am trying to apply OCR using OpenCV and Python-tesseract to convert the following image to text: Original image.
But tesseract has not managed to read the image correctly yet. It reads: uleswylly Bie7 Srp a7 instead.
I have taken the following steps to pre-process the image before I feed it to tesseract (the complete chain is sketched after the steps):
- First I upscale the image:
# Image scaling
def set_image_dpi(img):
    # Get current dimensions of the image
    height, width = img.shape[:2]
    # Define scale factor
    scale_factor = 6
    # Calculate new dimensions
    new_height = int(height * scale_factor)
    new_width = int(width * scale_factor)
    # Resize image
    return cv2.resize(img, (new_width, new_height))
Image result: result1.png
- Normalize the image:
# Normalization
norm_img = np.zeros((img.shape[0], img.shape[1]))
img = cv2.normalize(img, norm_img, 0, 255, cv2.NORM_MINMAX)
Image result: result2.png
- Then I remove some noise:
# Remove noise
def remove_noise(img):
    return cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 15)
Image result: result3.png
- Get the grayscale image:
# Get grayscale
def get_grayscale(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Image result: result4.png
- Apply thresholding:
# Thresholding
def thresholding(img):
    return cv2.threshold(img, 150, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
Image result: result5.png
- Invert the image color:
# Invert the image
def invert(img):
    return cv2.bitwise_not(img)
Image result: result6.png
- Finally I pass the image to pytesseract:
# Pass preprocessed image to pytesseract
text = pytesseract.image_to_string(img)
print("Text found: " + text)
pytesseract output: "uleswylly Bie7 Srp a7"
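For reference, the full chain of the steps above looks roughly like this (a minimal sketch; 'original.png' is a placeholder filename, and the helpers are applied in the order listed):
# Full pipeline, chaining the helper functions above in the order described
img = cv2.imread('original.png')  # placeholder filename for the original image
img = set_image_dpi(img)          # 1. upscale
norm_img = np.zeros((img.shape[0], img.shape[1]))
img = cv2.normalize(img, norm_img, 0, 255, cv2.NORM_MINMAX)  # 2. normalize
img = remove_noise(img)           # 3. denoise (expects a color image)
img = get_grayscale(img)          # 4. convert to grayscale
img = thresholding(img)           # 5. threshold (Otsu)
img = invert(img)                 # 6. invert
text = pytesseract.image_to_string(img)  # 7. OCR
print("Text found: " + text)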
How can I improve my pre-processing so that pytesseract can actually read the image? Any help would be greatly appreciated!
Thanks in advance,
Steenert
Solution
The problem is a bit challenging to solve without overfitting the solution to this specific image...
Let's assume that the text is bright, colorless and surrounded by colored pixels. We may also assume that the background is relatively homogeneous.
We may start with result3.png and use the following stages:
- Add padding with the color of the top left pixel.
  The padding is used as preparation for floodFill (required because some colored pixels touch the image margins).
- Fill the background with a light blue color.
  Note that the selected color is a bit of an overfitting, because its saturation level needs to be close to the level of the red pixels.
- Convert from BGR to HSV color space, and extract the saturation channel.
- Apply thresholding (use cv2.THRESH_OTSU for automatic thresholding).
- Apply pytesseract.image_to_string to the thresholded image.
Code sample:
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' # May be required when using Windows
img = cv2.imread('result3.png') # Read result3.png
# Add padding with the color of the top left pixel
pad_color = img[0, 0, :]
padded_img = np.full((img.shape[0]+10, img.shape[1]+10, 3), pad_color, np.uint8)
padded_img[5:-5, 5:-5, :] = img
cv2.floodFill(padded_img, None, (0, 0), (255, 100, 100), loDiff=(10, 10, 10), upDiff=(10, 10, 10)) # Fill the background with blue color.
cv2.imwrite('result7.png', padded_img)
# Convert from BGR to HSV color space, and extract the saturation channel.
hsv = cv2.cvtColor(padded_img, cv2.COLOR_BGR2HSV)
s = hsv[:, :, 1]
cv2.imwrite('result8.png', s)
# Apply thresholding (use `cv2.THRESH_OTSU` for automatic thresholding)
thresh = cv2.threshold(s, 0, 255, cv2.THRESH_OTSU)[1]
cv2.imwrite('result9.png', thresh)
# Pass preprocessed image to PyTesseract (--psm 6 assumes a single uniform block of text)
text = pytesseract.image_to_string(thresh, config="--psm 6")
print("Text found: " + text)
Output:
Text found: Jules -Lv: 175 -P.17
result7.png (after floodFill):
result8.png (after extracting the saturation channel):
result9.png (after thresholding):
Answered By - Rotem