Issue
I'm fairly new on OpenCv and tesseract. I'm recently building a project on using computer vision to detect door labels. Hopefully it would be beneficial for visually impaired group.
The idea of the program is to preprocess the input image by converting it into binary color, then use canny edge to detect the outlines of door label, then dilate the canny edge result. After these, feed image to tesseract while trying to show the text detected with boxes.
Expected results are green rectangles on text. While printing out the text itself.
The issue is the missing rectangles and failure in text detection.
I have tried going through these:
- Recognize Text in images using Canny Edge detection in Opencv
- OpenCv pytesseract for OCR
- Image preprocessing with OpenCV before doing character recognition (tesseract)
The questions and solutions are either too simple or not as relevant. Some are not in python as well.
Attached below is my attempt on the code:
import pytesseract as pytess
import cv2 as cv
import numpy as np
from PIL import Image
from pytesseract import Output
img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg", 0)
# edges store the canny version of img
edges = cv.Canny(img, 100, 200)
# ker as in kernel
# (5, 5) is the matrix while uint8 is datatype
ker = np.ones((3, 3), np.uint8)
# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)
# setup pytesseract parameters
configs = r'--oem 3 --psm 6'
# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')
print(result.keys())
boxes = len(result['text'])
# make a new copy of edges
new_item = dil.copy()
for sequence_number in range(boxes):
if int(result['conf'][sequence_number]) > 30: # removed constraints
(x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
result['width'][sequence_number], result['height'][sequence_number])
new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)
# detect sentence with tesseract
# pending as rectangle not achieved
cv.imshow("original", img)
cv.imshow("canny", edges)
cv.imshow("dilation", dil)
cv.imshow("capturedText", new_item)
#ignore below this line, it is only for testing
#testobj = Image.fromarray(dil)
#testtext = pytess.image_to_string(testobj, lang='eng')
#print(testtext)
cv.waitKey(0)
cv.destroyAllWindows()
The testing part of the code return results as shown below:
a)
Meets
Which, obviously does not satisfy the objective.
EDIT
After posting the question, I realized I may have done it wrong in the beginning. I should attempt to use OpencV to detect the contour of the door label and isolate the part containing text before sending whatever is in the rectangle for OCR recognition.
EDIT2
Now that I identify the issue thanks to our stackoverflow members, now I'm attempting to add on image rectification/image wrapping technique to retrieve a straight front view to get a better accuracy for the system. Update soon.
EDIT3
After certain bug fixing, reducing the constraint while allowing the function to draw on the original image, I have achieved the results below. Attached the updated code as well.
import cv2 as cv
import numpy as np
import pytesseract as pytess
from pytesseract import Output
# input of img source
img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg")
# necessary image color conversion
img2 = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# edges store the canny version of img
edges = cv.Canny(img2, 100, 200)
# ker as in kernel
# (5, 5) is the matrix while uint8 is datatype
ker = np.ones((3, 3), np.uint8)
# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)
# setup pytesseract parameters
configs = r'--oem 3 --psm 6'
# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')
# number of boxes that encapsulate the boxes
boxes = len(result['text'])
# make a new copy of edges
new_item = dil.copy()
for sequence_number in range(boxes):
if int(result['conf'][sequence_number]) > 0: #removed constraints
(x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
result['width'][sequence_number], result['height'][sequence_number])
# draw rectangle boxes on the original img
cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)
# Crop the image
crp = new_item[y:y + h, x:x + w]
# OCR
txt = pytess.image_to_string(crp, config=configs)
# returns recognised text
print(txt)
cv.imshow("capturedText", crp)
cv.waitKey(0)
# cv.imshow("original", img)
# cv.imshow("canny", edges)
# cv.imshow("dilation", dil)
cv.imshow("results", img)
cv.waitKey(0)
cv.destroyAllWindows()
Solution
You have found all the detected text in the image:
for sequence_number in range(boxes):
if int(result['conf'][sequence_number]) > 30:
(x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
result['width'][sequence_number], result['height'][sequence_number])
new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)
But you also say the current confidence should be more than 70%.
- If we remove the constraint
- If we OCR each new item
Result will be:
Now if you read:
txt = pytesseract.image_to_string(new_item, config="--psm 6")
print(txt)
OCR will be:
Meeting Room §
The output of the current pytesseract version 0.3.7
Code:
# Load the libraries
import cv2
import pytesseract
# Load the image
img = cv2.imread("fsUSw.png")
# Convert it to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# OCR detection
d = pytesseract.image_to_data(gry, config="--psm 6", output_type=pytesseract.Output.DICT)
# Get ROI part from the detection
n_boxes = len(d['level'])
# For each detected part
for i in range(1, 2):
# Get the localized region
(x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
# Draw rectangle to the detected region
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)
# Crop the image
crp = gry[y:y + h, x:x + w]
# OCR
txt = pytesseract.image_to_string(crp, config="--psm 6")
print(txt)
# Display the cropped image
cv2.imshow("crp", crp)
cv2.waitKey(0)
# Display
cv2.imshow("img", img)
cv2.waitKey(0)
Answered By - Ahx
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.