Issue
I need to recognize handwritten letters and their coordinates, as in this image.
I tried to do this with pytesseract, but it can only recognize printed text and works incorrectly on my images. I have no time to write my own neural network and want to use a ready-made solution like pytesseract. I know it should be able to do this, but the following code works incorrectly.
import cv2
import pytesseract
import imutils
# Load and resize the input image
image = cv2.imread('test/task.jpg')
image = imutils.resize(image, width=700)

# Binarize with Otsu thresholding, then smooth
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
thresh = cv2.GaussianBlur(thresh, (3, 3), 0)

# OCR the preprocessed image (not the original color image)
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imwrite('images/thresh.png', thresh)
cv2.waitKey()
This code returns the wrong answer:
ti | ee
ares” * ae
de le lc ld
What am I doing wrong?
P.S. I converted my image using an adaptive threshold and it now looks like this, but the code still works incorrectly (now I just call the image_to_string() method on the well-converted image):
import cv2
import pytesseract

# Point pytesseract at the local Tesseract installation
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'

# OCR the already-preprocessed image
image = cv2.imread('output.png')
data = pytesseract.image_to_string(image, lang='eng', config='--psm 6')
print(data)
a Oe '
Pee ee
eee ee ee ee
re
eB
STI AT TTT
“Shen if
ae 6
jal ne
yo l
a) Ne es oe
Seaneaeer =
ee es ee
a en ee
ee rt
Solution
I have a suggestion for making the image clear by removing the background.
You can use inRange thresholding.
To use inRange thresholding, first convert the image to the HSV color-space, then set the lower and upper boundaries of the inRange method. The boundary values can be set manually. The result of the inRange method is a mask of the image, which you can use to remove the background. For example:
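As a minimal illustration of what inRange computes, here is a NumPy sketch of the same per-pixel test (not OpenCV's implementation): a pixel passes only if every channel lies within the lower and upper bounds. The three HSV pixel values below are hypothetical.

```python
import numpy as np

def in_range(hsv, lower, upper):
    """Per-pixel mask: 255 where every channel lies within [lower, upper]."""
    inside = np.logical_and(hsv >= lower, hsv <= upper).all(axis=-1)
    return (inside * 255).astype(np.uint8)

# Three hypothetical HSV pixels: dark, mid-bright, bright
hsv = np.array([[[0, 0, 10], [20, 50, 100], [0, 0, 255]]], dtype=np.uint8)

# Same boundaries as the answer's code: only dark pixels (V <= 80) survive
msk = in_range(hsv, np.array([0, 0, 0]), np.array([179, 255, 80]))
print(msk)  # [[255   0   0]]
```

With the boundaries above, only the dark handwriting strokes are kept, so the noisy background drops out of the mask.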
Afterwards, you can use the Tesseract page segmentation modes (psm). Each psm value gives a different output. For example, psm 6 gives this result:
B
JN
A 3 C
If that is not the desired output, you can try other improvement methods: other image-processing steps, or other approaches such as the EAST text detector.
If you still have trouble, you can localize the detected text and observe why it is misinterpreted. For example:
As we can see with psm mode 6, B and C are misinterpreted. Maybe psm 7 will interpret them correctly; you need to experiment with other values. If you don't want to, you can use a deep-learning method like the EAST text detector instead.
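A small sketch for sweeping several psm values on the same thresholded image; the commented-out call assumes the `thr` image produced by the code below and a working Tesseract install.

```python
def psm_configs(psms):
    """Build one Tesseract config string per page segmentation mode."""
    return [f"--psm {p}" for p in psms]

# 6 = uniform block of text, 7 = single text line, 10 = single character
cfgs = psm_configs((6, 7, 10))
print(cfgs)  # ['--psm 6', '--psm 7', '--psm 10']

# With pytesseract and Tesseract installed, each mode can then be
# tried on the thresholded image to compare outputs, e.g.:
#   for cfg in cfgs:
#       print(cfg, pytesseract.image_to_string(thr, config=cfg))
```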
Code:
import cv2
from numpy import array
import pytesseract
from pytesseract import Output

# Load the image
img = cv2.imread("f4hjh.jpg")

# Convert to the HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Get a binary mask of the dark (pen) pixels
msk = cv2.inRange(hsv, array([0, 0, 0]), array([179, 255, 80]))

# Dilate the mask to thicken the strokes, then invert to black-on-white
krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dlt = cv2.dilate(msk, krn, iterations=1)
thr = 255 - cv2.bitwise_and(dlt, msk)

# OCR
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
For detecting and localizing the text in the image:
# OCR with localization data
d = pytesseract.image_to_data(thr, config="--psm 6", output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # Draw a rectangle around the detected region
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)
    # Crop the region
    crp = thr[y:y + h, x:x + w]
    # OCR the cropped region
    txt = pytesseract.image_to_string(crp, config="--psm 6")
    print(txt)
    # Display the cropped image
    cv2.imshow("crp", crp)
    cv2.waitKey(0)
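Since the question asks for each letter together with its coordinates, the `image_to_data` dictionary can be reduced to a list of (text, x, y, w, h) tuples. A sketch of that reduction, run here on a hypothetical dict shaped like pytesseract's `Output.DICT` result:

```python
def letters_with_coords(d):
    """Pair each non-empty detection with its bounding box."""
    out = []
    for i in range(len(d['text'])):
        txt = d['text'][i].strip()
        if txt:  # skip the empty/whitespace entries pytesseract emits
            out.append((txt, d['left'][i], d['top'][i],
                        d['width'][i], d['height'][i]))
    return out

# Hypothetical dict mimicking pytesseract.image_to_data(..., output_type=Output.DICT)
d = {'text': ['', 'B', 'A'], 'left': [0, 12, 40], 'top': [0, 8, 60],
     'width': [100, 20, 18], 'height': [50, 25, 24]}
print(letters_with_coords(d))  # [('B', 12, 8, 20, 25), ('A', 40, 60, 18, 24)]
```

With the real OCR output, `letters_with_coords(d)` gives you the recognized letters and their pixel coordinates in one pass.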
Answered By - Ahx