Issue
I am trying to extract some really obvious text from an image that's contained in a wider box:
However, Tesseract fails to extract the text from it. If I remove the box from the image, it works just fine:
Note that when I change the font to something more common (e.g. Arial), it works fine for both images. However, I need to make it work with the current font (Impact).
Any help on how to get that to work would be hugely appreciated!
Below is my current code:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = cv2.imread('without_box.png') #https://i.stack.imgur.com/vrJvd.png
img_text = pytesseract.image_to_string(img)
print('without_box : ', img_text) #returns "without_box : TEXT"
img = cv2.imread('with_box.png') #https://i.stack.imgur.com/xNEdR.png
img_text = pytesseract.image_to_string(img)
print('with_box : ', img_text) #returns "with_box : "
Solution
For the presented kind of images [1], you could automatically crop the white part that holds the text, and run pytesseract on the cropped region only:
import cv2
import pytesseract
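def crop_and_detect(image):
    # Binarize so that only the bright (white) text area is non-zero
    thr = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY)[1]
    # Bounding rectangle of all non-zero pixels, i.e. the white part
    x, y, w, h = cv2.boundingRect(thr)
    # Run OCR on the cropped white region only
    return pytesseract.image_to_string(image[y:y+h, x:x+w])

for img_file in ['vrJvd.png', 'xNEdR.png']:
    img = cv2.imread(img_file, cv2.IMREAD_GRAYSCALE)
    print(img_file, crop_and_detect(img).replace('\f', ''))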
# vrJvd.png TEXT
#
# xNEdR.png TEXT
#
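To see what the cropping step does without needing Tesseract or the original images, here is a minimal NumPy-only sketch of the same idea. It is not the code from the answer: the helper name white_bounding_box, the synthetic canvas, and the choice to return all zeros for an image with no bright pixels are assumptions made for illustration. It mirrors the threshold of 128 used above and computes the smallest rectangle containing every pixel brighter than that.

```python
import numpy as np


def white_bounding_box(gray, thresh=128):
    """Return (x, y, w, h) of the smallest box holding all pixels > thresh.

    Hypothetical helper for illustration; it assumes a 2D uint8 image
    in which the text area is bright and the surrounding box is dark.
    """
    mask = gray > thresh
    rows = np.flatnonzero(mask.any(axis=1))  # row indices with bright pixels
    cols = np.flatnonzero(mask.any(axis=0))  # column indices with bright pixels
    if rows.size == 0:
        # No bright pixels at all (assumption of this sketch: report zeros)
        return 0, 0, 0, 0
    x, y = int(cols[0]), int(rows[0])
    w = int(cols[-1]) - x + 1
    h = int(rows[-1]) - y + 1
    return x, y, w, h


# Synthetic stand-in for the boxed image: a dark canvas with a white
# panel, and a dark patch inside the panel playing the role of the text.
canvas = np.zeros((100, 300), dtype=np.uint8)
canvas[20:80, 50:200] = 255
canvas[40:60, 100:150] = 0

x, y, w, h = white_bounding_box(canvas)
cropped = canvas[y:y + h, x:x + w]
print((x, y, w, h), cropped.shape)
```

The dark patch inside the panel does not shrink the rectangle, because the box is determined by the outermost bright pixels only. That is why the crop keeps the whole white area, text included, while discarding the dark surroundings.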
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.2
OpenCV: 4.5.2
pytesseract: 5.0.0-alpha.20201127
----------------------------------------
[1] If you have an image processing related question, provide a representative set of possible input images. Otherwise, you might get a proper solution for the one or two input images you provided, but while testing that solution on your actual data set, you find out "it doesn't work", and possibly post (a lot of) follow-up questions, which could've been prevented in the first place.
Answered By - HansHirse