Issue
I have very simple Python code:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Tesseract-OCR\\tesseract.exe'
img = cv2.imread('1.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
hImg, wImg, _ = img.shape
# detecting words
boxes = pytesseract.image_to_data(img)
for x, b in enumerate(boxes.splitlines()):
    if x != 0:
        b = b.split()
        if len(b) == 12:
            x, y, w, h = int(b[6]), int(b[7]), int(b[8]), int(b[9])
            cv2.rectangle(img, (x, y), (w + x, h + y), (0, 0, 255), 3)
cv2.imshow('result', img)
cv2.waitKey(0)
But the result was surprising: it detected only 4 words. What could be the reason?
Solution
You'll get better OCR results if you improve the quality of the image you give Tesseract.
Tesseract 3.05 (and older) handles inverted images (light text on a dark background) without problems, but with Tesseract 4.x you should use dark text on a light background.
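If an input does have light text on a dark background, one way to follow this advice is to invert the grayscale image before OCR. The sketch below assumes a simple heuristic that is not part of Tesseract: a mean intensity below a hypothetical `threshold` of 127 is taken to mean the background is dark. For uint8 images, `255 - gray` gives the same result as OpenCV's `cv2.bitwise_not(gray)`.

```python
import numpy as np


def ensure_dark_text_on_light(gray, threshold=127):
    """Invert a uint8 grayscale image if it looks like light text on a dark background.

    Heuristic (an assumption, not Tesseract behavior): a mean intensity below
    `threshold` suggests the background is dark, so the image is inverted.
    """
    if gray.mean() < threshold:
        return 255 - gray  # stays uint8; equivalent to cv2.bitwise_not(gray)
    return gray
```

Images that already have a light background are returned unchanged, so the function can be applied unconditionally before `image_to_data`.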
Convert the original image from BGR to HLS (use the BGR image returned by cv2.imread, not the RGB copy made in your code) so that the background colors can later be removed from the numbers in the top half of the image. Then create a "blue" mask with cv2.inRange and replace anything that's not "blue" with white.
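import cv2
import numpy as np
img = cv2.imread('1.png')  # keep it BGR, no BGR->RGB conversion here
hls = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)
# Define lower and upper (H, L, S) limits for the number colors.
blue_lo = np.array([114, 70, 70])
blue_hi = np.array([154, 225, 225])
# Mask image to only select "blue"
mask = cv2.inRange(hls, blue_lo, blue_hi)
# Copy the original image and paint every non-"blue" pixel white
img1 = img.copy()
img1[mask == 0] = (255, 255, 255)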
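cv2.inRange sets a mask pixel to 255 when every channel lies within [lower, upper] (both bounds inclusive) and to 0 otherwise. To make that logic explicit, here is a NumPy sketch of the same masking plus the "paint everything else white" step. `in_range` and `whiten_outside` are hypothetical helper names for illustration; in real code, cv2.inRange is the call to use.

```python
import numpy as np


def in_range(img, lo, hi):
    """NumPy equivalent of cv2.inRange for a 3-channel image:
    255 where every channel is within [lo, hi] (inclusive), 0 elsewhere."""
    inside = np.all((img >= lo) & (img <= hi), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)


def whiten_outside(bgr, hls, lo, hi):
    """Return a copy of `bgr` where every pixel whose HLS value falls
    outside [lo, hi] is painted white; the input array is not modified."""
    mask = in_range(hls, np.asarray(lo), np.asarray(hi))
    out = bgr.copy()
    out[mask == 0] = (255, 255, 255)
    return out
```

Because the mask is computed on the HLS image but applied to the BGR copy, the surviving "blue" pixels keep their original colors.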
Help pytesseract by converting the image to black and white
Binarization converts the image to black and white. Tesseract does this internally (using the Otsu algorithm), but the result can be suboptimal, particularly if the page background is unevenly dark.
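# img1 is still a BGR image (non-"blue" pixels painted white), so convert BGR -> gray
gray = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
# With THRESH_OTSU the threshold is computed automatically; the 0 passed here is ignored
_, img1 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imshow('img_to_binary', img1)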
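To show what Otsu's method actually computes, here is a from-scratch NumPy sketch for illustration only; it is not OpenCV's or Tesseract's code, and tie-breaking or edge cases may differ. It tries every threshold t and keeps the one that maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2, where w0 and w1 are the pixel counts at or below and above t, and mu0 and mu1 are their mean intensities.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the threshold t maximizing between-class variance for a uint8 image.

    Class 0 holds pixels <= t and class 1 holds pixels > t, matching the
    THRESH_BINARY convention (output is 255 where src > t).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    total = hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = hist[: t + 1].sum()  # pixel count in class 0
        w1 = total - w0           # pixel count in class 1
        if w0 == 0 or w1 == 0:
            continue              # one class is empty: not a real split
        mu0 = (levels[: t + 1] * hist[: t + 1]).sum() / w0
        mu1 = (levels[t + 1:] * hist[t + 1:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize_otsu(gray):
    """Threshold with Otsu's t: 255 where gray > t, else 0."""
    t = otsu_threshold(gray)
    return t, np.where(gray > t, 255, 0).astype(np.uint8)
```

In practice the cv2.threshold call above does this for you; the sketch only shows why a single global threshold can struggle when the background brightness varies across the page.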
Call image_to_data on the previously created img1 and continue applying your existing code:
...
hImg,wImg,_ = img.shape
#detecting words
boxes = pytesseract.image_to_data(img1)
for x,b in enumerate(boxes.splitlines()):
...
...
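The loop above splits each row of the image_to_data output on whitespace and relies on `len(b) == 12` to keep only rows that carry text. As a hedged alternative, the same tab-separated string can be parsed by its header column names with Python's csv module, which also exposes Tesseract's `conf` column so low-confidence words can be inspected or filtered. `parse_words` and `min_conf` are hypothetical names, and the test data is made up rather than real Tesseract output. (pytesseract can also return a dict directly via `output_type=pytesseract.Output.DICT`.)

```python
import csv
import io


def parse_words(tsv_text, min_conf=0.0):
    """Return (text, left, top, width, height, conf) for each word row
    in the TSV string produced by pytesseract.image_to_data."""
    # QUOTE_NONE: recognized text may contain quote characters
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # page/block/paragraph/line rows have no text
        conf = float(row["conf"])  # int in older versions, float in newer ones
        if conf < min_conf:
            continue
        words.append((text, int(row["left"]), int(row["top"]),
                      int(row["width"]), int(row["height"]), conf))
    return words
```

For example, `parse_words(pytesseract.image_to_data(img1), min_conf=60)` (60 is an arbitrary cutoff) yields boxes you can draw on the original `img`, since img1 has the same dimensions.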
Answered By - n1colas.m