Issue
I installed pytesseract via pip, and its results are terrible.
From what I found while searching, I think I need to give it more data, but I can't find where to put the tessdata (traineddata) files, since on a Mac there is no directory like ProgramFile\Tesseract-OCR.
There is no problem with the images' resolution, font, or size. For this image the result is 'ecient Sh Abu'.
Because large and clear test images work fine, I think the problem is a lack of data. But any other solution is welcome, as long as it can read the text with Python.
Please help me.
Solution
"I installed pytesseract via pip and its result is terrible."
Sometimes you need to preprocess the input image to get accurate results.
"Because large and clear test images work fine, I think it is a problem about lack of data. But any other possible solution is welcomed as long as it can read text with Python."
You could say a lack of data is the problem. I think you'll find morphological transformations useful.
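For instance, if we apply a morphological opening (cv2.MORPH_OPEN in the code below; for dark text on a light background this has the effect of closing small bright gaps in the strokes), the result will be: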
The image looks similar to the originally posted image. However, there are slight changes in the output image (e.g., the word "Grammar" looks slightly different from the original).
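To make the effect of these operations concrete, here is a minimal pure-Python sketch of grayscale erosion, dilation, opening, and closing with a 3x3 window. It is only an illustration, not how OpenCV implements them: the helper names (`_filter3x3`, `erode`, `dilate`, `opening`, `closing`) and the tiny test image are made up for this example, and clipping the window at the image border is an assumption about border handling.

```python
def _filter3x3(img, pick):
    """Apply pick (min or max) over each pixel's 3x3 neighbourhood.

    The window is clipped at the image borders (assumed border handling).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def erode(img):
    # Minimum filter: dark regions grow, bright regions shrink
    return _filter3x3(img, min)


def dilate(img):
    # Maximum filter: bright regions grow, dark regions shrink
    return _filter3x3(img, max)


def opening(img):
    # Erosion followed by dilation: removes small bright details
    return dilate(erode(img))


def closing(img):
    # Dilation followed by erosion: removes small dark details
    return erode(dilate(img))


# Hypothetical 5x5 patch of a dark stroke (0 = ink) with a one-pixel bright hole (255)
stroke = [[0] * 5 for _ in range(5)]
stroke[2][2] = 255

filled = opening(stroke)    # the bright hole is removed, the stroke becomes solid
unchanged = closing(stroke) # closing leaves this particular patch as it was
```

On dark-on-light text, opening therefore tends to fill small bright gaps inside characters, which is one reason the OCR input can look almost identical to the eye and still read differently.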
Now if we read the output image:
English
Grammar Practice
ter K-SAT (1-10)
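Code:
import cv2
from pytesseract import image_to_string

# Load the image and convert it to grayscale
img = cv2.imread("6Celp.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Morphological opening; a None kernel falls back to OpenCV's default 3x3 rectangle
opn = cv2.morphologyEx(gry, cv2.MORPH_OPEN, None)

# Run OCR and print only the lines longer than 3 characters
txt = image_to_string(opn)
for line in txt.split("\n"):
    line = line.strip()
    if len(line) > 3:
        print(line)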
Answered By - Ahx
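On the tessdata part of the question: pytesseract is only a wrapper around the Tesseract engine, which is installed separately from the pip package, and the traineddata files live in that engine's tessdata directory. Running `tesseract --list-langs` in a terminal should show which languages it finds. If you keep traineddata files somewhere else, one option is to pass Tesseract's `--tessdata-dir` option through the `config` argument. The sketch below is not part of the answer above; the helper names `tessdata_config` and `read_text` and the example paths are hypothetical.

```python
def tessdata_config(tessdata_dir, psm=None):
    """Build a config string that points Tesseract at a tessdata directory.

    tessdata_dir is whatever folder holds your *.traineddata files (hypothetical here).
    psm optionally sets Tesseract's page segmentation mode.
    """
    config = f'--tessdata-dir "{tessdata_dir}"'
    if psm is not None:
        config += f" --psm {psm}"
    return config


def read_text(image_path, tessdata_dir, lang="eng"):
    """Run OCR on an image file using traineddata from tessdata_dir."""
    # Imported here so the config helper can be used without pytesseract installed
    import pytesseract

    return pytesseract.image_to_string(
        image_path, lang=lang, config=tessdata_config(tessdata_dir)
    )


# Example config string for a hypothetical directory
example = tessdata_config("/path/to/tessdata", psm=6)
```

A call such as `read_text("6Celp.jpg", "/path/to/tessdata")` would then use the traineddata files in that folder; the path is a placeholder you would replace with your own.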