Issue
I'm trying to read the number in this image using pytesseract, but when I do it prints out IL:
import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
text = pytesseract.image_to_string(Image.open("Number.jpg"))
print(text)
I've also tried converting the image to black and white, but this hasn't worked either. What am I doing wrong?
Solution
pytesseract works best and gives the most accurate output with black text on a white background, so preprocessing is the main step in getting accurate results. In your case, a simple inverse binary threshold is more than enough to get the correct output, because your image contains no noise at all. Adaptive thresholding should be used only when the lighting is uneven.
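>>> import cv2
>>> import pytesseract
>>> image = cv2.imread("14.jpg", 0)  # 0 = load as grayscale
>>> # Otsu picks the threshold; BINARY_INV flips the result to black text on white
>>> thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
>>> data = pytesseract.image_to_string(thresh, config='--psm 6 digits')
>>> data
'14'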
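To make it clearer what the inverse binary threshold with Otsu is doing, here is a rough pure-Python sketch of the idea on a tiny made-up grayscale grid. The function names (otsu_threshold, binarize_inv) and the sample pixel values are my own illustration, not part of OpenCV or pytesseract, and this is not how OpenCV implements it internally. It only shows the principle: pick the cut-off that best separates dark pixels from light ones, then flip the result so light strokes on a dark background become black on white.

```python
def otsu_threshold(pixels):
    """Return the cut-off t (0-255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no dark pixels yet, nothing to compare
        if w1 == 0:
            break     # every pixel is in the dark class, no split left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize_inv(image, t):
    """Inverse binary threshold: pixels > t become 0, all others become 255."""
    return [[0 if p > t else 255 for p in row] for row in image]


# Hypothetical 3x4 image: light strokes (~230) on a dark background (~20)
image = [
    [20, 22, 230, 21],
    [19, 228, 231, 20],
    [21, 20, 229, 22],
]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
binary = binarize_inv(image, t)
for row in binary:
    print(row)
```

After this step the light strokes are 0 (black) and the dark background is 255 (white), which is the kind of input Tesseract handles best.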
I don't think the Tesseract version is causing the problem. For reference, these are the versions I used:
Tesseract version: v5.0.0-alpha.20200223
pytesseract version: 0.3.4
Answered By - Tarun Chakitha