Issue
I am trying to use Tesseract OCR to print text from an image, but I am getting the error shown below. I installed Tesseract OCR from https://github.com/UB-Mannheim/tesseract/wiki and installed pytesseract from the Anaconda prompt using pip install pytesseract, but it's not working. Please help if anyone has faced a similar issue.
Collecting pytesseract
Downloading https://files.pythonhosted.org/packages/13/56/befaafbabb36c03e4fdbb3fea854e0aea294039308a93daf6876bf7a8d6b/pytesseract-0.2.4.tar.gz (169kB)
100% |████████████████████████████████| 174kB 288kB/s
Requirement already satisfied: Pillow in c:\users\500066016\appdata\local\continuum\anaconda3\lib\site-packages (from pytesseract) (5.1.0)
Building wheels for collected packages: pytesseract
Running setup.py bdist_wheel for pytesseract ... done
Stored in directory: C:\Users\500066016\AppData\Local\pip\Cache\wheels\a8\0c\00\32e4957a46128bea34fda60b8b01a8755986415cbab3ed8e38
Successfully built pytesseract
Below is the code:
import pytesseract
import cv2
import numpy as np
def get_string(img_path):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((1,1), np.uint8)
    dilate = cv2.dilate(img, kernel, iterations=1)
    erosion = cv2.erode(img, kernel, iterations=1)
    cv2.imwrite('removed_noise.jpg', img)
    img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    cv2.imwrite('thresh.jpg', img)
    res = pytesseract.image_to_string('thesh.jpg')
    return res
print('Getting string from the image')
print(get_string('quotes.jpg'))
Below is the error:
Traceback (most recent call last):
File "<ipython-input-2-cf6e0fca14b4>", line 1, in <module>
runfile('C:/Users/500066016/.spyder-py3/project1.py', wdir='C:/Users/500066016/.spyder-py3')
File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/500066016/.spyder-py3/project1.py", line 23, in <module>
print(get_string('quotes.jpg'))
File "C:/Users/500066016/.spyder-py3/project1.py", line 20, in get_string
res = pytesseract.image_to_string('thesh.jpg')
File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 294, in image_to_string
return run_and_get_output(*args)
File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 202, in run_and_get_output
run_tesseract(**kwargs)
File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 172, in run_tesseract
raise TesseractNotFoundError()
TesseractNotFoundError: tesseract is not installed or it's not in your path
Solution
Step 1: Download and install Tesseract OCR (for example, the Windows installer from https://github.com/UB-Mannheim/tesseract/wiki mentioned in the question).
Step 2: After installing, find the "Tesseract-OCR" folder, open it, and locate tesseract.exe.
Step 3: Copy the full path of tesseract.exe.
Step 4: Pass this path to pytesseract in your code, before calling image_to_string, like this:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
Note: Replace C:\Program Files\Tesseract-OCR\tesseract.exe with the path you copied in Step 3.
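The steps above can also be sketched as a small helper that looks for the executable before handing its path to pytesseract. This is only an illustration, not part of pytesseract: the names find_tesseract, configure_pytesseract and CANDIDATE_PATHS are hypothetical, and the candidate folders are assumed install locations that may differ on your machine.

```python
import os
import shutil

# Assumed install locations (hypothetical defaults); replace them with the
# path you copied in Step 3 if your installation lives somewhere else.
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return a path to the tesseract executable, or None if none is found."""
    # If tesseract is already on PATH, that location is used first.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=CANDIDATE_PATHS):
    """Point pytesseract at the located executable and return its path."""
    cmd = find_tesseract(candidates)
    if cmd is None:
        raise FileNotFoundError(
            "tesseract executable not found; install Tesseract OCR or add "
            "its full path to the candidates list"
        )
    # Imported here so find_tesseract can be used without pytesseract installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling configure_pytesseract() once at the top of the script, before get_string, should have the same effect as Step 4. Separately, note that the code in the question writes thresh.jpg but passes 'thesh.jpg' to image_to_string, so that file name would likely need correcting once Tesseract itself is found.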
Answered By - Mohacel