Issue
I am trying to extract text from an image using the pytesseract module in Python, but I keep getting an error when I run the code below. A similar question has this answer: https://stackoverflow.com/a/54914105/12642523, but I still get the same error after trying it. Any tips?
import pytesseract as py
from PIL import Image
cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
img=r"C:\Python\Images to text\databases.jpg"
py.image_to_string(img)
---------------------------------------------------------------------------
TesseractError Traceback (most recent call last)
<ipython-input-86-5e06d7c425c6> in <module>
3 cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
4 img=r"C:\Python\Images to text\databases.jpg"
----> 5 py.image_to_string(img)
c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
346 Output.DICT: lambda: {'text': run_and_get_output(*args)},
347 Output.STRING: lambda: run_and_get_output(*args),
--> 348 }[output_type]()
349
350
c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
345 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
346 Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 347 Output.STRING: lambda: run_and_get_output(*args),
348 }[output_type]()
349
c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
256 }
257
--> 258 run_tesseract(**kwargs)
259 filename = kwargs['output_filename_base'] + extsep + extension
260 with open(filename, 'rb') as output_file:
c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
232 with timeout_manager(proc, timeout) as error_string:
233 if proc.returncode:
--> 234 raise TesseractError(proc.returncode, get_errors(error_string))
235
236
TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')
Solution
You are passing the file path as a string, not an image. Open the file with PIL first and change the tesseract call to:
img=r"C:\Python\Images to text\databases.jpg"
py.image_to_string(Image.open(img))
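It may also be worth checking where tesseract_cmd points. The message in the traceback, 'Usage: pytesseract [-l lang] input_file', is the usage text of the pytesseract command-line wrapper, and the question sets tesseract_cmd to pytesseract.exe in the Python Scripts folder. That setting is meant to name the Tesseract OCR executable itself (tesseract.exe), which is installed separately from the Python package. The sketch below is only a sanity check on the file name. The helper is_tesseract_binary is hypothetical, and the Program Files path is an assumed install location, not something taken from the question.

```python
import ntpath
import shutil


def is_tesseract_binary(path):
    """Return True if the file name is tesseract or tesseract.exe (case-insensitive)."""
    # ntpath splits on both "\" and "/", so Windows paths are handled on any OS.
    return ntpath.basename(path).lower() in ("tesseract", "tesseract.exe")


# Path from the question: this is the pytesseract wrapper script, not the OCR engine.
question_cmd = r"C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe"

# ASSUMPTION: a commonly used install location on Windows;
# adjust it to wherever tesseract.exe actually lives on your machine.
assumed_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

print(is_tesseract_binary(question_cmd))  # False
print(is_tesseract_binary(assumed_cmd))   # True

# shutil.which returns the full path if tesseract is on PATH, otherwise None.
print(shutil.which("tesseract"))
```

If the path checks out on your machine, the assignment from the question would become py.pytesseract.tesseract_cmd = assumed_cmd, with the rest of the code unchanged.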
Alternatively, you can use OpenCV to open the image, which also works fine.
Install OpenCV with pip:
pip install opencv-python
Once it is installed, you can read an image like this:
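import cv2
import pytesseract

# cv2.imread returns a NumPy array in BGR order (or None if the file cannot be read)
image = cv2.imread('path/to/image.jpg')
# Convert to RGB, the channel order pytesseract assumes
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image)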
Answered By - Sreekiran A R