Issue
I followed several steps I found while researching, but none of them got my pytesseract code to run.
Downloaded the tesseract exe from https://github.com/UB-Mannheim/tesseract/wiki.
Installed this exe in C:\Program Files\Tesseract-OCR.
Installed pytesseract using pip.
Imported pytesseract and ran:
pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
a = pytesseract.image_to_string(PIL.Image.open('/content/drive/MyDrive/hindi_image.jpg'),lang='hin')
but these steps throw an error:
FileNotFoundError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
253 try:
--> 254 proc = subprocess.Popen(cmd_args, **subprocess_args())
255 except OSError as e:
6 frames
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract': 'tesseract'
During handling of the above exception, another exception occurred:
TesseractNotFoundError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
256 if e.errno != ENOENT:
257 raise e
--> 258 raise TesseractNotFoundError()
259
260 with timeout_manager(proc, timeout) as error_string:
TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
On my local system the path is the same as above.
How can I resolve this? Please help. Thank you!
Solution
Google Colab runs on a Linux server, so you can't use the path to tesseract on your local Windows machine.
You have to install tesseract for Linux on the server using
!apt install tesseract-ocr
and use the path to this version.
But you may not need to set this path in code at all, because apt should install tesseract in a folder that is on the PATH environment variable, so the code should find tesseract without an explicit path.
I run pytesseract on local Linux and I don't have to set the path in code.
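One way to confirm that tesseract really is on PATH before calling pytesseract is to look it up from Python. This is only a minimal sketch using the standard library; the helper name find_tesseract is made up for illustration.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the binary if it is on PATH, otherwise None."""
    return shutil.which(binary_name)


path = find_tesseract()
if path is None:
    # In Colab this usually means the apt install step has not been run yet.
    print("tesseract not found on PATH; try: !apt install tesseract-ocr")
else:
    print("tesseract found at", path)
```

If a path is printed, pytesseract should be able to find the binary without tesseract_cmd being set.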
If you need a language other than English, you can list all available language packages with
!apt search tesseract
and install the one you need (e.g. Hindi) with
!apt install tesseract-ocr-hin
You may also need to pass the option lang='hin' to pytesseract to use this language.
To use both languages you can try lang='hin+eng'
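To check which language packs are actually installed, one option is to ask the tesseract binary itself through its --list-langs flag. The sketch below is only an illustration: the helper names parse_list_langs, installed_languages and build_lang are hypothetical, and it assumes the first output line is a header followed by one language code per line.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output.

    Assumption: the first line is a header such as
    "List of available languages (2):" and each later
    non-empty line holds one language code.
    """
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines[1:] if line]


def installed_languages():
    """Return installed language codes, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    proc = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
    )
    # Assumption: fall back to stderr in case the list is written there.
    return parse_list_langs(proc.stdout or proc.stderr)


def build_lang(codes):
    """Join language codes into the 'hin+eng' form used by the lang option."""
    return "+".join(codes)


wanted = ["hin", "eng"]
available = installed_languages()
missing = [code for code in wanted if code not in available]
if missing:
    print("missing language packs:", missing)
else:
    print("use lang=" + repr(build_lang(wanted)))
```

If 'hin' is reported as missing, installing tesseract-ocr-hin as described above should make it show up.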
EDIT:
I tested on Google Colab: after installing with !apt install tesseract-ocr I can use pytesseract without setting the path.
EDIT:
pytesseract writes the image to a file and runs tesseract with the path to this file. tesseract writes the result to a text file, and pytesseract then reads the result from that text file.
But you can pass the image path directly and read the result from the text file yourself.
import pytesseract

pytesseract.pytesseract.run_tesseract('path/to/image.png', 'output', 'txt', lang='hin')

with open('output.txt') as fh:
    result = fh.read()

print(result)
or even
import pytesseract

def file_to_text(filename, *args, **kwargs):
    pytesseract.pytesseract.run_tesseract(filename, 'output', 'txt', *args, **kwargs)
    with open('output.txt') as fh:
        return fh.read()

# ---

text = file_to_text('path/to/image.png', lang='hin')
print(text.strip())
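The helper above always writes to output.txt in the current directory, so two calls could overwrite each other's result. A possible variant, sketched here under the assumption that run_tesseract writes its result to the output base name plus ".txt" as in the examples above, keeps the output in a temporary directory instead. The name file_to_text_tmp is hypothetical.

```python
import os
import tempfile


def file_to_text_tmp(filename, *args, **kwargs):
    """Run tesseract on an image path and return the recognized text.

    The output file lives in a temporary directory that is removed
    when the function returns.
    """
    # Imported here so this module still loads where pytesseract is missing.
    import pytesseract

    with tempfile.TemporaryDirectory() as tmpdir:
        base = os.path.join(tmpdir, "output")
        pytesseract.pytesseract.run_tesseract(filename, base, "txt", *args, **kwargs)
        # Read explicitly as UTF-8 so Hindi text decodes the same everywhere.
        with open(base + ".txt", encoding="utf-8") as fh:
            return fh.read()
```

It would be called the same way as the helper above, for example file_to_text_tmp('path/to/image.png', lang='hin').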
Answered By - furas