Issue
I have installed the pytesseract module in my venv and want to extract text from a German document
by running this script from the pytesseract documentation with the language set to German:
import cv2
import pytesseract
try:
    from PIL import Image
except ImportError:
    import Image
print(pytesseract.image_to_string(Image.open('test.jpg')))
print(pytesseract.image_to_string(Image.open('test.jpg'), lang='ger'))
which gives me
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Error opening data file C:\\Program Files (x86)\\Tesseract-OCR/tessdata/ger.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language \'ger\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I have found the language data at tessdoc/Data-Files (https://github.com/tesseract-ocr/tessdoc/blob/master/Data-Files.md).
So far I have only found a guide for Linux: "How do I install a new language pack for Tesseract on 16.04".
Where do I need to put the language files on Windows so that pytesseract can find them and the script works?
Solution
I found a guide for this on a German site: "Python Texterkennung: Bild zu Text mit PyTesseract in Windows" (Python text recognition: image to text with PyTesseract on Windows).
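For readers who cannot follow the linked guide, here is a rough sketch of what usually needs to change. It is not taken from the guide. Tesseract names its language files with three-letter codes, and the German model is deu.traineddata, so lang='ger' looks for a file that does not exist in the data repository. The sketch assumes the tessdata directory is the one shown in the error message above and that deu.traineddata has been downloaded from the Data-Files page and copied there. The helper names traineddata_path and ocr_german are made up for illustration.

```python
from pathlib import Path

# Assumption: the install location shown in the error message above.
TESSDATA_DIR = Path(r"C:\Program Files (x86)\Tesseract-OCR\tessdata")


def traineddata_path(lang, tessdata_dir=TESSDATA_DIR):
    """Return the path where Tesseract expects the data file for `lang`."""
    return Path(tessdata_dir) / f"{lang}.traineddata"


def ocr_german(image_path, tessdata_dir=TESSDATA_DIR):
    """OCR an image with the German model, failing early if the data file is missing."""
    data_file = traineddata_path("deu", tessdata_dir)
    if not data_file.is_file():
        raise FileNotFoundError(f"Copy deu.traineddata to {data_file.parent} first")
    # Imported here so the path check above works even without pytesseract/Pillow.
    import pytesseract
    from PIL import Image
    # 'deu' is Tesseract's code for German; 'ger' is not a valid language name.
    return pytesseract.image_to_string(Image.open(image_path), lang="deu")


# Example call, once the data file is in place:
# print(ocr_german("test.jpg"))
```

If Tesseract is installed somewhere else, pass that installation's tessdata directory as the second argument. The TESSDATA_PREFIX variable mentioned in the error message only needs to be set when Tesseract cannot locate its data directory on its own.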
Answered By - Sator