Issue
I am looking for an OCR implementation, preferably in Python, that can extract text from a scanned PDF (machine-printed text). However, due to company policy and security reasons I am not able to download any executable files (.exe), so none of the Python libraries built on top of Tesseract currently work for me. Has anybody else encountered this problem? (I guess it's pretty common in big companies.) I'm looking for a workaround: either a way to build Tesseract without downloading an .exe file, or an alternative OCR implementation.
Thanks in advance! I am working on a Windows 7 machine.
Solution
Unfortunately, Pytesseract is only a wrapper around the Tesseract binary (an .exe on Windows), so you will probably have to beg and plead with your IT department to allow it. Another option might be to build Tesseract from source yourself; that way you haven't downloaded a "random" .exe.
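To see why the binary is unavoidable: roughly speaking, a wrapper like Pytesseract prepares the image, shells out to the `tesseract` executable, and reads back what it prints. The sketch below is a simplified, standard-library-only illustration of that idea, not Pytesseract's actual source. The helper names `find_tesseract`, `build_tesseract_command` and `ocr_image` are hypothetical; it assumes a `tesseract` executable somewhere on `PATH` and uses the Tesseract CLI's `stdout` output base (print text instead of writing a file) and its `-l` language flag.

```python
import shutil
import subprocess
from typing import List, Optional


def find_tesseract(cmd: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(cmd)


def build_tesseract_command(image_path: str, lang: str = "eng",
                            cmd: str = "tesseract") -> List[str]:
    """Build the CLI call; the output base 'stdout' makes Tesseract print the text."""
    return [cmd, image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run the Tesseract binary on one image and return the recognized text."""
    exe = find_tesseract()
    if exe is None:
        # The situation from the question: with no binary installed,
        # a wrapper has nothing to call.
        raise FileNotFoundError("Tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, cmd=exe),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

If you do get Tesseract built from source, note that Pytesseract itself can be pointed at a binary outside `PATH` by setting `pytesseract.pytesseract.tesseract_cmd` to its full path.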
Another option is, of course, to use an online OCR API, but if security's that tight (and I suppose budgets are too), that might not work for you either.
Answered By - AKX