Issue
import pytesseract
from pdf2image import convert_from_path, convert_from_bytes
import cv2, numpy

def pil_to_cv2(image):
    # PIL gives RGB; reversing the last axis yields OpenCV's BGR order
    open_cv_image = numpy.array(image)
    return open_cv_image[:, :, ::-1].copy()

path='OriginalsFile.pdf'
images = convert_from_path(path)
cv_h=[pil_to_cv2(i) for i in images]
img_header = cv_h[0][:160,:]
# The only example I found in the pytesseract docs reads from a file:
# print(pytesseract.image_to_string(Image.open('test.png')))
Hello, is there a way to read img_header directly with pytesseract,
without saving it to a file first?
Solution
pytesseract.image_to_string() input format
As the documentation explains, pytesseract.image_to_string()
accepts a PIL image as input (recent versions also accept a NumPy array directly).
So you can easily convert your OpenCV image into a PIL one, like this:
from PIL import Image
... (your code)
print(pytesseract.image_to_string(Image.fromarray(img_header)))
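One detail worth noting: after pil_to_cv2 the array is in BGR channel order, while Image.fromarray assumes RGB. For plain dark-on-light text this rarely matters, but swapping the channels back first keeps the colours correct. Below is a minimal sketch of that conversion; the helper names bgr_to_pil and ocr_bgr_array and the demo array are made up for illustration and are not part of pytesseract.

```python
import numpy as np
from PIL import Image


def bgr_to_pil(bgr_array):
    """Convert an OpenCV-style BGR array (H x W x 3, uint8) to an RGB PIL image."""
    # Reverse the channel axis (BGR -> RGB); ascontiguousarray hands PIL a
    # normally laid out buffer instead of a negatively strided view.
    rgb = np.ascontiguousarray(bgr_array[:, :, ::-1])
    return Image.fromarray(rgb)


def ocr_bgr_array(bgr_array):
    """Run Tesseract on an in-memory BGR array and return the recognised text."""
    # Imported here so the conversion helper above works without pytesseract.
    import pytesseract
    return pytesseract.image_to_string(bgr_to_pil(bgr_array))


# Hypothetical stand-in for img_header: a 160-row strip whose first pixel
# is pure blue in BGR order.
demo = np.zeros((160, 200, 3), dtype=np.uint8)
demo[0, 0] = (255, 0, 0)
pil_image = bgr_to_pil(demo)
print(pil_image.size, pil_image.getpixel((0, 0)))  # (200, 160) (0, 0, 255)
```

With the question's code, the call would be ocr_bgr_array(img_header); nothing is written to disk by your own script.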
If you really don't want to use PIL,
see: https://github.com/madmaze/pytesseract/blob/master/src/pytesseract.py
pytesseract is a thin wrapper that runs the tesseract command. If you look at the def run_and_get_output()
line, you'll see that it saves your image to a temporary file and then passes that file's path to tesseract.
Hence, you could do the same with OpenCV by rewriting that single pytesseract .py
file to use OpenCV instead, although I don't see any performance improvement in doing so.
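As a rough sketch of that idea (not pytesseract's actual code), the same temp-file round trip can be done with OpenCV and the tesseract command-line tool. The function names build_tesseract_command and ocr_via_temp_file are hypothetical, and the sketch assumes the tesseract binary is on your PATH.

```python
import os
import subprocess
import tempfile


def build_tesseract_command(image_path, lang="eng"):
    """Return the argv list that OCRs image_path and prints the text to stdout."""
    # Passing "stdout" as the output base makes the tesseract CLI print the result.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_via_temp_file(cv_image, lang="eng"):
    """Write an OpenCV image to a temporary PNG, run tesseract on it, return the text."""
    import cv2  # imported lazily so build_tesseract_command works without OpenCV

    fd, tmp_path = tempfile.mkstemp(suffix=".png")
    os.close(fd)  # cv2.imwrite reopens the path itself
    try:
        if not cv2.imwrite(tmp_path, cv_image):
            raise IOError("could not write " + tmp_path)
        result = subprocess.run(
            build_tesseract_command(tmp_path, lang),
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    finally:
        os.remove(tmp_path)  # always clean up the temporary file
```

Calling ocr_via_temp_file(img_header) would then return the header text. The image still touches the disk, which is why this buys nothing over the PIL route.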
Answered By - a-sam