Issue
I'd like to know why this symbol appears in the output and how I can remove it.
All images I use have the same behavior.
I can't get rid of it.
I need the value extracted from the image without that symbol because I'll use it later in another place.
script.py
import pytesseract as ocr
from PIL import Image
custom_config = r'--psm 3'
phrase = ocr.image_to_string(Image.open('image.jpg'), config=custom_config)
print(phrase)
[Screenshots omitted: output using pytesseract, output using tesseract, and the input image.jpg]
Solution
Those are form feed (FF, \u000C) characters, used by Tesseract to delimit pages of OCRed text. You can trim the output string before printing to the console.
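A minimal sketch of that trimming step, assuming the extra symbol is the trailing form feed described above. The helper name clean_ocr_text and the sample string are hypothetical and only stand in for what image_to_string returns, so the snippet runs without Tesseract installed.

```python
def clean_ocr_text(raw: str) -> str:
    """Remove form feed page separators and surrounding whitespace."""
    # "\f" is the form feed character (U+000C). replace() also catches
    # separators in the middle of multi-page output; strip() then drops
    # the leftover leading and trailing newlines.
    return raw.replace("\f", "").strip()


# Simulated OCR output (hypothetical value): text, newline, form feed.
sample = "12345\n\f"
print(repr(clean_ocr_text(sample)))
```

In the script from the question, this would wrap the existing call: phrase = clean_ocr_text(ocr.image_to_string(Image.open('image.jpg'), config=custom_config)). Tesseract 4.1 and later also list a page_separator configuration variable, so adding -c page_separator="" to the config string may suppress the character at the source; that variant is not tested here.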
Answered By - nguyenq