Issue
I've got this picture (a preprocessed image) from which I want to extract the numeric values on each line. I'm using pytesseract, but it doesn't return any results for this image.
I've tried several config options from other questions, such as "--psm 13 --oem 3" or whitelisting digits, but nothing yields results.
As a result, I usually get just one or two characters or about five dots/dashes, but nothing even remotely resembling the size of my input.
I hope someone can help. Cheers in advance for your time.
pytesseract version: 0.3.8
tesseract version: 5.0.0-alpha.20210506
Solution
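Consider using --psm 4, which is more appropriate for your image: it tells Tesseract to assume a single column of text of variable sizes, whereas --psm 13 treats the whole image as one raw text line. I also recommend rethinking the image preprocessing. Tesseract is not perfect, and it needs a clean input image to work well.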
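import cv2 as cv
import pytesseract as tsr

# OpenCV loads images as BGR; convert to RGB before handing them to pytesseract
img = cv.imread('41DAx.jpg')
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)

# --psm 4: single column of text; the whitelist limits output to digits and commas
config = '--psm 4 -c tessedit_char_whitelist=0123456789,'
text = tsr.image_to_string(img, config=config)
print(text)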
The code above did not detect every digit in the image, but it got most of them. With a bit more image preprocessing, you may be able to reach your goal.
Answered By - Igor Melo