Issue
Sorry if the title of my question doesn't make my problem clear.
I'm trying to extract information from an image of a document using Tesseract, but it doesn't work well on photos (on screenshots of text it works very well). Does anybody know a technique that could help? I think that making the image black and white, with the information I want in black, would help a lot, but I don't know how to do that.
I'd be glad if somebody could help me. :)
Solution
Using OpenCV to preprocess the image before passing it to Tesseract might help.
I usually follow these steps:
- Convert the image to grayscale
- If the text in the image is small, upscale the image using cv2.resize()
- Blur the image to reduce noise (cv2.GaussianBlur or cv2.medianBlur)
- Apply a threshold to make the text stand out (cv2.threshold)
- Use the Tesseract config to restrict which characters Tesseract looks for. For example, if the image contains only upper-case English letters and digits, passing config='-c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' would help.
Answered By - saroj panda