Issue
I am currently facing a problem with pytesseract where the software is unable to detect a number in this image:
https://i.stack.imgur.com/kmH2R.png
It was cropped from a bigger image after a threshold filter was applied.
For some reason, pytesseract doesn't want to recognise the 6 in this image. Any suggestions? Here is my code:
import cv2
import pytesseract
from PIL import Image

image = ...  # Insert raw image here. My code takes a screenshot.
image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
image = cv2.medianBlur(image, 3)
_, image = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY)
# If you want to use the image from above, start here.
image = Image.fromarray(image)
string = pytesseract.image_to_string(image)
print(string)
EDIT: After some further investigation, my code works fine with numbers containing 2 digits, but not with single-digit numbers.
Solution
pytesseract runs Tesseract in its default page segmentation mode (--psm 3, PSM_AUTO), which expects a page of text rather than an isolated character. To detect a single character you would normally run it with the option --psm 10 (PSM_SINGLE_CHAR). However, the black spots in the corners of the image you provided are detected as random dashes, so Tesseract thinks there are multiple characters and returns nothing in that mode. In this case you need to use --psm 8 (PSM_SINGLE_WORD) instead:
string = pytesseract.image_to_string(image, config='--psm 8')
The output from this will include those random characters, so you need to either strip them after pytesseract runs or tighten your bounding box around the number to remove the noise. Also, if every character you expect is a digit, you can add '-c tessedit_char_whitelist=0123456789' after '--psm 8' to improve the detection.
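As a minimal sketch of the two suggestions above (the digit whitelist and stripping stray characters afterwards), something like the following could work. The helper name read_number and its ocr parameter are hypothetical and not part of pytesseract; ocr only exists so the cleanup logic can be tried without Tesseract installed.

```python
import re


def read_number(image, ocr=None, psm=8):
    """Run OCR in single-word mode and return only the digits found.

    `ocr` is a hypothetical injection point: any callable shaped like
    pytesseract.image_to_string(image, config=...). If it is omitted,
    pytesseract itself is imported and used.
    """
    if ocr is None:
        import pytesseract  # deferred so the cleanup can be tested without Tesseract
        ocr = pytesseract.image_to_string

    # Single-word mode plus a digit-only whitelist, as suggested above.
    config = f"--psm {psm} -c tessedit_char_whitelist=0123456789"
    raw = ocr(image, config=config)

    # Drop everything that is not a digit, e.g. dashes caused by corner noise.
    return re.sub(r"\D", "", raw)
```

Calling read_number(image) with a PIL image would then return a string of digits, or an empty string if nothing was recognised.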
Some other minor tips to simplify your code: cv2.imread has a flag to read the image as grayscale (cv2.IMREAD_GRAYSCALE, which equals 0), so if you load the image from a file you don't need to run cvtColor afterwards. Just do:
image = cv2.imread('/path/to/image/6.png', cv2.IMREAD_GRAYSCALE)
You can also create the PIL image object within your call to pytesseract, so that line simplifies to:
string = pytesseract.image_to_string(Image.fromarray(image), config='--psm 8')
This works as long as you have 'from PIL import Image' at the top of your script.
Answered By - Josh Reid