Issue
I need to remove the lines from the captcha and write code to recognize the characters. The solutions I have seen so far deal with solid characters, but here each character is drawn with a double border (as an outline).
Solution
For this specific captcha, there's quite a simple solution. There's no guarantee, however, that this approach will work on other, even very similar, captchas. That is the "nature" of captchas, as already mentioned in the comments, and a general limitation of image-processing tasks when only limited input data is provided.
Read the image as grayscale.
Threshold the image at a nearly white cutoff.
Flood fill the "background" with black.
Run pytesseract with the --psm 6 option.
That'd be the whole code:
import cv2
import pytesseract
# Read image as grayscale
img = cv2.imread('FuZEJ.png', cv2.IMREAD_GRAYSCALE)
# Threshold at nearly white cutoff
thr = cv2.threshold(img, 224, 255, cv2.THRESH_BINARY)[1]
# Flood-fill the "background" with black, seeding at the top-left pixel (0, 0); floodFill also modifies thr in place
ff = cv2.floodFill(thr, None, (0, 0), 0)[1]
# OCR using pytesseract
text = pytesseract.image_to_string(ff, config='--psm 6').replace('\n', '').replace('\f', '')
print(text)
# xwphs
Caveat: I use a special version of Tesseract from the Mannheim University Library.
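To see why thresholding followed by a flood fill turns the double-bordered characters into solid ones, here is a minimal pure-Python sketch on a toy grid. It is not OpenCV's implementation: the helpers `threshold` and `flood_fill` and the `toy` image are hypothetical names made up for illustration, and the sketch assumes the top-left pixel belongs to the background and that the fill is 4-connected. Note that the seed here is given as (row, column), whereas cv2.floodFill expects (x, y); for (0, 0) the two coincide.

```python
from collections import deque


def threshold(gray, cutoff=224):
    """Binarize: pixels strictly above the cutoff become 255, all others 0."""
    return [[255 if px > cutoff else 0 for px in row] for row in gray]


def flood_fill(binary, seed=(0, 0), new_value=0):
    """Return a copy in which the 4-connected region containing seed is set to new_value."""
    rows, cols = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    sr, sc = seed
    old = out[sr][sc]
    if old == new_value:
        return out  # nothing to fill
    out[sr][sc] = new_value
    queue = deque([(sr, sc)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == old:
                out[nr][nc] = new_value
                queue.append((nr, nc))
    return out


# Toy "captcha": W = near-white paper, D = dark character outline, N = gray noise line
W, D, N = 250, 90, 150
toy = [
    [W, W, W, W, W, W, W],
    [W, D, D, D, D, D, W],
    [W, D, W, W, W, D, W],  # near-white interior enclosed by the outline
    [W, D, D, D, D, D, W],
    [W, W, W, W, W, W, W],
    [N, N, N, N, W, W, W],  # noise line, darker than the cutoff
]

cleaned = flood_fill(threshold(toy))
for row in cleaned:
    print(row)
```

After thresholding, the outline and the noise line are already black. The flood fill then blackens the white background connected to the corner, so only the enclosed interior of the outline stays white. The character ends up as a solid white shape on a black background.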
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
PyCharm: 2021.1.1
OpenCV: 4.5.1
pytesseract: 5.0.0-alpha.20201127
----------------------------------------
Answered By - HansHirse