Issue
I'm trying to convert a captcha image to text.
I don't think this captcha is very difficult.
I open the image and preprocess it with OpenCV to make it easier to recognize.
Here is an example captcha (image: Example Captcha).
And here it is after the OpenCV step (image: After OpenCV Captcha), produced by this code:
import cv2
import pytesseract
from PIL import Image

# filename is the path to the captcha image
image = cv2.imread(filename)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu's method picks the binarization threshold automatically
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imwrite('OPENCV.png', gray)
# Get Text From Image
text = pytesseract.image_to_string(Image.open('OPENCV.png'), lang='eng', config="-c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ --psm 8")
It's simple, but the result is 'PLLY2', whereas I want 'PLLVI2' or 'PLLV12'.
Is there any option or other approach I can use to get better accuracy?
I use the single-word option, '--psm 8'. I tried to find a way to make Tesseract look for a fixed number of characters, but it seems to be impossible.
I would really appreciate even just a hint. Thank you very much for reading this question.
Solution
You could slice the image into one piece per letter and run each piece with --psm 10 (treat the image as a single character):
import cv2
import pytesseract

image = cv2.imread(filename)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# Cut the binarized image into six 25-pixel-wide columns, one per character
gray1 = gray[:, :25]
gray2 = gray[:, 25:50]
gray3 = gray[:, 50:75]
gray4 = gray[:, 75:100]
gray5 = gray[:, 100:125]
gray6 = gray[:, 125:]
config = '--psm 10 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# strip() removes the trailing whitespace returned with each result
print(''.join(pytesseract.image_to_string(i, config=config).strip() for i in [gray1, gray2, gray3, gray4, gray5, gray6]))
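The slice boundaries above are hard-coded for one image size. As a rough sketch only, and assuming the characters are evenly spaced across the full width (which the answer's 25-pixel slices suggest but which may not hold for every captcha), the boundaries could be computed from the image width instead. The helpers `column_bounds` and `keep_allowed` are hypothetical names introduced for illustration; they are not part of OpenCV or pytesseract.

```python
# Hypothetical helpers: compute equal-width column ranges and clean OCR output.
ALLOWED = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def column_bounds(width, n_chars):
    """Return n_chars (start, end) column ranges that together cover [0, width)."""
    if n_chars <= 0 or width < n_chars:
        raise ValueError("need at least one pixel column per character")
    # Integer arithmetic spreads any leftover pixels across the slices
    return [(i * width // n_chars, (i + 1) * width // n_chars)
            for i in range(n_chars)]


def keep_allowed(text):
    """Drop whitespace and any character outside the whitelist."""
    return "".join(ch for ch in text if ch in ALLOWED)


if __name__ == "__main__":
    # For an assumed 150-pixel-wide image with six characters
    print(column_bounds(150, 6))
```

With these, each piece would be `gray[:, start:end]` for `start, end` in `column_bounds(gray.shape[1], 6)`, and each `pytesseract.image_to_string` result could be passed through `keep_allowed` before joining. If the characters are not evenly spaced, fixed-width slicing will cut through letters and a different segmentation approach would be needed.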
Answered By - Elias Jacob