Issue
I'm learning AI/ML and trying to get text from this sample form.
import re
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\Pranav\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'
image = cv2.imread('image2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
x,y,w,h = 393, 531, 837, 80
firstROI = blur[y:y+h,x:x+w]
firstname = pytesseract.image_to_string(firstROI, lang='eng', config='--psm 6')
print(firstname)
firstname = re.sub(r'[^\w]', '', firstname)
cv2.imshow('image', firstROI)
cv2.waitKey()
cv2.destroyAllWindows()
With the above code, I can read normal printed text on a white background, but not the text inside the gray-background boxes. For example, the real value in the first name box is "Andrew", but I get only "oe".
Following Freddy's comments, I went through this link and updated my code as follows, but I still get no output.
from tesserocr import PyTessBaseAPI, PSM, OEM
api = PyTessBaseAPI(psm=PSM.AUTO_OSD, lang='eng', path=r'C:\Users\Pranav\tessdata-master')
images = ['andrew1.png', 'andrew2.png', 'test1.png']
for img in images:
    api.SetImageFile(img)
    print(api.GetUTF8Text())
    print(api.AllWordConfidences())
It reads the text from the third image only (Demographics). Please help me read the text from the gray-background images (Andrew).
Solution
This link gave me the answer: it removes the noise from the image background.
Answered By - Arun