Issue
I have various types of images like these:
As you can see, they are all quite similar, but I cannot manage to extract the number from them reliably.
So far my code consists of the following:
import re
import cv2
import numpy as np
from pytesseract import image_to_string

# img is assumed to be a BGR image already loaded with cv2.imread
lower = np.array([250,200,90], dtype="uint8")
upper = np.array([255,204,99], dtype="uint8")
mask = cv2.inRange(img, lower, upper)
res = cv2.bitwise_and(img, img, mask=mask)
data = image_to_string(res, lang="eng", config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
numbers = int(''.join(re.findall(r'\d+', data)))
I tried tweaking the psm parameter (6, 8 and 13). Each value works for some of these examples, but none works for all of them, and I have no idea how to get around the problem.
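One way to combine the psm values rather than pick a single one is to try them in order and keep the first read that contains any digits. The sketch below is only an illustration: `first_numeric_read` and its `ocr` callback are hypothetical names, not part of pytesseract, and the approach only helps when a mode returns nothing at all. It cannot tell a wrong read (such as a dropped leading digit) from a correct one.

```python
from typing import Callable, Iterable, Optional


def first_numeric_read(image,
                       ocr: Callable[[object, str], str],
                       psm_modes: Iterable[int] = (6, 8, 13)) -> Optional[str]:
    """Try each page segmentation mode in turn; return the first digit string found."""
    for psm in psm_modes:
        config = f"--psm {psm} --oem 3 -c tessedit_char_whitelist=0123456789"
        text = ocr(image, config)
        # Keep only the digit characters from whatever the OCR call returned.
        digits = "".join(ch for ch in text if ch.isdigit())
        if digits:
            return digits
    return None  # no mode produced any digit


# Usage with pytesseract (needs the Tesseract binary installed):
#   import pytesseract
#   read = first_numeric_read(
#       res, lambda im, cfg: pytesseract.image_to_string(im, lang="eng", config=cfg))
```

Passing the OCR call in as a function keeps the loop independent of pytesseract, so it can be exercised with a stub before running it on real images.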
Another proposed solution is:
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w*2, h*2))
erd = cv2.erode(gry, None, iterations=1)
thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
bnt = cv2.bitwise_not(thr)
However, on the first picture, bnt gives:
pytesseract then reads 460..
Any ideas, please?
Solution
My approach:
- Upsample
- Erode
- Threshold
- Bitwise-not
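Upsampling is required for accurate recognition. Doubling the image size makes the digits large enough to read.
Erosion is a morphological operation that replaces each pixel with the minimum of its neighborhood, so dark regions grow and bright regions shrink. If the digits are darker than their background, this thickens the strokes and makes them easier to detect.
Thresholding (binary or inverse binary, here with Otsu's method choosing the cut-off) reduces the image to pure black and white, which helps to reveal the features.
Bitwise-not inverts every pixel, swapping black and white, so the polarity of the text can be flipped before OCR.
You can learn more methods by reading Improving the quality of the output in the Tesseract documentation.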
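To see what erosion and bitwise-not do at the pixel level, here is a small numpy-only sketch. `erode3x3` and `invert` are hypothetical stand-ins written for illustration. They are meant to mimic `cv2.erode(gray, None, iterations=1)` (a 3x3 minimum filter) and `cv2.bitwise_not` on 8-bit images, but they are not the OpenCV implementations.

```python
import numpy as np


def erode3x3(gray: np.ndarray) -> np.ndarray:
    """3x3 local minimum; a numpy stand-in for cv2.erode(gray, None, iterations=1)."""
    # Pad with the maximum value so the border never lowers the minimum.
    padded = np.pad(gray, 1, mode="constant", constant_values=255)
    h, w = gray.shape
    # Nine shifted views of the image, one per position in the 3x3 window.
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.min(np.stack(shifts), axis=0)


def invert(gray: np.ndarray) -> np.ndarray:
    """Swap dark and light; gives the same values as cv2.bitwise_not for uint8."""
    return 255 - gray


# A light 5x5 patch with a one-pixel-wide dark vertical stroke in the middle column.
patch = np.full((5, 5), 255, dtype=np.uint8)
patch[:, 2] = 0

thick = erode3x3(patch)   # the dark stroke now covers columns 1 to 3
flipped = invert(patch)   # light stroke on a dark background
```

On this patch the dark stroke grows from one column to three, which is why erosion helps when thin dark digits sit on a light background.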
Update
The first image is easy to read, since it does not require upsampling, erosion, or thresholding. Please read How to Improve Quality of Tesseract.
Result:
1460
720
3250
3146
2681
1470
Code:
import cv2
import pytesseract

img_lst = ["oqWjd.png", "YZDt1.png", "MUShJ.png", "kbK4m.png", "POIK2.png", "4W3R4.png"]

for i, img_nm in enumerate(img_lst):
    img = cv2.imread(img_nm)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    (h, w) = gry.shape[:2]

    if i == 0:
        # First image: use the grayscale image as-is
        thr = gry
    else:
        # Upsample to twice the size, then erode
        gry = cv2.resize(gry, (w * 2, h * 2))
        erd = cv2.erode(gry, None, iterations=1)
        if i == len(img_lst)-1:
            # Last image: inverse-binary Otsu threshold
            thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        else:
            thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # Invert the result and run OCR restricted to digits
    bnt = cv2.bitwise_not(thr)
    txt = pytesseract.image_to_string(bnt, config="--psm 6 digits")
    print("".join([t for t in txt if t.isalnum()]))
    cv2.imshow("bnt", bnt)
    cv2.waitKey(0)
If you want to display the comma in the result, change the line print("".join([t for t in txt if t.isalnum()])) to print(txt).
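If you need the value as an integer as well as a display string, a small helper can cover both cases. This is a minimal sketch: `clean_ocr_text` and `to_int` are hypothetical names, and unlike the `isalnum()` filter they also drop any letters that slip into the output.

```python
from typing import Optional


def clean_ocr_text(txt: str, keep_commas: bool = False) -> str:
    """Keep only digits (and optionally commas) from raw OCR output."""
    allowed = "0123456789," if keep_commas else "0123456789"
    return "".join(ch for ch in txt if ch in allowed)


def to_int(txt: str) -> Optional[int]:
    """Convert OCR output to an int, or None when no digit was recognised."""
    digits = clean_ocr_text(txt)
    return int(digits) if digits else None
```

Returning `None` for an empty read avoids the `ValueError` that `int('')` raises when Tesseract finds nothing.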
Note that for the last image (4W3R4.png) the threshold method changes from binary to inverse binary. Binary thresholding does not work accurately on all images, so you need to switch methods where it fails.
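Rather than hard-coding which image needs the inverse threshold, one possible heuristic is to normalise the polarity after thresholding. The sketch below assumes the background covers more than half of the pixels, which may not hold for tightly cropped digits. `dark_text_on_light` is a hypothetical helper and has not been tested on the images above.

```python
import numpy as np


def dark_text_on_light(binary: np.ndarray) -> np.ndarray:
    """Return the binary image with a light background, inverting it if needed.

    Assumption: the background occupies more than half of the pixels.
    """
    white = np.count_nonzero(binary)
    if white < binary.size / 2:
        # Mostly black, so the background is dark: flip to dark text on light.
        return 255 - binary  # same values as cv2.bitwise_not on uint8
    return binary


# Example: a mostly black image with one white "text" pixel gets inverted.
sample = np.zeros((4, 4), dtype=np.uint8)
sample[1, 1] = 255
normalised = dark_text_on_light(sample)
```

With this in place you could threshold every image with `cv2.THRESH_BINARY + cv2.THRESH_OTSU` and let the helper decide whether to invert. The Tesseract documentation recommends dark text on a light background for version 4 and later.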
Answered By - Ahx