Issue
I am applying some transformations to an image so that I can extract its text with Tesseract OCR, but after thresholding the text comes out blurry. I would appreciate some help with this.
This is my code:
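import cv2
import pytesseract as pyt
import numpy as np
# Point pytesseract at the local Tesseract install (Windows path)
pyt.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load the image and convert it to grayscale
image = cv2.imread('vacunacion.jpg')
gris = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Inverted binary threshold at 90: pixels above 90 become black, the rest white
thresh = cv2.threshold(gris, 90, 255, cv2.THRESH_BINARY_INV)[1]
# Run OCR on the thresholded image (--psm 6: assume a single uniform block of text)
imagen_detalle = pyt.image_to_string(thresh, lang='eng', config='--psm 6')
print(imagen_detalle)
# Show and save the thresholded image
cv2.imshow('thresh', thresh)
cv2.imwrite('thresh.jpg', thresh)
cv2.waitKey()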
My output:
2
> PLAN NACIONAL DE VACUNACION
CONTRA COVID-19 Hooia>—
| DOSIS APLICADAS: 191.480
CORTE 4:00 P.M. MARZO - 03 - 2021
SE AMDRES ay
“a 7 women {A GUNIRA (1345)
twas jf Ae d &
paunaanma Goer anda 760 — 2) ©? maaan
CARTABENA (3.457) BOLI (2500) — OESAN 018)
SURES) a ‘3 NORTE DE SANTANDER (25131
CORDOBA (2968) wo 6 SAATANDER (5.936)
ANTIONUIA (24.8563 TS - BOVAGA (5883),
CUNDALAMARCA (12.0251 TY ‘ ARAUCA C8)
SALDAS (23151 EN, f PS yionann 2
RAMLDA C320 wey Pee casa 1227
oan oe ts
Touma 5.080) BK eae 8 WETA (2444)
ALLE DEL cqUGA (20,160) ~=% se GUAINLA 592)
“m8 Se aN +S UAE (3023
‘ante ca o eco yqurtsig2an
puna 2635) ae SO BARUETAISES!
a
rum fe ‘AMAZDAAS (10548)
Oa
Fuente: Ministerio de Salud y Protecciin Saciat - Datos procesados 03 de marzo - 2021
The numbers of applied doses are hard to make out in the image because it is so blurry. Is there any technique that can be applied here?
Solution
Tesseract can use the gradients around text as part of its detection, so I'd suggest avoiding thresholding where possible, as it removes those gradients (the anti-aliasing, as mentioned by fmw42) from the image.
Instead, I'd suggest inverting the image after converting it to grayscale. Then, if necessary, you can reduce the brightness to make the greyer text a bit blacker, and increase the contrast to make the grey background a bit whiter. If you do need to adjust the brightness and/or contrast, I'd suggest using cv2.convertScaleAbs, which does so efficiently and saturates the result to the 0-255 range, avoiding integer overflow problems.
Answered By - ES-Alexander