Issue
I am trying to extract text from an image using Python Tesseract (pytesseract). I have made several attempts, and every extraction has failed. Why is Tesseract unable to extract the text?
Code
import cv2
import pytesseract as pt
inp = "./image.jpg"
img = cv2.imread(inp)
print(pt.image_to_string(img))
Version
tesseract 4.0.0-beta.1
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found SSE
Solution
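Preprocessing the image with OpenCV before handing it to Tesseract fixes the problem: load it as grayscale, binarize it with an inverted Otsu threshold, and restrict recognition to digits and the colon.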
import cv2  # pip install opencv-python
import pytesseract  # pip install pytesseract
# Open the image with OpenCV as grayscale (flag 0)
image = cv2.imread("test.jpg", 0)  # change to your file
# Preprocess: binarize with an inverted Otsu threshold
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# --psm 6 treats the image as a single uniform block of text.
# The whitelist restricts recognition to the characters 0123456789:
# (the LSTM engine honors the whitelist from Tesseract 4.1 onward)
print(pytesseract.image_to_string(
    thresh, lang='eng',
    config='--psm 6 -c tessedit_char_whitelist=0123456789:'))
Output:
05:26:34
09:04:24
01:00:31
01:14:36
01:17:43
02:31:05
02:35:41
05:32:42
03:26:09
02:44:11
02:56:00
02:32:42
02:35:16
07:16:10
07:18:36
07:19:00
07:19:32
07:21:17
07:21:48
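To make the preprocessing step less of a black box, here is a minimal pure-Python sketch of what an inverted Otsu threshold does: it picks the gray level that best separates the pixels into two classes, then maps the brighter class to 0 and the rest to 255. The function names otsu_threshold and binarize_inverted are hypothetical and written only for illustration; this is not OpenCV's implementation, and the threshold it picks may differ slightly from the one cv2.threshold returns.

```python
def otsu_threshold(pixels):
    """Return the 8-bit gray level that maximizes the between-class variance.

    `pixels` is a flat list of integers in the range 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_inverted(pixels, t, maxval=255):
    """Mimic an inverted binary threshold: values above t become 0, the rest maxval."""
    return [0 if p > t else maxval for p in pixels]


if __name__ == "__main__":
    # Toy "image": three dark pixels and three bright pixels.
    sample = [10, 12, 11, 200, 210, 205]
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("binarized:", binarize_inverted(sample, t))
```

The inversion swaps which of the two classes ends up white, so whether THRESH_BINARY_INV or THRESH_BINARY gives Tesseract the better input depends on whether the text in your image is lighter or darker than its background.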
Keep in mind that Tesseract itself must also be installed and on your PATH.
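A quick way to confirm the PATH requirement is sketched below, using only the standard library. The helper name find_tesseract is hypothetical. If the binary is installed somewhere that is not on the PATH, pytesseract can instead be pointed at it through pytesseract.pytesseract.tesseract_cmd; the path in the message is a placeholder, not a real location.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH.")
        print("Install it, or set the location explicitly, for example:")
        # Placeholder path: replace it with the real location of the binary.
        print("  pytesseract.pytesseract.tesseract_cmd = '/full/path/to/tesseract'")
    else:
        print("tesseract found at", path)
```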
If the output is mostly garbage, or Tesseract reports that it cannot find the language "eng", there is an easy fix:
On Linux, cd into /usr/local/share/tessdata or /usr/share/tessdata (whichever exists on your system) and run
sudo wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
This downloads the English language file, which should fix the problem.
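Before downloading anything, it can help to check whether the language file is already present. The sketch below looks for it in the two directories named above; the helper name find_traineddata is hypothetical, and your distribution may keep tessdata elsewhere, in which case pass that directory through search_dirs.

```python
import os

# The two directories mentioned in the answer; other systems may differ.
DEFAULT_TESSDATA_DIRS = ["/usr/local/share/tessdata", "/usr/share/tessdata"]


def find_traineddata(lang="eng", search_dirs=None):
    """Return the path of <lang>.traineddata in the first directory that has it, else None."""
    dirs = DEFAULT_TESSDATA_DIRS if search_dirs is None else search_dirs
    for directory in dirs:
        candidate = os.path.join(directory, lang + ".traineddata")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_traineddata("eng")
    if found:
        print("eng.traineddata found at", found)
    else:
        print("eng.traineddata not found in", DEFAULT_TESSDATA_DIRS)
```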
Tesseract version:
>> tesseract --version
tesseract 4.1.1
leptonica-1.81.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.0) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.5
Answered By - Robiot