Sunday, January 23, 2022

[FIXED] While extracting text from an image using pytesseract, the numbers are printed first and then the strings


Issue

When I extract text from an image using pytesseract, the numbers are printed first and the strings are printed after them. Why is this happening?

This is my input image.

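import cv2
import pytesseract
from pytesseract import Output
from PIL import Image

# Load the page and binarize it: pixel values above 180 become 255, the rest 0.
imginput = cv2.imread('ss.png')
_, img1 = cv2.threshold(imginput, 180, 255, cv2.THRESH_BINARY)
img = Image.fromarray(img1)

# With Output.DICT the recognized text is returned under the 'text' key.
d = pytesseract.image_to_string(img, output_type=Output.DICT)
print(d)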

My output:

{'text': '71.\n\n72.\n\n73.\n\n74.\n\n75.\n\n76.\n\n77.\n\n78.\n\n79.\n\n80.\n\nPick out the synonym of the word ‘depositary’ :\n\n(A) inheritor (B) ward (C) patron (D) trustee\nThe fifth chapter comprises three sections.\n(A) of (B) with (C) no preposition (D) on\n\nAntonym of ‘abortive’ is :\n(A) _ successful (B) reproductive (C) instantaneous (D) fruitful\n\nThe one word for a person who doubts in religious practices :\n(A) _ stoic (B) sceptic (C) theist (D) pantheist\n\nThe idiom “bury the hatchet’ means .\n(A) keep enmity (B) open enmity (C) stop enmity (D) have no enmity\n\nVictor seldom visits his uncle, Add proper tag question.\n(A) doesn’t he ? (B) isn’the? (C) ishe? (D) does he ?\n\n‘Khalil Gibran is one of the greatest poets of the world.’ Pick out the comparative degree of\nthe sentence.\n\n(A) Khalil Gibran is greater than many other poets of the world.\n(B) Khalil Gibran is greater than any other poet of the world.\n(C) Khalil Gibran is greater than any other poets of the world.\n(D) Khalil Gibran is the greatest poet of the world.\n\nThe passive form of ‘I keep my books here.’ is :\n(A) My books keep here (B) My books are keeping here\n(C) Iam kept the books here (D) My books are kept here\n\nPick out the correctly spelt word.\n\n(A) Constellation (B) Consistancy\n(C) Conspirecy (D) Conservatary\nWe need two more players to the team. Supply suitable phrasal verb.\n(A) make out (B) make up (C) make for (D) make of\n11 052/2019 - M\n\n{P.T.0}'}

Solution

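By default Tesseract uses page segmentation mode 3, which analyzes the page layout automatically before recognizing any text. Here it most likely treats the column of question numbers as a block of its own and outputs that block before the block containing the questions. Try running with other segmentation modes: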

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

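Pass the chosen mode to pytesseract through the config argument, like so:

# Example of adding any additional options; here, page segmentation mode 6.
custom_oem_psm_config = r'--psm 6'
d = pytesseract.image_to_string(img, config=custom_oem_psm_config, output_type=Output.DICT)
print(d)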
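Which mode gives the best reading order depends on the page, so it can help to run several modes and compare the results side by side. The following is only a minimal sketch of that idea. The helper names build_psm_config and compare_psm_modes are made up for this illustration and are not part of pytesseract, and the ocr parameter is an assumption added so the helper can be exercised without Tesseract installed.

```python
def build_psm_config(psm):
    """Return a Tesseract config string for one page segmentation mode (0-13)."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def compare_psm_modes(image, modes=(3, 4, 6), ocr=None):
    """Run OCR once per mode and return a dict mapping mode -> extracted text.

    `ocr` is any callable that accepts (image, config=...) and returns a string.
    It is a hypothetical hook for this sketch; by default the helper falls back
    to pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the helper can be imported without Tesseract installed.
        import pytesseract
        ocr = pytesseract.image_to_string
    return {psm: ocr(image, config=build_psm_config(psm)) for psm in modes}


# Usage with the image from the question (requires Tesseract to be installed):
#
#     for psm, text in compare_psm_modes(img).items():
#         print(psm, repr(text[:80]))
```

Printing only the first characters of each result is usually enough to see whether the question numbers still come out ahead of the question text.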

Answered By - K41F4r

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
