Issue
I am using pytesseract to recognize text as follows:
import pytesseract
from pytesseract import Output

td = pytesseract.image_to_data(img, output_type=Output.DICT)
tn_boxes = len(td['level'])
for o in range(0, tn_boxes):
    text = td['text'][o]
    print(text)
I am just making an index of examples
using a simple logic: detect the keyword 'Example no.', find its end-point keyword 'Sol.', put the piece of the image from 'Example no.' to 'Sol.' into the index, then find the next example, and so on.
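The keyword logic described above could be sketched roughly as follows. This is only an illustration: `find_example_spans` is a hypothetical helper, not part of pytesseract. It assumes the dictionary returned by `image_to_data` (with its `text` and `top` keys), assumes that 'Example' and 'Sol.' each come back as a single word, and assumes each piece should end at the top of the 'Sol.' line.

```python
def find_example_spans(td, start_word="Example", end_word="Sol."):
    """Return (top, bottom) pixel ranges, one per example found.

    td is assumed to be the dict from pytesseract.image_to_data(...,
    output_type=Output.DICT); only its 'text' and 'top' lists are used.
    """
    spans = []
    start_top = None  # top of the current 'Example' line, if one is open
    for word, top in zip(td["text"], td["top"]):
        word = word.strip()
        if word == start_word and start_top is None:
            start_top = top
        elif word == end_word and start_top is not None:
            # Assumption: the piece ends where the 'Sol.' line begins.
            spans.append((start_top, top))
            start_top = None
    return spans
```

Each returned pair could then be used to crop the page, for example with `img.crop((0, top, img.width, bottom))` if `img` is a PIL image. The sketch also shows why a missed 'Sol.' or 'Example' word breaks the index: the matching pair is never closed.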
But when I try the following image
it shows this output:
SET THEORY ae . . 5 (6) Let A = {x: x isa negative odd integer} = {-1,-3,-5,-7,
...etc
See how it does not recognize the first line: Sol. (a) Let A={x:x is a natural number
..etc.
And when I try it with the following image, which has no horizontal line,
it works fine.
Is there any way to configure pytesseract to recognize text that has a line above it?
Edited:
Sometimes, when an image or some larger text is placed above the text, pytesseract fails to detect the text below that bigger object.
Is there any solution for this kind of problem? Maybe there is a way to configure the minimum detection size, or to detect text of all sizes even when it sits under a bigger object?
For example, for the following image
it shows the output
usually denoted by o(G). ors a a {= 7 Wave =e () oe that the set of ae | group usual ition of integers.
See how it does not detect the keyword Example 1.
But when I try the following image
it shows the output
usually denoted by o(G). Example 1. (2) Prove that th . group under usual addition of integers,
Now it detects the keyword Example 1.
Solution
See, for example, "image processing to improve tesseract OCR accuracy", and read the docs.
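As one illustration of the kind of preprocessing this points to, the sketch below blanks out long horizontal rules before the image is passed to OCR. It is not taken from the Tesseract docs: `remove_horizontal_rules` is a hypothetical helper, the thresholds are guesses, and it assumes a grayscale image given as a list of rows of 0..255 values in which the rule does not share pixel rows with text.

```python
def remove_horizontal_rules(rows, min_run_fraction=0.5, dark_below=128, white=255):
    """Return a copy of a grayscale image with long horizontal rules whitened.

    rows: list of rows, each a list of ints in 0..255 (assumed layout).
    A row counts as part of a rule when its longest run of dark pixels
    covers at least min_run_fraction of the row width.
    """
    cleaned = []
    for row in rows:
        longest = run = 0
        for px in row:
            run = run + 1 if px < dark_below else 0
            longest = max(longest, run)
        if row and longest >= min_run_fraction * len(row):
            cleaned.append([white] * len(row))  # blank the whole rule row
        else:
            cleaned.append(list(row))  # keep text rows unchanged
    return cleaned
```

With Pillow and NumPy installed, the rows could come from `np.asarray(img.convert("L")).tolist()` and go back through `Image.fromarray(np.array(cleaned, dtype=np.uint8))`. Whether this actually helps with the images in the question is untested; rescaling, binarization, or a different page segmentation mode (for example `config='--psm 6'` in `image_to_data`) may matter more.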
Answered By - user898678