Issue
I've been learning text recognition in Python recently. When converting images to strings, the output randomly contains an extra newline. I've tried removing it but can't find a way. My goal is to separate the choices into their corresponding strings.
Here is my code and image:
import cv2
import pytesseract

choices = cv2.imread("ROI_0.png", 0)  # 0 = load the image as grayscale
custom_config = r'--oem 3 --psm 6'
c = pytesseract.image_to_string(choices, config=custom_config, lang='eng')
print(c.rstrip("\n"))  # my attempt
text = repr(c)
print(text)
newtext = text.split("\\n")
print(newtext)
Here is the output:
a. E. 0. 125
b. R. A. 3846
c. R. A. 3396
d. R. A. 7925
'a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925'
["'a. E. 0. 125", '', 'b. R. A. 3846', 'c. R. A. 3396', "d. R. A. 7925'"]
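To see where the empty entry comes from, here is a small sketch that works on the string copied from the repr() output above instead of on the image, so it runs without OpenCV or Tesseract. It shows that rstrip("\n") only touches the end of the string, and that splitting on the real newline character (no repr() round-trip needed) already exposes the blank line between the first two choices.

```python
# Sample OCR output copied from the repr() shown above
c = 'a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925'

# rstrip("\n") only removes newlines at the very end of the string,
# so the blank line between "a." and "b." is left untouched
unchanged = c.rstrip("\n") == c
print(unchanged)  # True

# Splitting on the actual newline character shows the empty entry
# produced by the two consecutive newlines after the first choice
parts = c.split("\n")
print(parts)
# ['a. E. 0. 125', '', 'b. R. A. 3846', 'c. R. A. 3396', 'd. R. A. 7925']
```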
Solution
What you can do is collapse runs of multiple newlines into a single newline:
import re
x = re.sub(r'\n{2,}', '\n', c)  # \n is a newline; {2,} matches two or more consecutive newlines (no space inside the braces)
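A pattern such as \n{2,} handles a blank line of any length, but it only matches newlines that are directly adjacent. If a "blank" line ever contains stray spaces (an assumption; it does not happen in the output shown in the question), the run is broken and the blank line survives. The sketch below uses a hypothetical sample string to compare \n{2,} with \n\s*\n, which also swallows whitespace-only lines sitting between two newlines.

```python
import re

# Hypothetical OCR text: a triple newline, plus a "blank" line holding one space
sample = "a. E. 0. 125\n\n\nb. R. A. 3846\n \nc. R. A. 3396"

# \n{2,} collapses any run of two or more adjacent newlines...
collapsed = re.sub(r'\n{2,}', '\n', sample)
print(repr(collapsed))  # 'a. E. 0. 125\nb. R. A. 3846\n \nc. R. A. 3396'

# ...but the space-only line breaks the run. \n\s*\n matches a newline,
# any whitespace (including further newlines), then another newline.
collapsed_ws = re.sub(r'\n\s*\n', '\n', sample)
print(repr(collapsed_ws))  # 'a. E. 0. 125\nb. R. A. 3846\nc. R. A. 3396'
```

Because \s* backtracks to the last newline it can reach, the leading spaces of the next non-blank line are left alone.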
So the full code would be:
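import re
import cv2
import pytesseract

choices = cv2.imread("ROI_0.png", 0)  # load as grayscale
custom_config = r'--oem 3 --psm 6'
c = pytesseract.image_to_string(choices, config=custom_config, lang='eng')
x = re.sub(r'\n{2,}', '\n', c)  # collapse blank lines into a single newline
print(x.rstrip("\n"))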
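Since the stated goal is to get each choice as its own string, a regex is not strictly required: splitting into lines and dropping the blank ones gives the list directly. A minimal sketch follows; split_choices is a hypothetical helper name, and the trailing "\n\x0c" in the sample is an assumption, since Tesseract may end its output with a form-feed page separator that rstrip("\n") would not remove.

```python
def split_choices(ocr_text):
    """Hypothetical helper: return one stripped string per non-blank line."""
    # splitlines() breaks on \n and also on other line boundaries such as \f,
    # so a trailing form feed does not end up glued to the last choice.
    # Lines that are empty or whitespace-only are dropped by the filter.
    return [line.strip() for line in ocr_text.splitlines() if line.strip()]


# OCR output from the question, with an assumed trailing "\n\x0c" added
c = 'a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925\n\x0c'
print(split_choices(c))
# ['a. E. 0. 125', 'b. R. A. 3846', 'c. R. A. 3396', 'd. R. A. 7925']
```

With pytesseract in place, this would be called as split_choices(c) on the string returned by image_to_string.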
Answered By - Lakshitha Wisumperuma