I've been learning text recognition in Python recently. When converting images to strings, pytesseract randomly outputs an extra newline for some of my images. I've tried removing it but can't find a way to do so. My goal is to separate the choices into their corresponding strings.
Here is my code and image:
import cv2
import pytesseract

choices = cv2.imread("ROI_0.png", 0)  # 0 loads the image as grayscale
custom_config = r'--oem 3 --psm 6'
c = pytesseract.image_to_string(choices, config=custom_config, lang='eng')
print(c.rstrip("\n")) # my attempt
text = repr(c)
newtext = text.split("\\n")
print(text)
print(newtext)
Here are the outputs:
a. E. 0. 125
b. R. A. 3846
c. R. A. 3396
d. R. A. 7925
'a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925'
["'a. E. 0. 125", '', 'b. R. A. 3846', 'c. R. A. 3396', "d. R. A. 7925'"]
What you can do is collapse each run of multiple newlines into a single newline:
import re
x = re.sub(r'\n{2,10}', '\n', c) # \n is a newline; {2,10} matches 2 to 10 consecutive newlines. Do not put a space inside the braces, or Python treats them as literal text instead of a quantifier.
So the full code would look like this:
import re
import cv2
import pytesseract

choices = cv2.imread("ROI_0.png", 0)
custom_config = r'--oem 3 --psm 6'
c = pytesseract.image_to_string(choices, config=custom_config, lang='eng')
x = re.sub(r'\n{2,10}', '\n', c)  # no space inside the braces
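Since the stated goal is one string per choice, the collapsing step can be followed by a split. The sketch below is only an illustration: split_choices is a hypothetical helper name, the sample string is copied from the repr output in the question, and it uses an open-ended {2,} instead of the {2,10} range so longer runs of newlines are handled too. Stripping each line is an added precaution against stray whitespace that OCR output may carry at its edges.

```python
import re


def split_choices(ocr_text):
    """Return one cleaned string per non-empty line of OCR output.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    # Collapse every run of two or more newlines into a single newline.
    collapsed = re.sub(r'\n{2,}', '\n', ocr_text)
    # Split on newlines, trim each piece, and drop anything left empty.
    return [line.strip() for line in collapsed.split('\n') if line.strip()]


# Sample text taken from the repr output shown in the question.
sample = 'a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925'
for choice in split_choices(sample):
    print(choice)
```

With the real image, passing c (the string returned by image_to_string) in place of sample should give a list with one entry per choice, assuming each choice sits on its own line in the OCR output.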
Answered By - Lakshitha Wisumperuma