Issue
I have a bunch of images, each corresponding to a name, that I'm passing to Pytesseract for recognition. Some of the names are long and had to be written on multiple lines, so when I run recognition and save the results to a .txt file, each part of the name ends up on its own line.
Here's an example. It is being recognized as:
MARTHE
MVUMBI
But I need them to be on the same line.
Another example: it should be MOHAMED ASSAD YVES, but it's actually being stored as:
MOHAMED
ASSAD YVES
I thought I was filtering out this sort of thing, but apparently it's not working. Here's the code I'm using for recognition, storing and filtering:
import glob
import os
import re

import pytesseract

# Adding custom options
folder = r"C:\Users\lenovo\PycharmProjects\SoftOCR_PFE\name_results"
custom_config = r'--oem 3 --psm 6'
words = []

for directory in sorted(os.listdir(folder)):
    print(directory)
    # OCR every PNG inside this sub-folder of name_results
    for img in glob.glob(os.path.join(folder, directory, "*.png")):
        text = pytesseract.image_to_string(img, config=custom_config)
        words.append(text)
        words.append("\n")

# Keep only all-caps results and drop the "NOM" / "PRENOM" labels
stripped = [s.strip() for s in words]
all_caps = [s for s in stripped if s == s.upper() and s not in ('NOM', 'PRENOM')]
no_blank = [s for s in all_caps if s != ""]

with open('temp.txt', 'w') as filehandle:
    for listitem in no_blank:
        filehandle.write(f'{listitem}\n')

# Remove every character that is not a letter, digit or whitespace
with open('temp.txt') as f:
    uncleanText = f.read()
cleanText = re.sub(r'[^A-Za-z0-9\s]+', '', uncleanText)
with open('saved_names.txt', 'w') as f:
    f.write(cleanText)
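Why the filter doesn't help: `image_to_string` returns one string per image, so a name printed on two lines comes back as a single string with a `"\n"` inside it. The all-caps filter looks at each image's text as a whole, and `strip()` only trims the ends, so the inner newline survives and is written to the file. A small illustration, using made-up OCR strings rather than real Tesseract output:

```python
# Hypothetical OCR results: one string per image, as image_to_string returns
words = ["NOM\n", "MARTHE\nMVUMBI\n"]

# The filter tests each image's text as a whole; strip() removes only the
# trailing newline, so the "\n" between MARTHE and MVUMBI is kept
all_caps = [s.strip() for s in words
            if s == s.upper() and s.strip() not in ("NOM", "PRENOM")]
print(all_caps)  # ['MARTHE\nMVUMBI']
```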
I had to post again since my last question was posted really late at night and didn't get any action.
Solution
I would try adding, right after the line:
text = pytesseract.image_to_string(img, config=custom_config)
this line:
text = text.replace("\n", " ")
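A slightly more defensive variant, sketched here as a hypothetical helper `join_ocr_lines` (not part of pytesseract), collapses every run of whitespace instead of only `"\n"`. That also absorbs repeated newlines or a trailing form feed (`"\x0c"`) that Tesseract output may end with, so you don't rely on a later `strip()` to clean up the leftover spaces:

```python
def join_ocr_lines(text: str) -> str:
    """Collapse a multi-line OCR result into a single line (hypothetical helper)."""
    # str.split() with no argument splits on any run of whitespace,
    # including "\n" and the form feed "\x0c", and drops empty pieces
    return " ".join(text.split())

raw = "MOHAMED\nASSAD YVES\n\x0c"  # made-up sample string
print(join_ocr_lines(raw))  # MOHAMED ASSAD YVES
```

In the loop you would then write `text = join_ocr_lines(pytesseract.image_to_string(img, config=custom_config))`.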
Update
There was another problem: how to join every two lines of the file with ", " and save the result back to the file. It can be done this way:
with open("temp.txt", "r") as f:
    names = f.readlines()
names = [n.replace("\n", "") for n in names]
names = [", ".join(names[i:i+2]) for i in range(0, len(names), 2)]
with open("temp.txt", "w") as f:
    f.write("\n".join(names))
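The same pairing step can be wrapped in a reusable function; `pair_lines_in_file` below is a hypothetical name, and the sample contents `A`, `B`, `C` are placeholders. It uses `splitlines()` in place of `readlines()` plus `replace("\n", "")`, which also handles `\r\n` endings. If the file has an odd number of lines, the slice `lines[i:i + 2]` contains a single element at the end, so the last line is kept on its own:

```python
import os
import tempfile
from pathlib import Path

def pair_lines_in_file(path: str) -> None:
    """Rewrite `path` so every two consecutive lines become one "a, b" line."""
    p = Path(path)
    # splitlines() drops the line endings, like n.replace("\n", "") above
    lines = p.read_text().splitlines()
    # Step through the list two at a time; an odd last line stays alone
    paired = [", ".join(lines[i:i + 2]) for i in range(0, len(lines), 2)]
    p.write_text("\n".join(paired))

# Usage on a throwaway file with placeholder contents
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "temp.txt")
    Path(path).write_text("A\nB\nC\n")
    pair_lines_in_file(path)
    result = Path(path).read_text()

print(result)
```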
Answered By - Yuri Khristich