Issue
I am trying to improve accuracy of passport MRZ reading with tesseract ocr and passportEye I have found few github repositories containing "*.traineddata", it says to move it into tesseract ocr tessdata folder, I did that. No where in readme of these repos says how to use it, I believe it is something trivial, but I am very new to this tesseract thing.
How do I use it with passportEye in python, I am completely lost here. searched a lot. Here is the current code.
import os
from passporteye import read_mrz
pr_path = os.getcwd()
file_path = os.path.join(pr_path,'my_app', 'data')
mrz = read_mrz(file_path + '/test1.jpg')
print(mrz)
This is the .traineddata file I want to test for more accuracy : https://github.com/DoubangoTelecom/tesseractMRZ/blob/master/tessdata_best/mrz.traineddata
I do not want to use bulky openCV. Please help
Solution
From looking into the source code I would say you can`t, without changing the codebase of PassportEye:
Normally you would pass the language you are using via: -l
paramerter to tesseract - in your case:
-l mrz
But the PassportEye implementation does not give you that option:
they pass lang=None, you would need to change that part to lang=mrz
pytesseract.run_tesseract(input_file_name,
output_file_name_base,
'txt',
lang='mrz',
config=config)
Answered By - Tanja Bayer
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.