Issue
My Requirement
I have generated an exe from Python code using PyInstaller. Because the code calls pytesseract.image_to_pdf_or_hocr for OCR, Tesseract is a dependency, so I added the Tesseract folders to the spec file and generated the exe. My aim is to bundle the Tesseract folders inside the exe itself, so that they are extracted to the Temp folder when I open the exe.
What have I done so far
I have successfully generated the exe from the Python code. When I open the exe, the bundled Tesseract files are extracted to the temp directory and I can start using Tesseract.
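A minimal sketch of how the exe can locate the extracted Tesseract at runtime, assuming a one-file build: PyInstaller exposes the temporary extraction folder as sys._MEIPASS, and pytesseract reads the executable location from pytesseract.pytesseract.tesseract_cmd. The folder name matches the spec file's destination; the position of tesseract.exe inside it and the helper names (bundle_base_dir, bundled_tesseract_exe, configure_pytesseract) are assumptions for illustration.

```python
import os
import sys

# Destination folder name used in the spec file (spelling kept as-is).
TESSERACT_DIR_NAME = "TESSEERACT-OCR"


def bundle_base_dir() -> str:
    """Return the folder the bundled files live in at runtime.

    In a one-file exe, sys._MEIPASS is the temporary extraction folder.
    When running the plain .py script, fall back to the script's folder
    (assumption: the Tesseract folder sits next to the script).
    """
    meipass = getattr(sys, "_MEIPASS", None)
    if meipass is not None:
        return meipass
    return os.path.dirname(os.path.abspath(__file__))


def bundled_tesseract_exe(base_dir: str) -> str:
    """Path to tesseract.exe inside the bundled folder (assumed layout)."""
    return os.path.join(base_dir, TESSERACT_DIR_NAME, "tesseract.exe")


def configure_pytesseract() -> None:
    """Point pytesseract at the bundled binary before any OCR call."""
    import pytesseract  # imported lazily so the path helpers have no dependency

    pytesseract.pytesseract.tesseract_cmd = bundled_tesseract_exe(bundle_base_dir())
```

Calling configure_pytesseract() once at startup, before the first image_to_pdf_or_hocr call, keeps the exe from picking up a different Tesseract installed on the machine.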
Issue
Normally, pytesseract.image_to_pdf_or_hocr creates a PDF (I am passing the extension='pdf' parameter) in the temp folder with a random name such as tess_rwsvvy4k.pdf. This works when I run the code directly, but when I run the exe, no PDF file is generated in the temp folder.
Error from Log file
File "PDF to Readable.py", line 216, in start_conversion
File "pytesseract\pytesseract.py", line 446, in image_to_pdf_or_hocr
File "pytesseract\pytesseract.py", line 290, in run_and_get_output
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\PAVANS~1\\AppData\\Local\\Temp\\tess_rwsvvy4k.pdf'
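The error is raised when pytesseract tries to open the output file after Tesseract has already exited. As far as I can tell from pytesseract's source, it runs Tesseract with a temporary output base name (the tess_... prefix in the path above), passes the extension as a config name (pdf), and then opens the base name plus .pdf. If Tesseract cannot load that config, no PDF is written and the open fails. The sketch below runs a comparable command directly so Tesseract's own stderr becomes visible; the function names are hypothetical, and the language and extra config arguments pytesseract may add are omitted.

```python
import os
import subprocess
import tempfile


def build_pdf_command(tesseract_cmd: str, image_path: str, output_base: str) -> list:
    """Command line: tesseract <image> <output base> pdf.

    The trailing 'pdf' is a config name looked up in tessdata/configs;
    Tesseract itself appends '.pdf' to the output base.
    """
    return [tesseract_cmd, image_path, output_base, "pdf"]


def expected_output(output_base: str) -> str:
    """The file the caller will try to open once Tesseract exits."""
    return output_base + ".pdf"


def run_pdf_diagnostic(tesseract_cmd: str, image_path: str) -> None:
    """Run Tesseract once and raise with its stderr if no PDF appears."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "diag")
        result = subprocess.run(
            build_pdf_command(tesseract_cmd, image_path, base),
            capture_output=True,
            text=True,
        )
        if not os.path.exists(expected_output(base)):
            raise RuntimeError(
                f"No PDF produced (exit code {result.returncode}). "
                f"Tesseract stderr:\n{result.stderr}"
            )
```

Running run_pdf_diagnostic from inside the exe with the bundled tesseract.exe path should print whatever Tesseract reports when it cannot find a config file.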
Spec file
....
a = Analysis(
    ['PDF to Readable.py'],
    pathex=[],
    datas=[],
    binaries=[('Source/poppler-0.68.0/bin/*', 'poppler-0.68.0/bin'),
              ('Source/poppler-0.68.0/include/poppler/cpp/*', 'poppler-0.68.0/include/poppler/cpp'),
              ('Source/poppler-0.68.0/lib/pkgconfig/*', 'poppler-0.68.0/lib/pkgconfig'),
              ('Source/poppler-0.68.0/lib/*.*', 'poppler-0.68.0/lib'),
              ('Source/poppler-0.68.0/share/man/man1/*', 'poppler-0.68.0/share/man/man1'),
              ('Source/TESSEERACT-OCR/*.*', 'TESSEERACT-OCR'),
              ('Source/TESSEERACT-OCR/doc/*', 'TESSEERACT-OCR/doc'),
              ('Source/TESSEERACT-OCR/tessdata/*.*', 'TESSEERACT-OCR/tessdata'),
              ('Source/TESSEERACT-OCR/tessdata/configs/*.*', 'TESSEERACT-OCR/tessdata/configs'),
              ('Source/TESSEERACT-OCR/tessdata/tessconfigs/*.*', 'TESSEERACT-OCR/tessdata/tessconfigs')],
.....
Please point me in the right direction.
Solution
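Actually, I had made a small mistake in the spec file. The original tessdata\configs folder contains 25 files and the tessdata\tessconfigs folder contains 6 files, but because of my mistake only 5 files ended up in the bundled tessdata\configs folder. The *.* pattern only matches file names that contain a dot, and most of the Tesseract config files, including the ones for hocr and pdf output, have no extension. So the dependencies for hocr were missing, and in my case I could not generate the PDF.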
My Mistake
('Source/TESSEERACT-OCR/tessdata/configs/*.*', 'TESSEERACT-OCR/tessdata/configs'),
('Source/TESSEERACT-OCR/tessdata/tessconfigs/*.*', 'TESSEERACT-OCR/tessdata/tessconfigs')
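A small build-time check can catch this kind of gap before the exe is built. This sketch lists the files in a folder that a given spec pattern does not match, using Python's glob module, which I assume expands the patterns the same way PyInstaller does. The function name is hypothetical.

```python
import glob
import os


def files_skipped_by_pattern(folder: str, pattern: str) -> list:
    """Regular files in `folder` that the glob `pattern` does not match."""
    all_files = {
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    }
    matched = {
        os.path.basename(path)
        for path in glob.glob(os.path.join(folder, pattern))
        if os.path.isfile(path)
    }
    return sorted(all_files - matched)
```

Called with the Source/TESSEERACT-OCR/tessdata/configs folder and "*.*", it should list the extensionless config files (such as hocr and pdf); with "*" the list should be empty apart from any hidden dotfiles, which glob also skips.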
What I have changed
('Source/TESSEERACT-OCR/tessdata/configs/*', 'TESSEERACT-OCR/tessdata/configs'),
('Source/TESSEERACT-OCR/tessdata/tessconfigs/*', 'TESSEERACT-OCR/tessdata/tessconfigs')
This adds all the files from the tessdata\configs and tessdata\tessconfigs folders, and now I get the desired output.
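Instead of one hand-written pattern per sub-folder, the spec file (which is plain Python) could build the tuples by walking the whole Tesseract folder. This is a sketch under that assumption; collect_tree is a hypothetical helper defined above Analysis(...), and it keeps the sub-folder layout below the destination name.

```python
import os


def collect_tree(src_root: str, dest_root: str) -> list:
    """Build (source_file, destination_folder) tuples for every file
    under src_root, mirroring its sub-folders below dest_root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dest = dest_root if rel == os.curdir else os.path.join(dest_root, rel)
        for name in sorted(filenames):
            entries.append((os.path.join(dirpath, name), dest))
    return entries
```

The five TESSEERACT-OCR tuples could then be replaced with collect_tree('Source/TESSEERACT-OCR', 'TESSEERACT-OCR'), concatenated with the poppler entries. Unlike glob, os.walk also picks up hidden files.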
Answered By - Pavan Sai