Issue
I've been using Tesseract 4 for a project for more than two months now, which means it has been running on input images for that whole time. The error I now get is:
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/cse/.local/lib/python3.5/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "UKExtraction2.py", line 267, in tessBox
op = pt.image_to_string(box[0],lang='hin+eng',config='--psm 6')
File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 286, in image_to_string
return run_and_get_output(image, 'txt', lang, config, nice)
File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 194, in run_and_get_output
run_tesseract(**kwargs)
File "/home/cse/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170, in run_tesseract
raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "UKExtraction2.py", line 855, in <module>
doItAllUpper("A0","UK4.csv","temp",27,70,"box",2,1000,firstPageCoordsUK,boxCoordUK,voterBoxCoordUK,internalBoxNumberCoordUK,externalBoxNumberCoordUK,addListInfoUK)
File "UKExtraction2.py", line 776, in doItAllUpper
doItAll(tempPDFName,outputCSV,2,pdfs,formatType,n_blocks,writeBlockSize,firstPageCoords,boxCoord,voterBoxCoord,internalBoxNumberCoord,externalBoxNumberCoord,addListInfo,pdfName)
File "UKExtraction2.py", line 617, in doItAll
mainProcess(pdfName,(0,noOfPages-1),formatType,n_blocks,outputCSV,writeBlockSize,firstPageCoords,boxCoord,voterBoxCoord,internalBoxNumberCoord,externalBoxNumberCoord,addListInfo,bigPDFName,basePages)
File "UKExtraction2.py", line 563, in mainProcess
names_lst = cropAndOCR(im,(tup[0],tup[1]),formatType,boxCoord,voterBoxCoord,externalBoxNumberCoord,n_blocks,basePages)# Add the values of fpageInfo
File "UKExtraction2.py", line 416, in cropAndOCR
results = pool.map(tessBox,box_lst_divided)
File "/home/cse/.local/lib/python3.5/site-packages/pathos/multiprocessing.py", line 137, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/cse/.local/lib/python3.5/site-packages/multiprocess/pool.py", line 644, in get
raise self._value
pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')
The pathos frames are there because the project runs the OCR step in a pool of two workers. The important part is:
pytesseract.pytesseract.TesseractError: (127, 'tesseract: symbol lookup error: tesseract: undefined symbol: _ZN9tesseract15TessPDFRendererC1EPKcS2_b')
A user reported a similar error on the tesseract-ocr Google group:
combine_tessdata: symbol lookup error: combine_tessdata: undefined symbol: _Z7tprintfPKcz
and got the answer that
"undefined symbol" indicate a broken installation
But as I said, this version has been running without any errors for more than two months, so there shouldn't be anything wrong with the Tesseract installation.
Another user posted the same problem to the group, but no one replied.
So I assumed that the problem could be in one of two places:
- In the image provided to tesseract.
- Inside tesseract.
The image might not be a valid image at all. For example, it might have 0x0 dimensions (although the way the image is constructed should rule that out). But that cannot be the cause here, because when I tested this hypothesis the error I got was a different one:
SystemError: tile cannot extend outside image
This means that the image was present, so Tesseract should have worked.
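To rule out an empty or out-of-bounds crop before an image ever reaches pytesseract, a small guard along these lines could be used. It is only a sketch: `crop_box_is_valid` is a hypothetical helper that is not part of the script, and it assumes Pillow-style conventions, that is a `(width, height)` size and a `(left, upper, right, lower)` box.

```python
def crop_box_is_valid(image_size, box):
    """Return True if `box` is a non-empty region that lies inside the image.

    Assumes Pillow-style conventions (an assumption, not taken from the script):
    image_size is (width, height), box is (left, upper, right, lower).
    """
    width, height = image_size
    left, upper, right, lower = box
    # The box must not reach outside the image on any side.
    inside = left >= 0 and upper >= 0 and right <= width and lower <= height
    # The box must cover at least one pixel, so a 0x0 crop is rejected.
    non_empty = right > left and lower > upper
    return inside and non_empty
```

Calling it with `im.size` and the crop coordinates just before cropping would show whether a degenerate box is ever produced.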
Since the input image appears to be valid, the problem seems to lie within Tesseract itself. I'm no expert on Tesseract's inner workings, but given that this version worked correctly until now and there is no problem with the input image, what could be wrong with Tesseract?
P.S: I'm currently not near the system that runs the script, but I do know which error occurred. I might not be able to give exact details about the system, so I'm looking for hypotheses about the cause.
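For context, the mangled name in the error demangles to the constructor `tesseract::TessPDFRenderer::TessPDFRenderer(char const*, char const*, bool)`. A symbol lookup failure like this typically means the `tesseract` executable is loading a `libtesseract` shared library that does not match the one it was built against, which can happen when a package is upgraded or a second copy is installed while the script keeps running. One way to check is to look at which libraries the dynamic loader resolves for the binary. The sketch below assumes a Linux system with `ldd` available; `parse_ldd` and `tesseract_libs` are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path (None if unresolved)."""
    libs = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vdso or the loader itself, which have no "=>"
        name, _, target = line.strip().partition("=>")
        target = target.strip()
        if target.startswith("not found"):
            libs[name.strip()] = None
            continue
        # Drop the trailing load address such as "(0x00007f1a...)".
        path = re.sub(r"\s*\(0x[0-9a-fA-F]+\)$", "", target).strip()
        libs[name.strip()] = path or None
    return libs


def tesseract_libs(binary="tesseract"):
    """Return (path_to_binary, {lib: path}) for the Tesseract and Leptonica libraries."""
    exe = shutil.which(binary)
    if exe is None:
        return None, {}
    proc = subprocess.run(["ldd", exe], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    libs = parse_ldd(proc.stdout.decode(errors="replace"))
    # Only the OCR-related libraries are interesting here.
    return exe, {k: v for k, v in libs.items() if "tesseract" in k or "lept" in k}
```

If `tesseract_libs()` reports a `libtesseract` under an unexpected prefix (for example a locally built copy next to the packaged one), or a library that is not found at all, that would support the "broken installation" explanation even though the setup worked earlier.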
P.S: The script is here.
Solution
Here is the solution for Ubuntu 18.04.
First install the libraries that tesseract-ocr requires:
sudo apt install libtesseract-dev libleptonica-dev liblept5
Then install Tesseract itself with the command:
sudo apt install tesseract-ocr -y
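After reinstalling, it may be worth confirming that the binary starts at all before launching a long job. The following is a minimal sketch, not part of the original answer; `tesseract_runs` is a hypothetical helper that simply runs `tesseract --version` and reports the exit status.

```python
import subprocess


def tesseract_runs(binary="tesseract"):
    """Run `<binary> --version` and return (ok, message).

    ok is False if the binary is missing or exits with a non-zero status,
    such as the 127 produced by a symbol lookup error.
    """
    try:
        proc = subprocess.run(
            [binary, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError as exc:  # binary not found or not executable
        return False, str(exc)
    # Some versions print the version to stderr, so fall back to it.
    message = (proc.stdout or proc.stderr).decode(errors="replace").strip()
    return proc.returncode == 0, message


if __name__ == "__main__":
    ok, message = tesseract_runs()
    print("tesseract OK" if ok else "tesseract FAILED")
    print(message)
```

Running this once at the start of the script would turn a broken installation into an immediate, readable failure instead of an error deep inside the worker pool.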
Answered By - Hassan ALi