Issue
I've been trying to read PDF pages as images for extraction purposes.
I found that layoutparser serves this purpose by identifying blocks of text. However, when I try to create a Detectron2-based layout detection model, I encounter the following error:
codeblock:
import layoutparser as lp

# Load the PubLayNet Mask R-CNN layout model, keeping detections scored 0.8 or higher
model = lp.Detectron2LayoutModel(
    config_path='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
)
error:
OSError Traceback (most recent call last)
<ipython-input-16-893fdc4d537c> in <module>
2 config_path ='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
3 label_map = {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"},
----> 4 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8]
5 )
6
...
d:\softwares\python3\lib\site-packages\portalocker\utils.py in _get_fh(self)
269 def _get_fh(self) -> typing.IO:
270 '''Get a new filehandle'''
--> 271 return open(self.filename, self.mode, **self.file_open_kwargs)
272
273 def _get_lock(self, fh: typing.IO) -> typing.IO:
OSError: [Errno 22] Invalid argument: 'C:\\Users\\user/.torch/iopath_cache\\s/nau5ut6zgthunil\\config.yaml?dl=1.lock'
I checked the destination folder and, surprisingly, there is no config.yaml file there, which may be why the error occurs. I tried uninstalling and reinstalling PyTorch, hoping that the .yaml files would then be installed correctly. Unfortunately, the problem remains the same.
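One thing that may be worth checking in the error message itself: the lock-file name ends in `config.yaml?dl=1.lock`, and `?` is one of the characters Windows does not allow in file names. The sketch below splits the path from the OSError into components and reports any that contain such characters. The helper name `invalid_windows_components` is hypothetical (it is not part of layoutparser, iopath, or portalocker), and the check only inspects the string; it does not by itself prove what caused the failure.

```python
import ntpath

# Characters Windows does not allow in a file or directory name.
WINDOWS_RESERVED = set('<>:"|?*')


def invalid_windows_components(path: str) -> dict:
    """Map each path component to the reserved characters it contains.

    Hypothetical helper for illustration only.
    """
    # Drop the drive ("C:") so its colon is not flagged.
    _drive, rest = ntpath.splitdrive(path)
    # The path in the error mixes "/" and "\", so normalise before splitting.
    parts = [p for p in rest.replace("/", "\\").split("\\") if p]
    bad = {}
    for part in parts:
        hits = sorted(WINDOWS_RESERVED.intersection(part))
        if hits:
            bad[part] = hits
    return bad


# Path copied from the OSError message above.
lock_path = r"C:\Users\user/.torch/iopath_cache\s/nau5ut6zgthunil\config.yaml?dl=1.lock"
print(invalid_windows_components(lock_path))
```

If the only flagged component is the lock-file name, that would be consistent with the `?dl=1` part of the download URL ending up in the cached file name, although this is an inference from the message rather than something confirmed here.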
I would appreciate a solution for this, or an alternative suggestion if one exists.
Solution
I solved this by setting the path to tesseract.exe in pytesseract.pytesseract.tesseract_cmd, so that the Tesseract CLI can run behind the scenes in Jupyter:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'path\to\folder\Tesseract_OCR\tesseract.exe'
After that, calling Detectron2LayoutModel no longer threw the error.
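As a slightly more defensive variant of the path assignment above, here is a minimal sketch that verifies the executable exists before handing it to pytesseract, and otherwise falls back to whatever `tesseract` is on PATH. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical; only the `pytesseract.pytesseract.tesseract_cmd` attribute comes from the answer. It assumes pytesseract is installed when `configure_pytesseract` is called.

```python
import os
import shutil
from typing import Optional


def find_tesseract(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found.

    Hypothetical helper for illustration only.
    """
    # Prefer the explicitly supplied path when it points at a real file.
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # Otherwise fall back to a tesseract executable found on PATH, if any.
    return shutil.which("tesseract")


def configure_pytesseract(explicit_path: Optional[str] = None) -> str:
    """Point pytesseract at the executable, failing early if it is missing."""
    # Imported lazily so find_tesseract can be used without pytesseract installed.
    import pytesseract

    cmd = find_tesseract(explicit_path)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract(r'path\to\folder\Tesseract_OCR\tesseract.exe')` once at the top of the notebook would then either set the command or raise a clear error if the path is wrong.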
For reference, see the thread "Pytesseract : “TesseractNotFound Error: tesseract is not installed or it's not in your path”, how do I fix this?"
Answered By - iGetRandomBugs