Issue
Here is my Dockerfile:
FROM python:3.11-buster AS app
WORKDIR /srv/app
ENV PYTHONPATH /srv/app
RUN apt update -y && \
apt install -y libegl1 libgl1 libxkbcommon-x11-0 dbus tesseract-ocr liblept5 leptonica-progs libleptonica-dev libtesseract-dev
RUN pip install --upgrade pip
When I open a shell in my container and run pip install tesserocr, I get this error:
$ pip install tesserocr
Collecting tesserocr
Downloading tesserocr-2.6.0.tar.gz (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.6/58.6 kB 929.0 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Building wheels for collected packages: tesserocr
Building wheel for tesserocr (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [32 lines of output]
Supporting tesseract v4.0.0
Tesseract major version 4
Configs from pkg-config: {'library_dirs': [], 'include_dirs': ['/usr/include', '/usr/include'], 'libraries': ['tesseract', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 4, 'TESSERACT_VERSION': 67108864}}
/usr/local/lib/python3.11/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
warnings.warn(
running bdist_wheel
running build
running build_ext
Detected compiler: unix
building 'tesserocr' extension
creating build
creating build/temp.linux-x86_64-cpython-311
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/include -I/usr/include -I/usr/local/include/python3.11 -c tesserocr.cpp -o build/temp.linux-x86_64-cpython-311/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
tesserocr.cpp: In function ‘tesseract::TessResultRenderer* __pyx_f_9tesserocr_13PyTessBaseAPI__get_renderer(__pyx_obj_9tesserocr_PyTessBaseAPI*, __pyx_t_9tesseract_cchar_t*)’:
tesserocr.cpp:22546:14: error: ‘TessAltoRenderer’ is not a member of ‘tesseract’
tesseract::TessAltoRenderer *__pyx_t_6;
^~~~~~~~~~~~~~~~
tesserocr.cpp:22546:14: note: suggested alternative: ‘TessOsdRenderer’
tesseract::TessAltoRenderer *__pyx_t_6;
^~~~~~~~~~~~~~~~
TessOsdRenderer
tesserocr.cpp:22546:32: error: ‘__pyx_t_6’ was not declared in this scope
tesseract::TessAltoRenderer *__pyx_t_6;
^~~~~~~~~
tesserocr.cpp:22546:32: note: suggested alternative: ‘__pyx_t_5’
tesseract::TessAltoRenderer *__pyx_t_6;
^~~~~~~~~
__pyx_t_5
tesserocr.cpp:22645:23: error: expected type-specifier
__pyx_t_6 = new tesseract::TessAltoRenderer(__pyx_v_outputbase);
^~~~~~~~~
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tesserocr
Running setup.py clean for tesserocr
Failed to build tesserocr
ERROR: Could not build wheels for tesserocr, which is required to install pyproject.toml-based projects
How can I solve this?
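One clue is already in the build log above: pkg-config reported 'TESSERACT_VERSION': 67108864. Below is a minimal sketch for decoding that number. It assumes (my assumption, not confirmed from tesserocr's setup.py) that the value packs major, minor and patch into one byte each, as major << 24 | minor << 16 | patch << 8.

```python
def decode_tesseract_version(packed: int) -> str:
    """Decode a packed version integer into 'major.minor.patch'.

    Assumes the layout major << 24 | minor << 16 | patch << 8
    (an assumption about tesserocr's encoding, not taken from its docs).
    """
    major = (packed >> 24) & 0xFF
    minor = (packed >> 16) & 0xFF
    patch = (packed >> 8) & 0xFF
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    # Value copied from the pkg-config line in the build log.
    print(decode_tesseract_version(67108864))  # 4.0.0
```

67108864 is 0x04000000, which decodes to 4.0.0 and matches the "Supporting tesseract v4.0.0" line: the wheel is being compiled against the Tesseract 4.0.0 that buster ships.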
Solution
I ran into the same issue you did; hopefully this is still helpful.
I believe the default packages in the buster repositories do not satisfy tesserocr's dependencies.
Case in point: the apt repository information for buster lists tesseract-ocr version 4.0.0-2, while bookworm has version 5.3.0-2. When I run the minimal failing example below in Docker, the error output suggests an incompatibility between tesserocr and tesseract: tesserocr.cpp:22550:14: error: ‘TessAltoRenderer’ is not a member of ‘tesseract’.
However, the working Dockerfile only needs an upgrade from buster to bullseye. I believe this comes down to the library versions: as far as I can tell, TessAltoRenderer was added in Tesseract 4.1, and bullseye ships 4.1.1 while buster stays on 4.0.0. Without more research I can't say exactly what else changed, but my suggestion is to upgrade from buster to at least bullseye. I also tested it on bookworm, and it works there as well (which also lets you use Tesseract 5+).
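If you want to check the installed library before running pip install, here is a rough sketch. It asks pkg-config for the tesseract module version (the same source the tesserocr build used, according to the log) and compares it against 4.1.0. The 4.1.0 threshold reflects my assumption that TessAltoRenderer first shipped in that release, and the helper names are hypothetical.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Assumed minimum: the first Tesseract release believed to ship TessAltoRenderer.
MIN_ALTO_VERSION = (4, 1, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading numeric part of a version, e.g. '4.0.0-2' -> (4, 0, 0)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_alto_renderer(version: str) -> bool:
    """Return True if `version` is at least MIN_ALTO_VERSION."""
    parts = parse_version(version)
    # Pad short versions like '5' or '4.1' so the tuple comparison is component-wise.
    padded = parts + (0,) * (len(MIN_ALTO_VERSION) - len(parts))
    return padded >= MIN_ALTO_VERSION


def installed_tesseract_version() -> str | None:
    """Ask pkg-config for the installed tesseract version (None if unavailable)."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", "tesseract"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = installed_tesseract_version()
    if version is None:
        print("tesseract not found via pkg-config")
    elif has_alto_renderer(version):
        print(f"tesseract {version}: should be new enough for tesserocr")
    else:
        print(f"tesseract {version}: too old, consider bullseye or bookworm")
```

On buster this would report 4.0.0 as too old; on bullseye (4.1.1) or bookworm (5.3.0) it would pass.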
Here is a Dockerfile which fails to build:
FROM python:3.11.4-buster
WORKDIR /app
RUN apt-get update && apt-get install -y \
libleptonica-dev \
tesseract-ocr \
libtesseract-dev
RUN pip install tesserocr
RUN mkdir src && \
echo "from tesserocr import PyTessBaseAPI" >> ./src/main.py && \
echo "with PyTessBaseAPI() as api:" >> ./src/main.py && \
echo " print(api.Version())" >> ./src/main.py
ENTRYPOINT ["python", "./src/main.py"]
Here is a Dockerfile that builds and runs successfully:
# Also tested with python:3.11.4-bookworm
FROM python:3.11.4-bullseye
WORKDIR /app
RUN apt-get update && apt-get install -y \
libleptonica-dev \
tesseract-ocr \
libtesseract-dev
RUN pip install tesserocr
RUN mkdir src && \
echo "from tesserocr import PyTessBaseAPI" >> ./src/main.py && \
echo "with PyTessBaseAPI() as api:" >> ./src/main.py && \
echo " print(api.Version())" >> ./src/main.py
ENTRYPOINT ["python", "./src/main.py"]
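For readability, the echo lines in these Dockerfiles generate a src/main.py roughly equivalent to the sketch below. The ImportError guard and the call to tesserocr.tesseract_version() (a module-level helper I believe tesserocr exposes) are my own additions, not part of the tested Dockerfile.

```python
"""Readable version of the src/main.py generated by the echo commands."""
from __future__ import annotations


def report_versions() -> str | None:
    """Return tesseract version info, or None if tesserocr is not installed."""
    try:
        # The import fails if the tesserocr wheel did not build (e.g. on buster).
        import tesserocr
        from tesserocr import PyTessBaseAPI
    except ImportError:
        return None
    with PyTessBaseAPI() as api:
        api_version = api.Version()
    # Version banner of the linked libraries (addition, not in the original script).
    return f"{api_version}\n{tesserocr.tesseract_version()}"


if __name__ == "__main__":
    versions = report_versions()
    print(versions if versions is not None else "tesserocr is not installed")
```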
Answered By - Noah Hood