Saturday, September 3, 2022

[FIXED] How to crop images using Pillow and pytesseract?


Issue

I am trying to use pytesseract to find the bounding box of each letter in an image. With one image, cropping by the box coordinates with Pillow worked. With an image that has smaller characters (example), pytesseract still recognizes the characters, but cropping with the box coordinates gives me images like this. I also tried doubling the size of the original image, but that changed nothing.

from PIL import Image
import pytesseract

img = Image.open('imgtest.png')
data = pytesseract.image_to_boxes(img)
dati = data.splitlines()
corde = []
for i in dati[0].split()[1:5]:  # just trying with the first character
    corde.append(int(i))
im = img.crop(tuple(corde))
im.save('cimg.png')

Solution

Looking at the source code of image_to_boxes, we see that the returned coordinates are in the following order:

left bottom right top

From the documentation on Image.crop, we see that the expected order of coordinates is:

left upper right lower

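In addition, the box coordinates returned by pytesseract measure y from the bottom edge of the image, whereas Pillow measures y from the top edge. Therefore, besides swapping the positions of top/upper and bottom/lower, we also need to subtract each of them from the image height.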
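
To make the flip concrete, here is a minimal sketch of the conversion as a standalone function. The name box_to_crop and the sample numbers are illustrative assumptions, not part of pytesseract or Pillow.

```python
def box_to_crop(left, bottom, right, top, image_height):
    """Convert a box with bottom-based y values to Pillow's (left, upper, right, lower)."""
    # y values are measured from the bottom edge, so subtract them from the height.
    upper = image_height - top
    lower = image_height - bottom
    return (left, upper, right, lower)


# Hypothetical box in an image that is 100 px tall: bottom=20, top=40.
print(box_to_crop(10, 20, 30, 40, image_height=100))
```

For this made-up box, the call prints (10, 60, 30, 80): the x values are unchanged, while the old top becomes the new upper (100 - 40) and the old bottom becomes the new lower (100 - 20).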

Here is the question's code, reworked accordingly:

from PIL import Image
import pytesseract

img = Image.open('MJwQi9f.png')
data = pytesseract.image_to_boxes(img)
dati = data.splitlines()

# First character only: fields 1 to 4 are left, bottom, right, top
left, bottom, right, top = (int(i) for i in dati[0].split()[1:5])

# Reorder for Image.crop and flip the y values w.r.t. the image height
height = img.size[1]
corde = (left, height - top, right, height - bottom)
im = img.crop(corde)
im.save('cimg.png')

As you can see, left and right stay in the same place, but top/upper and bottom/lower switch places and are also recomputed w.r.t. the image height.
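
Since the question asks about every letter, the same conversion could be applied to each line of the image_to_boxes output. The following is only a sketch under assumptions: parse_boxes and crop_characters are hypothetical helper names, each line is assumed to look like "T 10 20 30 40 0" (character, left, bottom, right, top, page), and img is assumed to be a Pillow image, i.e. anything that offers size and crop.

```python
def parse_boxes(boxes_text):
    """Yield (char, (left, bottom, right, top)) for each line of box output."""
    for line in boxes_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        yield parts[0], tuple(int(v) for v in parts[1:5])


def crop_characters(img, boxes_text):
    """Return a list of (char, cropped_image) pairs, one per recognized character."""
    height = img.size[1]
    crops = []
    for char, (left, bottom, right, top) in parse_boxes(boxes_text):
        # Same reordering and y flip as for the single character above.
        box = (left, height - top, right, height - bottom)
        crops.append((char, img.crop(box)))
    return crops
```

With the image from the reworked code, crop_characters(img, pytesseract.image_to_boxes(img)) would return one cropped image per recognized character, and each of them can then be saved with its save method.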

And that's the updated output:

The result isn't optimal, but I assume that's due to the font.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.16299-SP0
Python:        3.9.1
Pillow:        8.1.0
pytesseract:   4.00.00alpha
----------------------------------------

Answered By - HansHirse

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
