Issue
I have an image that contains text arranged in circles. There are two circles in this image. I want to remove the inner circle's text and extract the outer circle's text. How can I remove the inner text and, once it is removed, extract the outer text? What are the steps to solve this problem?
Input image:
Solution
Your image was a nice toy example to play around with cv2.warpPolar, so I wrote some code that I will share here, too. This would be my approach:
Grayscale and binarize the input image, mainly to get rid of JPG artifacts.
Crop the center part of the image to get rid of the large areas left and right, since we'll find contours later, so that becomes less difficult.
Find (nested) contours, cf. cv2.RETR_TREE. Please see this answer for an extensive explanation on contour hierarchies.
Filter and sort the found contours by area, such that only the four circle-related contours (inner and outer edges for two circles) are kept.
Remove the inner text by simply painting over using the contours from the inner circle.
If explicitly needed, do that for the original image also.
Rotate the image before remapping, cf. the explanations in the linked cv2.warpPolar documentation.
Remap the image to polar coordinates, and rotate the result for proper OCR.
Run pytesseract, whitelisting only capital letters.
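To see why the image is rotated before and after the remapping, it helps to write out the mapping that the cv2.warpPolar documentation describes for the linear mode. The following is a minimal sketch in plain Python, not OpenCV's implementation. The helper name polar_coords and the 400 x 400 example size are made up for illustration, and the default output size (maxRadius wide, maxRadius * pi high) follows the documentation's description of passing dsize = (-1, -1).

```python
import math


def polar_coords(x, y, center, max_radius, dsize):
    """Map a source pixel (x, y) to (rho, phi) in the remapped image.

    Hypothetical helper: rho is the column and phi the row of the
    destination image, following the linear mode described in the
    cv2.warpPolar documentation.
    """
    width, height = dsize
    dx, dy = x - center[0], y - center[1]
    k_lin = width / max_radius          # scales the radius to the width
    k_angle = height / (2 * math.pi)    # scales the angle to the height
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return k_lin * math.hypot(dx, dy), k_angle * angle


# Assumed example: a 400 x 400 crop, remapped like in the approach above
h = 400
center = (h // 2, h // 2)
max_radius = h // 2
dsize = (round(max_radius), round(max_radius * math.pi))

right = polar_coords(300, 200, center, max_radius, dsize)   # right of center
bottom = polar_coords(200, 300, center, max_radius, dsize)  # below center
top = polar_coords(200, 100, center, max_radius, dsize)     # above center
print(dsize, right, bottom, top)
```

Angle 0 lies on the positive x-axis, so the circle is cut open at its rightmost point, and the angle runs along the rows of the result. cv2.ROTATE_90_COUNTERCLOCKWISE moves the bottom of the image to the right edge, which presumably places the gap in the circular text on that seam. The second rotation then turns the vertical strip of text into a horizontal one for OCR.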
That's the full code with the proper output:
import cv2
import pytesseract
# Read image
img = cv2.imread('fcJAc.jpg')
# Convert to grayscale, and binarize, especially for removing JPG artifacts
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)[1]
# Crop center part of image to simplify following contour detection
h, w = gray.shape
l = (w - h) // 2
gray = gray[:, l:l+h]
# Find (nested) contours (cf. cv2.RETR_TREE) w.r.t. the OpenCV version
cnts = cv2.findContours(gray, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
# Filter and sort contours on area
cnts = [cnt for cnt in cnts if cv2.contourArea(cnt) > 10000]
cnts = sorted(cnts, key=cv2.contourArea)
# Remove inner text by painting over using found contours
# Contour index 1 = outer edge of inner circle
gray = cv2.drawContours(gray, cnts, 1, 0, cv2.FILLED)
# If specifically needed, also remove text in the original image
# Contour index 0 = inner edge of inner circle (to keep inner circle itself)
img[:, l:l+h] = cv2.drawContours(img[:, l:l+h], cnts, 0, (255, 255, 255),
                                 cv2.FILLED)
# Rotate image before remapping to polar coordinate space to maintain
# circular text en bloc after remapping
gray = cv2.rotate(gray, cv2.ROTATE_90_COUNTERCLOCKWISE)
# Actual remapping to polar coordinate space
gray = cv2.warpPolar(gray, (-1, -1), (h // 2, h // 2), h // 2,
                     cv2.INTER_CUBIC + cv2.WARP_POLAR_LINEAR)
# Rotate result for OCR
gray = cv2.rotate(gray, cv2.ROTATE_90_COUNTERCLOCKWISE)
# Actual OCR, limiting to capital letters only
config = '--psm 6 -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ "'
text = pytesseract.image_to_string(gray, config=config)
print(text.replace('\n', '').replace('\f', ''))
# CIRCULAR TEXT PHOTOSHOP TUTORIAL
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.1
OpenCV: 4.5.2
pytesseract: 5.0.0-alpha.20201127
----------------------------------------
Answered By - HansHirse