Issue
I extracted a string from an image with 3 columns. The extracted text is:
SUBJECT GRADE FINALGRADE CREDITS
ADVANCED CALCULUS 1 1.54 A 3
I want to put a separator between these items so that it looks like this:
SUBJECT, GRADE, FINALGRADE, CREDITS
ADVANCED CALCULUS 1, 1.54, A, 3
Solution
We can solve this in two steps:
- Specify the starting keyword.
- Split the line using a space as the separator.
Looking at the example image provided in the comment, we don't need any image preprocessing, since the image contains no artifacts.
Assume we want to comma-separate the row starting with "state".
- Specify the starting keyword:
    start_word = line.split(" ")[0]
- Split the line using a space as the separator:
    if start_word == "state":
        line = line.split(" ")
Now, for each word in the line, we can append a comma:
    for word in line:
        result += word + ", "
But we need to remove the last two characters; otherwise the result will end with "2000, ":
    result = result[:-2]
    print(result)
Result:
state, 1983, 1987, 1988, 1993, 1994, 1999, 2000
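The loop and the slice above can also be collapsed into a single `str.join` call, which never produces a trailing separator in the first place. The following is a minimal sketch, assuming the OCR line is already available as a plain string; the helper name `comma_separate` is hypothetical and not part of the original answer.

```python
def comma_separate(line, start_word="state"):
    """Return the line with ", " between words if it starts with start_word."""
    words = line.split()  # split on any run of whitespace
    if words and words[0] == start_word:
        return ", ".join(words)
    # Lines that do not start with the keyword are returned unchanged.
    return line


print(comma_separate("state 1983 1987 1988 1993 1994 1999 2000"))
```

Using `split()` without an argument also tolerates repeated spaces, which OCR output sometimes contains.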
Code:
import cv2
import pytesseract

# Load the image and convert it to grayscale; no further preprocessing is needed.
img = cv2.imread("15f8U.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Run OCR and split the output into lines.
txt = pytesseract.image_to_string(gry)
txt = txt.split("\n")

for line in txt:
    start_word = line.split(" ")[0]
    if start_word == "state":
        # Comma-separate the words of the header row.
        result = ""
        for word in line.split(" "):
            result += word + ", "
        result = result[:-2]  # drop the trailing ", "
        print(result)
        continue
    # Print every other non-empty line unchanged.
    if line.strip():
        print(line)
Result:
Table 1: WAGE SAMPLE STATISTICS, by year and state (1983-2000)
Logged mean wages
in year
state, 1983, 1987, 1988, 1993, 1994, 1999, 2000
Andhra Pradesh 5.17 5.49 5.53 6.28 6.24 5.77 5.80
Gujarat 9 6.04 5.92 6.64 6.58 6.09 6.04
Haryana 12 6.25 6.43 6.80 6.60 6.54 6.74
Manipur 54 6.31 6.73 7.15 7.09 6.90 6.83
Orissa 5.24 5.90 5.96 6.16 6.26 5.57 5.58
Tamil Nadu 5.19 5.67 5.68 6.31 633 6.02 5.97
Uttar Pradesh 5.55 6.06 3 6.61 2 6.00 6.07
Mizoram 6.43 5.44 6.03 681 6.76 8 7
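The rows below the header are printed unchanged, because splitting on every space would also break multi-word names such as "Andhra Pradesh" or, in the question, "ADVANCED CALCULUS 1". One possible way to handle such rows is to split from the right with `str.rsplit`. This is only a sketch under two assumptions: only the first column can contain spaces, and the number of columns is known in advance. The function name `separate_columns` and the `n_columns` parameter are illustrative and not part of the original answer.

```python
def separate_columns(line, n_columns):
    """Join the columns of an OCR row with ", ".

    Assumes only the first column may contain spaces, so the last
    n_columns - 1 whitespace-separated tokens are the other columns.
    """
    # Split at most n_columns - 1 times, starting from the right.
    parts = line.strip().rsplit(None, n_columns - 1)
    return ", ".join(parts)


for row in ["SUBJECT GRADE FINALGRADE CREDITS", "ADVANCED CALCULUS 1 1.54 A 3"]:
    print(separate_columns(row, 4))
```

If the OCR drops a value from a row, the columns of that row will be misaligned, so the output is worth checking against the image.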
Answered By - Ahx