Sunday, November 21, 2021

[FIXED] Extracting Semi-structured Text from a complex UI (Golf Simulator)

November 21, 2021 image-processing, ocr, opencv, python-tesseract, tesseract No comments

Issue

I'm fairly new to the world of OCR, OpenCV, Tesseract etc and was hoping to get some advice or a nudge in the right direction for a project I'm working on. For context, I practice golf at an indoor simulator that is powered by Full Swing Golf. My goal is to build an app (preferably iphone, but desktop is fine too) that will be able to grab the data provided by the simulator and process it however I'd like. The overall workflow would look something like:

Set up iPhone or laptop camera to watch the simulator screen.
Hit ball
Statistics Screen is displayed that looks more or less like:

Detect that the Statistics Screen has been displayed and grab all relevant data:

| Distance | Launch | Back Spin | Club Speed | Carry | To Pin | Direction | Ball Speed | Side Spin | Club Face | Club Path |
|----------|--------|-----------|------------|-------|--------|-----------|------------|-----------|-----------|-----------|
| 345      | 13     | 3350      | 135        | 335   | 80     | 2.4       | 190        | 350       | 4.3       | 1.6       |

5-?: Save the data to my app, keep track of it over time etc...

Attempts So Far:

It seemed like OpenCV's matchTemplate would be a simple way to find all of the headings in the image (Distance, Launch etc...) and it does seem to work when the image and template are both the perfect resolution. However, as this will be an iPhone app, the quality is not something I can really guarantee (within reason). Moreso, the screen will almost never be straight-on as it appears above. Most likely, the camera will be off to the side and we will have to de-skew accordingly. I've attempted to use the following image to work on my deskewing logic to no avail:

Finding the reference points in order to deskew via getPerspectiveTransform and warpPerspective has proven to be incredibly difficult due to the above issues with matching templates.

I've also tried dynamically adjusting for scale with code resembling the following:

def findTemplateLocation(image_path):
    template = cv2.imread(image_path)
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)

    w, h = template.shape[::-1]
    threshold = 0.65
    loc = []

    for scale in np.linspace(0.1, 2, 20)[::-1]:
        resized = imutils.resize(template, width=int(template.shape[1] * scale))
        w, h = resized.shape[::-1]
        res = cv2.matchTemplate(image_gray, resized, cv2.TM_CCOEFF_NORMED)

        loc = np.where(res >= threshold)
        if len(list(zip(*loc[::-1]))) > 0:
            break

    if loc and len(list(zip(*loc[::-1]))) > 0:
        adjusted_w = int(w/scale)
        adjusted_h = int(h/scale)
        print(str(adjusted_w) + " " + str(adjusted_h) + " " + str(scale))

        ret = []
        for pt in zip(*loc[::-1]):
            ret.append({'width': w, 'height': h, 'location': pt})

        return ret

    return None

This still returns a ton of false positives.

I'm hoping to get some advice on how to approach this problem with a clean slate. I'm open to any language / workflow.

If it does seem that I'm on the right track, my current code is at https://gist.github.com/naderhen/9ec8d45f13d92507131d5bce0e84fad8 . Would really appreciate any suggestions for the best next steps.

Thanks for any help you can provide!

EDIT: Additional Resources

I've uploaded a number of videos and still photos from my time at the indoor simulator this weekend: https://www.dropbox.com/sh/5vub2mi4rvunyaw/AAAY1_7Q_WBV4JvmDD0dEiTDa?dl=0

I tried to get a number of different angles, with different lighting etc. Please let me know if I can provide any other resources that may help.

Solution

So, I tried two different methods:

Contour detection - This seemed to be the most obvious method since the statistics screen is the primary part of the image and is present in all your images. Although it does work with two of the three images, it might not be very robust with the parameters. Here are the steps that I tried for contour:

First, get the the image in grayscale or take one of the Value channel in HSV. Then, threshold the image using either Otsu or Adaptive Thresholding. After playing with a lot of the associated parameters, I got satisfactory results, which would basically mean nice whole statistics screen in white on a black background. After this, sort the contours like this:
```
contours = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[1]
# Sort the contours to avoid unnecessary comparison in the for loop below
cntsSorted = sorted(contours, key=lambda x: cv2.contourArea(x), reverse=True)

for cnt in cntsSorted[0:20]:
    peri = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)
    if len(approx) == 4 and peri > 10000:
        cv2.drawContours(sorted_image, cnt, -1, (0, 255, 0), 10)
```
Feature Detection and Matching: Since using contours wasn't robust enough, I tried another method which I've worked on for a similar problem as yours. This method is fairly robust, much faster (I tried this on an android phone 2 years back and it would do the job in less than a second for a 1280 x 760 image). However, after trying on your work cases, I figured that your images are pretty vague. What I mean by that is, you have two images in your question that have fairly similar primaries and it works for that but the images you posted in comments are very different from these and hence it doesn't find suitable number of good matches (at least 10 in my case). If you can post a nice set of images that you actually will encounter, I will update this answer with my results on the new set. More importantly, the images of the scene evidently have change in perspective which shouldn't be an issue assuming you are able to get a very good source image (as the first one in your question). However, the change in lighting conditions can be a pain. I'd suggest either using different color spaces such as HSV, Lab and Luv instead of BGR. Here is where you can find a working example of how to implement your own feature matcher. There are some code changes required depending on the version of OpenCV you are using but I am sure you can find the solutions ( I did ;) ).

A good example:

Some suggestions:

Try getting as clean an image as possible for the image you are using to match with the others (your first image in my case). Hopefully, this would require you to do less processing.
Try using unsharp mask before finding Keypoints.
My results are from using ORB. You can also try with other detectors/descriptors like SURF, SIFT and FAST.

Finally, your approach of template matching should work in cases where there is a change only in the scaling and not the perspective.

Hope this helps! Write a comment if you have any additional questions and/or when you have a good image set ready (rubs palms). Cheers!

Edit 1: This is the code that I used for the Feature Detection and Matching in Opencv 3.4.3 and Python 3.4

def unsharp_mask(im):
    # This is used to sharpen images
    gaussian_3 = cv2.GaussianBlur(im, (3, 3), 3.0)
    return cv2.addWeighted(im, 2.0, gaussian_3, -1.0, 0, im)

def screen_finder2(image, source, num=0):
    def resize(im, new_width):
        r = float(new_width) / im.shape[1]
        dim = (new_width, int(im.shape[0] * r))
        return cv2.resize(im, dim, interpolation=cv2.INTER_AREA)
    width = 300
    source = resize(source, new_width=width)
    image = resize(image, new_width=width)

    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2LUV)
    image, u, v = cv2.split(hsv)

    hsv = cv2.cvtColor(source, cv2.COLOR_BGR2LUV)
    source, u, v = cv2.split(hsv)

    MIN_MATCH_COUNT = 10
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(image, None)
    kp2, des2 = orb.detectAndCompute(source, None)

    flann = cv2.DescriptorMatcher_create(cv2.DescriptorMatcher_FLANNBASED)
    # Without the below 2 lines, matching doesn't work
    des1 = np.asarray(des1, dtype=np.float32)
    des2 = np.asarray(des2, dtype=np.float32)

    matches = flann.knnMatch(des1, des2, k=2)

    # store all the good matches as per Lowe's ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)

    if len(good) >= MIN_MATCH_COUNT:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 
                                                                         1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 
                                                                         1, 2)

        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()

        h,w = image.shape
        pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1, 
                                                                         1, 2)
        dst = cv2.perspectiveTransform(pts, M)
        source_bgr = cv2.cvtColor(source, cv2.COLOR_GRAY2BGR)
        img2 = cv2.polylines(source_bgr, [np.int32(dst)], True, (0,0,255), 3, 
                             cv2.LINE_AA)
        cv2.imwrite("out"+str(num)+".jpg", img2)
    else:
        print("Not enough matches." + str(len(good)))
        matchesMask = None

    draw_params = dict(matchColor=(0, 255, 0), # draw matches in green color
                       singlePointColor=None,
                       matchesMask=matchesMask, # draw only inliers
                       flags=2)
    img3 = cv2.drawMatches(image, kp1, source, kp2, good, None, **draw_params)
    cv2.imwrite("ORB"+str(num)+".jpg", img3)


match_image = unsharp_mask(cv2.imread("source.jpg"))
image_1 = unsharp_mask(cv2.imread("Screen_1.jpg"))
screen_finder2(match_image, image_1, num=1)

Answered By - Rick M.

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, November 21, 2021

[FIXED] Extracting Semi-structured Text from a complex UI (Golf Simulator)

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels