Issue
I'm trying to warp a frame from view1 to view2 using a ground-truth depth map, pose information, and the camera matrix. I've been able to remove most of the for-loops and vectorize the code, except for one loop. When warping, multiple pixels in view1 may map to a single location in view2 due to occlusions. In that case, I need to pick the pixel with the lowest depth value (the foreground object). I'm not able to vectorize this part of the code. Any help vectorizing this for-loop is appreciated.
Context:
I'm trying to warp an image into a new view, given the ground-truth pose, depth, and camera matrix. After computing the warped locations, I round them off. Any suggestions for implementing inverse bilinear interpolation are also welcome. My images are full-HD resolution, so warping a frame to the new view takes a lot of time. Once the code is vectorized, I plan to port it to TensorFlow or PyTorch and run it on a GPU. Any other suggestions to speed up the warping, or pointers to existing implementations, are also welcome.
Code:
```python
import numpy
from tqdm import tqdm


def warp_frame_04(frame1: numpy.ndarray, depth: numpy.ndarray, intrinsic: numpy.ndarray,
                  transformation1: numpy.ndarray, transformation2: numpy.ndarray,
                  convert_to_uint: bool = True, verbose_log: bool = True):
    """
    Vectorized forward warping with nearest-neighbor rounding.
    Overcomes the offset requirement of warp_frame_03().
    mask: 1 if a pixel was found, 0 if no pixel was found
    Drawback: nearest neighbor; collision resolution is not vectorized
    """
    height, width, _ = frame1.shape
    assert depth.shape == (height, width)
    transformation = numpy.matmul(transformation2, numpy.linalg.inv(transformation1))

    y1d = numpy.array(range(height))
    x1d = numpy.array(range(width))
    x2d, y2d = numpy.meshgrid(x1d, y1d)
    ones_2d = numpy.ones(shape=(height, width))
    ones_4d = ones_2d[:, :, None, None]
    pos_vectors_homo = numpy.stack([x2d, y2d, ones_2d], axis=2)[:, :, :, None]

    intrinsic_inv = numpy.linalg.inv(intrinsic)
    intrinsic_4d = intrinsic[None, None]
    intrinsic_inv_4d = intrinsic_inv[None, None]
    depth_4d = depth[:, :, None, None]
    trans_4d = transformation[None, None]

    # Back-project pixels to 3D, transform them to view2, and re-project to pixels
    unnormalized_pos = numpy.matmul(intrinsic_inv_4d, pos_vectors_homo)
    world_points = depth_4d * unnormalized_pos
    world_points_homo = numpy.concatenate([world_points, ones_4d], axis=2)
    trans_world_homo = numpy.matmul(trans_4d, world_points_homo)
    trans_world = trans_world_homo[:, :, :3]
    trans_norm_points = numpy.matmul(intrinsic_4d, trans_world)
    trans_pos = trans_norm_points[:, :, :2, 0] / trans_norm_points[:, :, 2:3, 0]
    trans_pos_int = numpy.round(trans_pos).astype('int')

    # Solve occlusions: among pixels that collide, keep only the nearest one
    a = trans_pos_int.reshape(-1, 2)
    d = depth.ravel()
    b = numpy.unique(a, axis=0, return_index=True, return_counts=True)
    collision_indices = b[1][b[2] >= 2]  # Unique indices which are involved in a collision
    for c1 in tqdm(collision_indices, disable=not verbose_log):
        cl = a[c1].copy()  # Collision Location
        ci = numpy.where((a[:, 0] == cl[0]) & (a[:, 1] == cl[1]))[0]  # Colliding Indices: indices colliding at cl
        cci = ci[numpy.argmin(d[ci])]  # Closest Collision Index: index of the nearest point among ci
        a[ci] = [-1, -1]
        a[cci] = cl
    trans_pos_solved = a.reshape(height, width, 2)

    # Offset both axes by 1 and clamp any out-of-frame motion to the edge. Then crop the 1-pixel-thick edge
    trans_pos_offset = trans_pos_solved + 1
    trans_pos_offset[:, :, 0] = numpy.clip(trans_pos_offset[:, :, 0], a_min=0, a_max=width + 1)
    trans_pos_offset[:, :, 1] = numpy.clip(trans_pos_offset[:, :, 1], a_min=0, a_max=height + 1)
    warped_image = numpy.ones(shape=(height + 2, width + 2, 3)) * numpy.nan
    warped_image[trans_pos_offset[:, :, 1], trans_pos_offset[:, :, 0]] = frame1
    cropped_warped_image = warped_image[1:-1, 1:-1]

    mask = numpy.isfinite(cropped_warped_image)
    cropped_warped_image[~mask] = 0
    if convert_to_uint:
        final_warped_image = cropped_warped_image.astype('uint8')
    else:
        final_warped_image = cropped_warped_image
    mask = mask[:, :, 0]
    return final_warped_image, mask
```
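For reference, a minimal smoke test of the function could look like the following. The resolution matches the full-HD frames mentioned in the question, but the intrinsics, depths, and poses are made-up placeholders, not values from the original post:

```python
import numpy

height, width = 1080, 1920  # full HD, as mentioned in the question
frame1 = numpy.random.randint(0, 256, size=(height, width, 3)).astype('float64')
depth = numpy.random.uniform(1.0, 10.0, size=(height, width))
# Assumed intrinsics: focal length of 1920 px, principal point at the image center
intrinsic = numpy.array([[1920.0, 0.0, 960.0],
                         [0.0, 1920.0, 540.0],
                         [0.0, 0.0, 1.0]])
transformation1 = numpy.eye(4)   # view1 at the origin
transformation2 = numpy.eye(4)
transformation2[0, 3] = 0.1      # view2 shifted 0.1 units along x

warped, mask = warp_frame_04(frame1, depth, intrinsic, transformation1, transformation2)
```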
Code Explanation
- I'm using the equations [1, 2] to get the pixel locations in view2.
- Once I have the pixel locations, I need to figure out whether there are any occlusions, and if so, pick the foreground pixels.
- `b = numpy.unique(a, axis=0, return_index=True, return_counts=True)` gives me the unique locations.
- If multiple pixels from view1 map to a single pixel in view2 (a collision), `return_counts` gives a value greater than 1.
- `collision_indices = b[1][b[2] >= 2]` gives the indices involved in collisions. Note that this gives only one index per collision location.
- For each such collision point, `ci = numpy.where((a[:, 0] == cl[0]) & (a[:, 1] == cl[1]))[0]` provides the indices of all the pixels from view1 that map to the same point in view2.
- `cci = ci[numpy.argmin(d[ci])]` gives the index of the pixel with the lowest depth value.
- `a[ci] = [-1, -1]` followed by `a[cci] = cl` maps all the other (background) pixels to location (-1, -1), which is out of frame and hence ignored, while restoring the nearest pixel to the collision location. (A fully vectorized alternative to this loop is sketched after the references below.)
[1] https://i.stack.imgur.com/s1D9t.png
[2] https://dsp.stackexchange.com/q/69890/32876
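For completeness, the hard-min selection in the loop above can also be fully vectorized with a single stable sort. The sketch below is one possible approach, not code from the original post; it assumes the same `a` (flattened integer target locations) and `d` (flattened depths) as in `warp_frame_04`, and relies on `numpy.unique` with `return_index=True` returning the index of the first occurrence of each unique row:

```python
import numpy

def resolve_collisions_vectorized(a: numpy.ndarray, d: numpy.ndarray) -> numpy.ndarray:
    """a: (N, 2) integer target locations; d: (N,) depths.
    Returns a copy of a where, at each collision, only the nearest pixel
    keeps its target location and all the others are sent to (-1, -1)."""
    order = numpy.argsort(d, kind='stable')    # nearest depth first
    a_sorted = a[order]
    # numpy.unique gives the first occurrence of each unique row; since the
    # rows are sorted by depth, that is the nearest pixel per target location
    _, first_idx = numpy.unique(a_sorted, axis=0, return_index=True)
    keep = order[first_idx]                    # original indices of the winning pixels
    resolved = numpy.full_like(a, -1)
    resolved[keep] = a[keep]
    return resolved
```

Inside `warp_frame_04`, the entire `for` loop could then be replaced by a single call, `a = resolve_collisions_vectorized(a, d)`, before the reshape into `trans_pos_solved`.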
Solution
I implemented this as follows. Instead of picking the nearest point (a hard min over depth), I used a soft-min: I took a weighted average of all the colliding points, with the weights chosen so that a small difference in depth leads to a large difference in weights and the nearest depth gets the highest weight. I implemented the sum inside the soft-min using `numpy.add.at`, as suggested here. I was then able to port it to PyTorch using `torch.Tensor.index_put_`, as suggested here. Finally, I replaced the rounding-off (nearest-neighbour interpolation) with bilinear splatting (inverse bilinear interpolation). Both the NumPy and PyTorch implementations are available here.
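To illustrate the idea, here is a minimal NumPy sketch of soft-min splatting with `numpy.add.at`. It is a sketch of the approach described above, not the author's published implementation: the exponential weighting and the temperature `t` are assumptions chosen so that a small depth difference yields a large weight difference, and `trans_pos_offset` is the clipped integer map from `warp_frame_04`:

```python
import numpy

def softmin_splat(frame1: numpy.ndarray, depth: numpy.ndarray,
                  trans_pos_offset: numpy.ndarray, t: float = 50.0):
    """Soft-min forward splatting: a weighted average of colliding pixels, with
    weights decaying rapidly in depth so the nearest surface dominates.
    `t` is an assumed temperature; larger values approach a hard min."""
    height, width, _ = frame1.shape
    # Normalizing depth keeps the exponent in a numerically stable range (assumption)
    weights = numpy.exp(-t * depth / depth.max())[:, :, None]   # (H, W, 1)
    warped_image = numpy.zeros((height + 2, width + 2, 3))
    warped_weights = numpy.zeros((height + 2, width + 2, 1))
    y, x = trans_pos_offset[:, :, 1], trans_pos_offset[:, :, 0]
    # numpy.add.at accumulates contributions even when (y, x) indices repeat,
    # which is exactly the collision case
    numpy.add.at(warped_image, (y, x), frame1 * weights)
    numpy.add.at(warped_weights, (y, x), weights)
    valid = warped_weights[:, :, 0] > 0
    warped_image[valid] /= warped_weights[valid]
    return warped_image[1:-1, 1:-1], valid[1:-1, 1:-1]
```

With this scheme the per-collision loop disappears entirely: every source pixel contributes to its target, and the final normalization resolves the overlaps. In PyTorch, the two `numpy.add.at` calls map to `torch.Tensor.index_put_` with `accumulate=True`, which runs on the GPU.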
Answered By - Nagabhushan S N