Issue
During training, I load image and disparity data. The image tensor has shape [2, 3, 256, 256] and the disparity/depth tensor has shape [2, 1, 256, 256] (batch size, channels, height, width). I want to use Conv3D, so I need to combine these two tensors into a new tensor of shape [2, 3, 256, 256, 256] (batch size, channels, depth, height, width). The depth values range from 0-400, and one possibility is to divide that range into intervals, e.g., 4 intervals of 100. I want the resulting tensor to look like a voxel grid, similar to the technique used in this paper. The training loop that iterates over the data is below:
for batch_id, sample in enumerate(train_loader):
    sample = {name: tensor.cuda() for name, tensor in sample.items()}
    # image tensor [2, 3, 256, 256]
    rgb_image = transforms.Lambda(lambda x: x.mul(255))(sample["frame"])
    # translate disparity to depth
    depth_from_disparity_frame = 132.28 / sample["disparity_frame"]
    # depth tensor [2, 1, 256, 256]
    depth_image = depth_from_disparity_frame.unsqueeze(1)
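As an aside, the disparity-to-depth division in the loop above produces infinities wherever the disparity is zero. A common safeguard is to clamp the disparity to a small epsilon first; the sketch below reuses the 132.28 constant from the snippet, while the epsilon value is an assumption:

```python
import torch

# Dummy disparity map containing a zero (shape: batch, height, width)
disparity = torch.tensor([[[0.0, 2.0], [4.0, 8.0]]])

# Clamp to a small epsilon before dividing so depth stays finite
eps = 1e-6  # assumed safeguard, not from the original post
depth = 132.28 / disparity.clamp(min=eps)

print(torch.isfinite(depth).all())  # no inf/nan values remain
```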
Solution
From the article you linked:
We create a 3D voxel representation, with the same height and width as the original image, and with a depth determined by the difference between the maximum and minimum depth values found in the images. Each RGB-D pixel of an image is then placed at the same position in the voxel grid but at its corresponding depth.
This is what Ivan suggested, more or less. If you know that your depth will always be in the 0-400 range, I imagine you can skip the "depth determined by the difference between the maximum and minimum depth values" part. The depth could also be normalized beforehand or afterwards.
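If you do want to coarsen the 0-400 range into intervals rather than keep one slice per depth unit, integer division gives the bin index directly. A minimal sketch, using the 4 bins of width 100 mentioned in the question:

```python
import torch

# Dummy depth map with integer values in [0, 400) (shape: batch, 1, height, width)
depth = torch.randint(0, 400, [1, 1, 4, 4])

# Quantize into 4 bins of width 100 -> bin indices in {0, 1, 2, 3}
num_bins, bin_width = 4, 100
depth_bins = depth // bin_width

print(depth_bins.max() < num_bins)  # True: all indices fall inside the bins
```

The binned indices can then be one-hot encoded with `num_classes=num_bins` instead of 400, giving a much shallower voxel grid.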
Code using dummy data:
import torch
import torch.nn.functional as F
# Declarations (dummy tensors)
rgb_im = torch.randint(0, 255, [1, 3, 256, 256])   # dummy RGB image
depth = torch.randint(0, 400, [1, 1, 256, 256])    # dummy depth map; F.one_hot requires an integer tensor
# Calculations
depth_ohe = F.one_hot(depth, num_classes=400)       # (batch, channel, height, width, depth), one-hot along the last axis
bchwd_tensor = rgb_im.unsqueeze(-1) * depth_ohe     # (batch, channel, height, width, depth); broadcasting places each RGB value at its depth slice
bcdhw_tensor = bchwd_tensor.permute(0, 1, 4, 2, 3)  # (batch, channel, depth, height, width), ready for Conv3d
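To sanity-check that the resulting layout is what `nn.Conv3d` expects, the voxel tensor can be fed straight into a 3D convolution. The sketch below uses small dummy sizes, and the channel counts and kernel size are arbitrary assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small dummy inputs (same construction as above, reduced sizes)
rgb_im = torch.randint(0, 255, [1, 3, 8, 8])
depth = torch.randint(0, 16, [1, 1, 8, 8])

depth_ohe = F.one_hot(depth, num_classes=16)          # (1, 1, 8, 8, 16)
bchwd = rgb_im.unsqueeze(-1) * depth_ohe              # (1, 3, 8, 8, 16)
bcdhw = bchwd.permute(0, 1, 4, 2, 3).float()          # (1, 3, 16, 8, 8); Conv3d needs a float tensor

# padding=1 with kernel_size=3 preserves the spatial/depth dimensions
conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(bcdhw)
print(out.shape)  # torch.Size([1, 8, 16, 8, 8])
```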
Answered By - Zoom