Issue
I am working on an AI-related problem where I need to track several human body parts in videos. I create a DataLoader with my images, and I apply several transforms when calling my Dataset class.
Here is a code sample:
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

transform = transforms.Compose(
    [
        transforms.Resize(img_size),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]
)
dataset = NamedClassDataset(annotation_folder_path=path, transform=transform, img_size=img_size, normalized=normalize)
# Seeded generator so the split can be reproduced later for visualization
train_set, validation_set = torch.utils.data.random_split(dataset, get_train_test_size(dataset, train_percent), generator=torch.Generator().manual_seed(seed))
train_loader = DataLoader(dataset=train_set, shuffle=shuffle, batch_size=batch_size, num_workers=num_workers, pin_memory=pin_memory)
validation_loader = DataLoader(dataset=validation_set, shuffle=shuffle, batch_size=batch_size, num_workers=num_workers, pin_memory=pin_memory)
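get_train_test_size is a helper from my own codebase; for readers, a minimal sketch of what it is assumed to return (a pair of lengths for random_split; the _print_size flag appears again later in this post):

def get_train_test_size(dataset, train_percent, _print_size=True):
    # Assumed behavior: split len(dataset) into [train_len, val_len]
    train_len = int(len(dataset) * train_percent)
    val_len = len(dataset) - train_len
    if _print_size:
        print(f"Train size: {train_len}, validation size: {val_len}")
    return [train_len, val_len]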
The problem is: after running my model, I display the images with the predicted points in order to check their quality. But since the images are resized and normalized, I cannot recover their original resolution and colors. I would like to display the points on the original images instead of the transformed ones, and I want to know the usual way to do this.
I have already thought of two solutions, each with its own disadvantage:
- Reverting the transformations, but this is impossible once Resize has been applied since we lose information (see the sketch after this list).
- Returning an index as a third value from the __getitem__ method of the NamedClassDataset (along with the image and labels). But PyTorch methods expect only two outputs from __getitem__, namely (image, associated labels).
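To illustrate the first point: undoing Normalize is easy, but undoing Resize only yields an interpolated approximation of the original pixels. A minimal sketch of the reversible part, assuming the (0.5, 0.5, 0.5) mean/std used above:

import torch

def denormalize(img_t, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    # Inverts transforms.Normalize: x = x_norm * std + mean, per channel
    mean = torch.tensor(mean).view(-1, 1, 1)
    std = torch.tensor(std).view(-1, 1, 1)
    return img_t * std + mean

Resizing back up with interpolation is still possible, but it can only approximate the detail that the downscale threw away.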
EDIT: Here is the __getitem__ of my NamedClassDataset class:
def __getitem__(self, index):
    (img_path, coords) = self.annotations.iloc[index].values
    img = Image.open(img_path).convert("RGB")
    w, h = img.size
    # Normalize by img size
    if self.img_size is not None:
        if self.normalized:
            coords = coords / (w, h)  # Normalized
        else:
            n_h, n_w = self.img_size
            coords = coords / (w, h) * (n_w, n_h)  # Not normalized
    y_coords = torch.flatten(torch.tensor(coords)).float()  # Flatten outputs and convert from double to float32
    if self.transform is not None:
        img = self.transform(img)
    return (img, y_coords)
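Since the targets are normalized by the original (w, h), predicted points can in principle be mapped back onto the original image without reverting any pixel transform; a sketch, assuming normalized=True and the flat (x1, y1, x2, y2, ...) layout produced above:

def preds_to_original_space(preds, orig_w, orig_h):
    # preds: flat tensor (x1, y1, x2, y2, ...) in [0, 1] normalized coordinates
    pts = preds.detach().view(-1, 2).clone()
    pts[:, 0] *= orig_w  # x back to original pixels
    pts[:, 1] *= orig_h  # y back to original pixels
    return pts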
Solution
I managed to do the trick by declaring a second dataset holding the untransformed images, and splitting it with the same seeded generator so that its subsets line up with the training ones.
# Create the same dataset with untransformed images for visualization purposes
org_dataset = NamedClassDataset(annotation_folder_path="./12_labels/extracted_swimmers", transform=None, img_size=None, normalized=False)
viz_train_set, viz_validation_set = random_split(org_dataset, get_train_test_size(org_dataset, train_percent, _print_size=False), generator=torch.Generator().manual_seed(seed))
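Both random_split calls must receive the same seeded generator (which is why it also appears in the first split above): random_split draws a random permutation of indices, so identical seeds guarantee that viz_train_set and viz_validation_set contain the same samples, in the same order, as train_set and validation_set.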
And here is what I do in the __getitem__ when transform=None:
if self.transform is not None:
    tr_img = self.transform(org_img)
    return (tr_img, y_coords)
return (org_img, y_coords)
I then have access to the original images by passing the viz sets as parameters. Note that these are Datasets, not DataLoaders, so you need to take your batch size into account to match the predictions, e.g.:
plot_predictions(viz_set[0+i*batch_size][0], preds[0])
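For completeness, a sketch of the loop that indexing is meant for (plot_predictions and model are from my own code; the loader must be created with shuffle=False, otherwise batch order will not match the dataset order):

with torch.no_grad():
    for i, (imgs, targets) in enumerate(validation_loader):
        preds = model(imgs)
        for j in range(imgs.size(0)):
            # viz_validation_set shares the seeded split with validation_set,
            # so sample j of batch i sits at dataset index j + i * batch_size
            org_img, _ = viz_validation_set[j + i * batch_size]
            plot_predictions(org_img, preds[j])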
I am leaving the question open since I strongly believe that a more efficient answer can be provided.
Answered By - Mrofsnart