Issue
I have trained a Mask R-CNN network for instance segmentation of apples. I am able to load the weights and generate predictions for my test images. The masks are being generated in roughly the correct locations, but they have no real shape; each one just looks like a scatter of pixels.
Training was done on the dataset from this paper, and here is the GitHub link to the code used to train the model and generate the weights.
The code for prediction is as follows (I have omitted the parts where I create the path variables and assign the paths):
import os
import glob
import numpy as np
import pandas as pd
import cv2 as cv
import fileinput
import torch
import torch.utils.data
import torchvision
from data.apple_dataset import AppleDataset
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
import utility.utils as utils
import utility.transforms as T
from PIL import Image
from matplotlib import pyplot as plt
%matplotlib inline
def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
def get_maskrcnn_model_instance(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes)
    return model
num_classes = 2
device = torch.device('cpu')
model = get_maskrcnn_model_instance(num_classes)
checkpoint = torch.load('model_49.pth', map_location=device)
model.load_state_dict(checkpoint['model'], strict=False)
dataset_test = AppleDataset(test_image_files_path, get_transform(train=False))
img, _ = dataset_test[1]
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])
prediction
Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
(I am unable to attach the image here since it's over 2 MB.)
Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())
Here is an Imgur link to the original image; below is the predicted mask for one of the instances.
Also, could you please help me understand the data structure of the generated prediction shown below? How do I access the masks so as to generate a single image with all the masks displayed?
[{'boxes': tensor([[ 966.8143, 1633.7491, 1106.7389, 1787.6367],
[1418.7872, 1467.0619, 1732.0828, 1796.1527],
[1608.0396, 2064.6482, 1710.7534, 2206.5535],
[2326.3750, 1690.3418, 2542.2112, 1883.2626],
[2213.2024, 1864.3657, 2299.8933, 1963.0178],
[1112.9083, 1732.5953, 1236.7600, 1823.0170],
[1150.8256, 614.0334, 1218.8584, 711.4094],
[ 942.7086, 794.6043, 1138.2318, 1008.0430],
[1065.4371, 723.0493, 1192.7570, 870.3763],
[1002.3103, 883.4616, 1146.9994, 1006.6841],
[1315.2816, 1680.8625, 1531.3210, 1989.3317],
[1244.5769, 1925.0903, 1459.5417, 2175.3252],
[1725.2191, 2082.6187, 1934.0227, 2274.2952],
[ 936.3065, 1554.3765, 1014.2722, 1659.4229],
[ 934.8851, 1541.3331, 1090.4736, 1657.3751],
[2486.0120, 776.4577, 2547.2329, 847.9725],
[2336.1675, 698.6327, 2508.6492, 921.4550],
[2368.4077, 1954.1102, 2448.4004, 2049.5796],
[1899.1403, 1775.2371, 2035.7561, 1962.6923],
[2176.0664, 1075.1553, 2398.6084, 1267.2555],
[2274.8899, 641.6769, 2395.9634, 791.3353],
[2535.1580, 874.4780, 2642.8213, 966.4614],
[2183.4236, 619.9688, 2288.5676, 758.6825],
[2183.9832, 1122.9382, 2334.9583, 1263.3226],
[1135.7822, 779.0529, 1225.9871, 890.0135],
[ 317.3954, 1328.6995, 397.3900, 1467.7740],
[ 945.4811, 1833.3708, 997.2318, 1878.8607],
[1992.4447, 679.4969, 2134.6667, 835.8701],
[1098.5416, 1452.7799, 1429.1808, 1771.4460],
[1657.3193, 1405.5405, 1781.6273, 1574.6780],
[1443.8911, 1747.1544, 1739.0361, 2076.9724],
[1092.6003, 1165.3340, 1206.0881, 1383.8314],
[2466.4170, 1945.5931, 2555.1931, 2039.8368],
[2561.8508, 1616.2659, 2672.1033, 1742.2332],
[1894.4806, 907.9214, 2097.1875, 1182.6473],
[2321.5005, 1701.3344, 2368.3699, 1865.3914],
[2180.0781, 567.5969, 2344.6357, 763.4360],
[1845.7612, 668.6808, 2045.2688, 899.8501],
[1858.9216, 2145.7097, 1961.8870, 2273.5088],
[ 261.4607, 1314.0154, 396.9288, 1486.9498],
[2488.1682, 1585.2357, 2669.0178, 1794.9926],
[2696.9548, 936.0087, 2802.7961, 1025.2294],
[1593.6837, 1489.8641, 1720.3124, 1627.8135],
[2517.9468, 857.1713, 2567.1125, 929.4335],
[1943.2167, 636.3422, 2151.4419, 853.8924],
[2143.5664, 1100.0521, 2308.1570, 1290.7125],
[2140.9231, 1947.9692, 2238.6956, 2000.6249],
[1461.6316, 2105.2593, 1559.7675, 2189.0264],
[2114.0781, 374.8153, 2222.8838, 559.9851],
[2350.5320, 726.5779, 2466.8140, 878.2617]]),
'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1]),
'scores': tensor([0.9916, 0.9841, 0.9669, 0.9337, 0.9118, 0.7729, 0.7202, 0.7193, 0.6928,
0.6872, 0.6690, 0.5913, 0.4877, 0.4683, 0.3781, 0.3327, 0.3164, 0.2364,
0.1696, 0.1692, 0.1502, 0.1365, 0.1316, 0.1171, 0.1119, 0.1094, 0.1041,
0.0865, 0.0853, 0.0835, 0.0822, 0.0816, 0.0797, 0.0796, 0.0788, 0.0780,
0.0757, 0.0736, 0.0736, 0.0689, 0.0681, 0.0644, 0.0642, 0.0630, 0.0612,
0.0598, 0.0563, 0.0531, 0.0525, 0.0522]),
'masks': tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
...,
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]])}]
Solution
The prediction from the Mask R-CNN has the following structure. During inference, the model requires only the input tensors and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:
boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with x values between 0 and W and y values between 0 and H
labels (Int64Tensor[N]): the predicted labels for each image
scores (Tensor[N]): the scores of each prediction
masks (UInt8Tensor[N, 1, H, W]): the predicted masks for each instance, as soft values in the 0-1 range. To obtain the final segmentation masks, these soft masks can be thresholded, generally with a value of 0.5; this is also why a raw soft mask rendered directly can look like a loose cloud of pixels.
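As a quick illustration of accessing those fields, here is a minimal sketch (assuming the prediction and imports from the question, and an arbitrary 0.5 mask threshold) that pulls out the highest-scoring instance and binarizes its soft mask:
# the outer list holds one dict per input image; index 0 is our single image
boxes = prediction[0]['boxes']    # shape [N, 4]
scores = prediction[0]['scores']  # shape [N], sorted in descending order
masks = prediction[0]['masks']    # shape [N, 1, H, W], soft values in [0, 1]

# take the first (highest-scoring) instance and threshold its soft mask at 0.5
binary_mask = (masks[0, 0] > 0.5).byte().mul(255).cpu().numpy()
Image.fromarray(binary_mask)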
You can use OpenCV's findContours and drawContours functions to draw the masks as follows:
import cv2

img_cv = cv2.imread('input.jpg')  # OpenCV loads images in BGR order
for i in range(len(prediction[0]['masks'])):
    # iterate over the soft masks, binarizing each one at 0.5 before finding contours
    mask = prediction[0]['masks'][i, 0]
    mask = (mask > 0.5).byte().mul(255).cpu().numpy()
    contours, _ = cv2.findContours(
        mask.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
    cv2.drawContours(img_cv, contours, -1, (255, 0, 0), 2, cv2.LINE_AA)
cv2.imshow('img output', img_cv)
cv2.waitKey(0)
Sample output:
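If you instead want a single image with every mask displayed (rather than contour outlines), one option is to merge the thresholded masks into one array and overlay it on the input. The sketch below assumes the prediction and img tensors from the question and an arbitrary 0.5 threshold on both scores and masks; since cv2.imshow opens a native window that may not work inside a notebook, it uses matplotlib (already enabled via %matplotlib inline above) for display:
import numpy as np
from matplotlib import pyplot as plt

img_np = img.mul(255).permute(1, 2, 0).byte().numpy()        # H x W x 3 input image
combined_mask = np.zeros(img_np.shape[:2], dtype=np.uint8)   # single empty mask image

for i in range(len(prediction[0]['masks'])):
    if prediction[0]['scores'][i] < 0.5:
        continue  # skip low-confidence detections
    soft_mask = prediction[0]['masks'][i, 0].cpu().numpy()
    # binarize the soft mask and merge it into the combined mask
    combined_mask = np.maximum(combined_mask, (soft_mask > 0.5).astype(np.uint8) * 255)

plt.imshow(img_np)
plt.imshow(combined_mask, alpha=0.5)  # overlay all masks on the original image
plt.axis('off')
plt.show()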
Answered By - kHarshit