Issue
Say I have a simple network implemented in PyTorch:
class network:
    def __init__(self):
        self.device = device
        #these are the 3 convolutional synapses; Same convolution;
        self.layer = sequential(
            conv2d(3, 3, (23), padding=11),
            batch_norm_2d(3),
            Swish(),
            conv2d(3, 3, (11), padding=5),
            batch_norm_2d(3),
            Swish(),
            conv2d(3, 3, (5), padding=2),
            batch_norm_2d(3),
            Swish(),
            conv2d(3, 4, (3), padding=15, stride=2),
            batch_norm_2d(4),
            Swish(),
            conv2d(4, 8, (3), padding=15, stride=2),
            batch_norm_2d(8),
            Swish(),
            conv2d(8, 4, (1)),
            batch_norm_2d(4),
            Swish(),
            conv2d(4, 8, (3), padding=15, stride=2),
            batch_norm_2d(8),
            Swish(),
            conv2d(8, 16, (3), padding=15, stride=2),
            batch_norm_2d(16),
            Swish(),
            conv2d(16, 8, (1)),
            batch_norm_2d(8),
            Swish(),
            conv2d(8, 16, (3), padding=15, stride=2),
            batch_norm_2d(16),
            Swish(),
            conv2d(16, 32, (3), padding=15, stride=2),
            batch_norm_2d(32),
            Swish(),
            conv2d(32, 16, (1)),
            batch_norm_2d(16),
            Swish(),
            conv2d(16, 32, (3), padding=15, stride=2),
            batch_norm_2d(32),
            Swish(),
            conv2d(32, 64, (3), padding=15, stride=2),
            batch_norm_2d(64),
            Swish(),
            conv2d(64, 32, (1)),
            batch_norm_2d(32),
            Swish(),
            conv2d(32, 64, (3), padding=15, stride=2),
            batch_norm_2d(64),
            Swish(),
            conv2d(64, 128, (3), padding=15, stride=2),
            batch_norm_2d(128),
            Swish(),
            conv2d(128, 64, (1)),
            batch_norm_2d(64),
            Swish(),
            conv2d(64, 128, (3), padding=15, stride=2),
            batch_norm_2d(128),
            Swish(),
            conv2d(128, 256, (3), padding=15, stride=2),
            batch_norm_2d(256),
            Swish(),
            conv2d(256, 128, (1)),
            batch_norm_2d(128),
            Swish(),
            flatten(1, -1),
            linear(128*29*29, 8*8*2*5),
            batch_norm_1d(8*8*2*5),
            Swish()
        )
        #loss and optimizer functions for ethirun
        self.Loss_1 = IoU_Loss() #the loss function for bounding box.
        self.Loss_2 = tor.nn.SmoothL1Loss(reduction='mean')
        #the optimizer
        self.Optimizer = tor.optim.AdamW(self.parameters())#tor.optim.SGD(self.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-5, nesterov=True)
        self.Scheduler = tor.optim.lr_scheduler.StepLR(self.Optimizer, 288, gamma=0.5)
        self.sizes = tor.tensor(range(0, 5), dtype=tor.int64, device=self.device)

    def forward(self, input):
        return self.layer(input)

    def backprop(self, preds, lbls, val_or_trn):
        #takes predictions and labels and calculates error and backpropagates
        mask = tor.index_select(lbls, -1, self.sizes[0])
        preds.register_hook(lambda grad: grad * mask.float())
        error = self.Loss_2(preds, lbls)
        if val_or_trn == 1:
            #backpropagation
            error.backward()
            self.Optimizer.step()
            self.Scheduler.step()
            #zeroing the gradients.
            self.Optimizer.zero_grad()
        return error.detach()

model = network()
The inputs, outputs, and channels are arbitrary. Say I then create a random input tensor like this:
input_data = torch.randn(1, 3, 256, 256)
Then I run a prediction on this data like this:
model(input_data)
Say I also change the input_data variable by calling torch.randn several more times while keeping the same model, that is, without re-running model=network().
I get this error:
Expected more than 1 value per channel when training, got input size torch.Size([1, some_value])
So, I tried running it in evaluation mode by using the model.eval() function like this,
model.eval()
with tor.no_grad():
    pred = model(input_data)
model.train()
This works without errors. However, no matter how I change the input_data variable, I always get the same value in pred. If I re-initialize the model's parameters with model=network(), I get a new pred, which again does not change with different inputs until I re-initialize the model once more. What am I doing wrong?
Edit: To give more info on my problem: I'm trying to create a YOLO-like network from scratch, and this is the dataset I'm using: https://www.kaggle.com/devdgohil/the-oxfordiiit-pet-dataset
Solution
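That is essentially what BatchNorm does: it behaves differently in training and in evaluation. You use BatchNorm during training (among other things, it makes the model less prone to overfitting), where it normalizes each channel with the statistics of the current batch. That is why a batch containing a single sample can fail with "Expected more than 1 value per channel when training". In eval mode it uses the running statistics collected during training instead, so that you get the correct result. The same goes for Dropout, which is only active during training.
Every CNN model with batch normalization and/or dropout does the same: the output for the same input will differ between train and eval.
That is exactly why PyTorch has model.eval(): it switches these layers to their inference behavior so that you get the correct output.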
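The difference between the two modes can be reproduced on a much smaller model. The following is a minimal sketch, not the asker's network: the `tail` module and its sizes (8 inputs, 4 outputs) are made up for illustration and only mimic the final linear + batch_norm_1d pair from the question.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the end of the network: Linear followed by BatchNorm1d.
# The sizes are arbitrary and chosen only for illustration.
tail = nn.Sequential(nn.Linear(8, 4), nn.BatchNorm1d(4))

single = torch.randn(1, 8)   # a batch containing one sample
batch = torch.randn(16, 8)   # a batch containing sixteen samples

# Training mode: BatchNorm1d needs batch statistics, so one sample is not enough.
tail.train()
try:
    tail(single)
    train_error = None
except ValueError as exc:
    train_error = str(exc)
print("train mode, batch of 1:", train_error)

# Training mode with a larger batch works and updates the running statistics.
out_train = tail(batch)

# Eval mode: the running statistics are used, so a single sample is accepted.
tail.eval()
with torch.no_grad():
    out_eval_single = tail(single)
    out_eval = tail(batch)

print("eval mode, batch of 1, output shape:", tuple(out_eval_single.shape))
print("same input gives same output in both modes:",
      torch.allclose(out_train, out_eval))
```

In this sketch the same `batch` produces different outputs in the two modes, because training mode normalizes with the batch's own mean and variance while eval mode uses the running estimates.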
Edit
The problem is the activation and Batch Normalization at the output.
Only use something at the output that makes the result comparable to the ground truth: for example, sigmoid when you want the output in the range 0 to 1, tanh for -1 to 1, or softmax for probabilities across an axis.
Consider the relu function (which is basically a simpler version of swish and softplus). It turns everything below 0 into 0, and chances are you need some outputs to be below 0, so your model won't converge at all.
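To make the output ranges concrete, here is a small sketch that applies each activation to the same hand-picked values. The `raw` tensor is a made-up example of what a final linear layer might produce, and `F.silu` is used on the assumption that the question's custom `Swish()` module computes x * sigmoid(x).

```python
import torch
import torch.nn.functional as F

# Hypothetical raw outputs of a final linear layer, chosen by hand.
raw = torch.tensor([[-3.0, -0.5, 0.0, 0.5, 3.0]])

sig = torch.sigmoid(raw)          # every value lies strictly between 0 and 1
tanh_out = torch.tanh(raw)        # every value lies strictly between -1 and 1
soft = torch.softmax(raw, dim=1)  # non-negative values that sum to 1 along dim 1
relu_out = torch.relu(raw)        # negative inputs are clamped to 0
swish_out = F.silu(raw)           # x * sigmoid(x); never drops below about -0.28

for name, value in [("sigmoid", sig), ("tanh", tanh_out), ("softmax", soft),
                    ("relu", relu_out), ("swish", swish_out)]:
    print(f"{name:8s}", value)
```

If the targets can be clearly negative, neither relu nor swish as the last layer can reach them, which is one way to read the advice above about matching the output activation to the ground truth.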
Answered By - Natthaphon Hongcharoen