Issue
I am using the NNI framework in Python to do Neural Architecture Search. I have defined the model as:
import torch
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = mutables.LayerChoice([
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
        ])  # try 3x3 kernel and 5x5 kernel
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(14400, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)  # the error is raised here
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Apart from building the model, the code above lets the search algorithm below choose between two candidate layers for the first convolution: one with a 3x3 kernel and one with a 5x5 kernel.
I am also new to PyTorch, so let me know if you already see a mistake in the code above.
Moving on, this is paired with the following code:
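A quick way to check for this kind of mistake is to push a dummy CIFAR-10-sized image (3x32x32, assumed from the dataset used below) through each candidate first layer plus the layers that follow it, and print the flattened size that fc1 would receive. The sketch below uses plain PyTorch and rebuilds the layers outside of NNI; the helper name `flattened_size` is hypothetical, and the dropout layers are left out because they do not change the shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def flattened_size(first_conv: nn.Module) -> int:
    """Hypothetical helper: size of the flattened tensor that would reach fc1."""
    conv2 = nn.Conv2d(32, 64, 3, 1)  # same as self.conv2 in Net
    x = torch.zeros(1, 3, 32, 32)  # one dummy CIFAR-10 image
    with torch.no_grad():
        x = F.relu(first_conv(x))
        x = F.relu(conv2(x))
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
    return x.shape[1]


# The two candidates from the LayerChoice, exactly as written in Net
candidates = {
    "3x3, padding=1": nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    "5x5, padding=1": nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1),
}
for name, conv in candidates.items():
    print(name, flattened_size(conv))
# 3x3, padding=1 14400
# 5x5, padding=1 12544
```

The two candidates disagree, so a single fc1 = nn.Linear(14400, 128) can only match one of them.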
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Assumption: the original transform is not shown; ToTensor() is a minimal stand-in.
transform = transforms.ToTensor()
model = Net()  # the model defined above

dataset_train = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
dataset_valid = torchvision.datasets.CIFAR10(root='./data', train=False,
                                             download=True, transform=transform)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)

# use NAS here
def top1_accuracy(output, target):
    # this is the function that computes the reward, as required by the ENAS algorithm
    batch_size = target.size(0)
    _, predicted = torch.max(output.data, 1)
    return (predicted == target).sum().item() / batch_size

def metrics_fn(output, target):
    # the metrics function receives output and target and computes a dict of metrics
    return {"acc1": top1_accuracy(output, target)}

from nni.algorithms.nas.pytorch import enas
trainer = enas.EnasTrainer(model,
                           loss=criterion,
                           metrics=metrics_fn,
                           reward_function=top1_accuracy,
                           optimizer=optimizer,
                           batch_size=128,
                           num_epochs=10,  # 10 epochs
                           dataset_train=dataset_train,
                           dataset_valid=dataset_valid,
                           log_frequency=10)  # print log every 10 steps
trainer.train()  # training
trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
The code above downloads the CIFAR-10 dataset, trains the model defined earlier on it, and finds which architecture performs best (here there are two layer choices, but there could be more). However, it raises an error:
22 x = self.dropout1(x)
23 x = torch.flatten(x, 1)
---> 24 x = self.fc1(x)
25 x = F.relu(x)
26 x = self.dropout2(x)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x12544 and 14400x128)
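The message comes from the matrix multiplication inside nn.Linear: mat1 is the input as received (batch x features), and mat2 is the transposed weight, which must be in_features x out_features. A minimal sketch that reproduces the same kind of error without NNI, using the sizes from the traceback:

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(14400, 128)  # expects 14400 input features (the 3x3 candidate)
x = torch.zeros(128, 12544)  # what the 5x5 / padding=1 candidate actually produces

try:
    fc1(x)
except RuntimeError as err:
    # mat1 is the input (128x12544); mat2 is the transposed weight (14400x128)
    print(err)
```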
I know this happens because the flattened tensor does not have the size that the first fully connected layer expects. When I change fc1 to the size the error reports (12544 instead of 14400), I get the following error instead:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x14400 and 12544x128)
I believe it happens because of the choice in the first convolution layer. My question is: how do I fix this? And if NNI is unfamiliar to you, here is another way to put it: in Keras, I can define a fully connected layer by giving only the number of hidden units, without the input size. PyTorch seems to require the input dimension to be stated correctly. Is there a way to say that, after the flatten, I want a hidden fully connected layer with just a given number of units, without also giving the input shape, which I believe is what causes the problem?
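Recent PyTorch versions do have a Keras-like option: torch.nn.LazyLinear takes only out_features and infers in_features from the first input it sees. A minimal sketch, assuming a PyTorch version that ships LazyLinear. Note the caveat shown in the comments: after the first forward pass the weight is materialized with a fixed size, so a lazy layer alone does not let one fc1 accept both 14400 and 12544 features. Lazy parameters are also uninitialized until that first pass, so a dummy forward pass is needed before creating the optimizer.

```python
import torch
import torch.nn as nn

# Only the number of output units is given; in_features is inferred later.
fc1 = nn.LazyLinear(128)

x = torch.zeros(4, 14400)  # e.g. the flattened output of the 3x3 candidate
y = fc1(x)  # first call materializes the weight as 128 x 14400
print(y.shape)           # torch.Size([4, 128])
print(fc1.weight.shape)  # torch.Size([128, 14400])

# The size is now fixed: a 12544-feature input still fails.
try:
    fc1(torch.zeros(4, 12544))
except RuntimeError as err:
    print("still mismatched:", err)
```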
Solution
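For the conv layer with kernel_size=5 you need padding=2, not 1. With stride 1, padding=2 keeps the 32x32 spatial size, just as padding=1 does for the 3x3 kernel, so both choices give fc1 the same flattened size (14400).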
Fix:
self.conv1 = mutables.LayerChoice([
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
])
to
self.conv1 = mutables.LayerChoice([
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2)  # match padding size to kernel size
])
Update:
Recent versions of PyTorch allow you to specify padding='same' (only for stride=1) and avoid having to work out the correct value for padding yourself.
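A short sketch of the padding='same' option, assuming a PyTorch version that supports it (1.9 or later): both candidate layers keep the 32x32 spatial size of a CIFAR-10 image, so everything after them, including fc1, sees the same shape.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 3, 32, 32)  # dummy CIFAR-10-sized input

# padding="same" requires stride=1 (strided convolutions are rejected)
conv3 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding="same")
conv5 = nn.Conv2d(3, 32, kernel_size=5, stride=1, padding="same")

print(conv3(x).shape)  # torch.Size([1, 32, 32, 32])
print(conv5(x).shape)  # torch.Size([1, 32, 32, 32])
```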
However, I strongly urge you to use the formula for computing the output shape of a convolution layer (found here) and manually compute the correct value for padding. This is a good sanity check to ensure you understand what you are doing.
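The output-size formula, as given in the PyTorch Conv2d documentation, is out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1 per spatial dimension (dilation defaults to 1). A pure-Python sketch applying it by hand to the layers of Net; the helper names `conv_out` and `flattened_after_trunk` are hypothetical, and the 32x32 input is assumed from CIFAR-10.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def flattened_after_trunk(kernel: int, padding: int) -> int:
    """Follow a 32x32 image through Net up to fc1 for a given conv1 candidate."""
    s = conv_out(32, kernel, padding=padding)  # conv1 candidate
    s = conv_out(s, 3)  # conv2: 3x3, stride 1, no padding
    s = s // 2  # max_pool2d(x, 2) halves the size (rounding down)
    return 64 * s * s  # conv2 outputs 64 channels


print(flattened_after_trunk(3, 1))  # 14400
print(flattened_after_trunk(5, 1))  # 12544
print(flattened_after_trunk(5, 2))  # 14400
```

For stride 1 and an odd kernel, padding = (kernel_size - 1) // 2 keeps the spatial size unchanged, which is why 1 is right for the 3x3 kernel and 2 for the 5x5 one.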
Answered By - Shai