I have a dataset of 300x300 laser-welding images with two classes: good and bad weld seams. I followed the PyTorch fine-tuning tutorial to train an Inception-v3 classifier.
I also built a custom CNN with 3 convolutional layers and 3 fully connected layers. What I observed is that fine-tuning shows a lot of variation in validation accuracy: I get a different maximum accuracy every time I train the model. In addition, the fine-tuned model is much less accurate than my custom CNN. For example, on synthetic images generated by a GAN, accuracy is 86% with Inception-v3 but 94% with my custom CNN. On real data, both networks behave similarly, although the custom CNN is still about 2% more accurate.
I trained with training sets of 200, 500 and 1000 images, balanced between the two classes (for example, 100 good and 100 bad for the 200-image set). I also include a resize transform to 224 in my train_loader; in the fine-tuning tutorial, this resize is done automatically to 299 for Inception-v3. The validation set (size and content) is the same for every trial.
Do you know what causes this behavior? Is it because my dataset is so different from the classes the pretrained model was trained on? Shouldn't I get better results with fine-tuning?
My custom CNN:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)  # reused after each conv layer
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.conv3 = nn.Conv2d(16, 24, 5)
        # 13824 = 24 channels * 24 * 24, which matches 224x224 inputs
        self.fc1 = nn.Linear(13824, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(x.size(0), -1)  # flatten to (batch, features)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # No softmax here: nn.CrossEntropyLoss expects raw logits
        return x

model = Net()
criterion = nn.CrossEntropyLoss()
#optimizer = optim.Adam(model.parameters(), lr=0.001)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4)
with the following training loop:
epochs = 15
steps = 0
running_loss = 0
print_every = 10
train_losses, test_losses = [], []
train_acc, test_acc = [], []
model.to(device)  # keep the model on the same device as the batches
for epoch in range(epochs):
    for inputs, labels in trainloader:
        steps += 1
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        logps = model(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model(inputs)
                    batch_loss = criterion(logps, labels)
                    test_loss += batch_loss.item()
                    # The model returns raw logits; exp is monotonic, so the top-1 class is unchanged
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()
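A quick way to check the hard-coded 13824 in fc1 is to trace the spatial size through the three conv/pool stages. The sketch below assumes the layer settings shown in Net (5x5 kernels, stride 1, no padding, 2x2 max-pooling after each conv). The helper names conv_out, pool_out and flatten_size are made up for illustration. It suggests that 13824 matches 224x224 inputs only, so feeding the original 300x300 images, or the 299x299 size used for Inception-v3, into this network would need a different fc1 input size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max-pooling along one dimension (floor mode)."""
    return (size - kernel) // stride + 1


def flatten_size(image_size, kernels=(5, 5, 5), out_channels=24):
    """Number of features after three conv + pool stages on a square image."""
    size = image_size
    for kernel in kernels:
        size = pool_out(conv_out(size, kernel))
    return out_channels * size * size


for image_size in (224, 299, 300):
    print(image_size, flatten_size(image_size))
```

Only the 224 case yields 13824, which is consistent with the resize transform described in the question.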
Here is my theory:
Pre-training is useful when you want to leverage existing data to help the model learn from similar data for which you have few instances. At least, this was the reasoning behind the U-Net architecture in medical image segmentation.
To me, the key is the notion of "similar". If your network has been pre-trained on cats and dogs and you want to transfer it to weld seams, there is a chance that the pre-training is not helping, or is even getting in the way of the model training properly.
Why?
When you train your own CNN, you start from randomly initialized weights, whereas with a pre-trained network you start from pre-trained weights. If the features you are extracting are similar across datasets, you get a head start because the network is already attuned to these features.
For example, cats and dogs share similar visual spatial features (eye position, nose, ears...). So there is a chance that you converge to a local minimum faster during training, since you are starting from a good base that just needs to adapt to the specifics of your data.
If the similarity assumption does not hold, your model has to "unlearn" what it has already learned in order to adapt to the specifics of your dataset. I guess that is why training is more difficult and does not give results as good as a blank-slate CNN (especially if you don't have much data).
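One way to probe this "unlearning" argument is to compare two fine-tuning modes: training only a new classification head on top of frozen pre-trained features, versus updating every layer. The sketch below is an illustration under assumptions, not the setup from the question: backbone is a tiny stand-in network with random weights (a real experiment would load pre-trained Inception-v3 weights instead), and set_trainable and count_trainable are hypothetical helper names.

```python
import torch
import torch.nn as nn


def set_trainable(module, trainable):
    """Enable or disable gradient updates for every parameter of a module."""
    for param in module.parameters():
        param.requires_grad = trainable


def count_trainable(module):
    """Number of scalar parameters that would receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Stand-in for a pre-trained feature extractor (assumption: random weights here).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 2)  # new 2-class head: good vs. bad weld seam
model = nn.Sequential(backbone, head)

# Mode 1: feature extraction, only the new head is trained.
set_trainable(backbone, False)
frozen_count = count_trainable(model)

# Mode 2: full fine-tuning, every layer can move away from its starting weights.
set_trainable(backbone, True)
full_count = count_trainable(model)

# Hand the optimizer only the parameters that are currently trainable.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.001, momentum=0.9
)

logits = model(torch.randn(4, 3, 224, 224))
print(frozen_count, full_count, tuple(logits.shape))
```

If the frozen mode does clearly worse than full fine-tuning on the weld images, that would be consistent with the pre-trained features being a poor match for this data.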
PS: I'd be curious to see whether your pre-trained model ends up catching up with your CNN if you give it more epochs to train.
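Since the question reports a different maximum accuracy on every run, a longer-training comparison is probably easier to read if each configuration is repeated over a few fixed seeds and summarized. This is a minimal sketch under assumptions: seed_everything and summarize_runs are hypothetical helpers, and dummy_run is a placeholder that returns random numbers where a real run would return the validation accuracy after each epoch.

```python
import random
import statistics

import torch


def seed_everything(seed):
    """Seed the Python and PyTorch random number generators."""
    random.seed(seed)
    torch.manual_seed(seed)


def summarize_runs(run_fn, seeds):
    """Run once per seed; return mean and std of the best per-run accuracy."""
    best = []
    for seed in seeds:
        seed_everything(seed)
        val_acc_per_epoch = run_fn()
        best.append(max(val_acc_per_epoch))
    return statistics.mean(best), statistics.pstdev(best)


def dummy_run(epochs=5):
    # Placeholder only: random values, NOT real accuracies.
    # A real run_fn would train a model and return validation accuracy per epoch.
    return [torch.rand(1).item() for _ in range(epochs)]


mean_best, std_best = summarize_runs(dummy_run, seeds=[0, 1, 2])
print(f"best validation accuracy: {mean_best:.3f} +/- {std_best:.3f}")
```

Comparing the mean and spread of both models, rather than a single best run, makes it clearer whether a gap such as 86% versus 94% is larger than the run-to-run variation.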
Answered By - Yoan B. M.Sc
