Issue
I was reading the code in the PyTorch transfer learning tutorial recently and found something interesting:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torch.optim import lr_scheduler

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():   # 1
    param.requires_grad = False         # 1

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of the final layer are being optimized,
# as opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)  # 2

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
I just wonder what the difference is between # 1 and # 2. If I set # 1, can I change # 2 to something like this: optimizer_ft = optim.SGD(model_ft.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.1)? Or what if I just delete # 1 and leave # 2 alone?
Solution
Yes. If you set # 1, the code for # 2 could be: optimizer_ft = optim.SGD(model_ft.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.1).
The optimizer only updates parameters that actually receive gradients, so the frozen parameters (requires_grad=False) are skipped and only the newly added layer is trained.
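As a minimal sketch of both options (reusing model_conv from the snippet above; once the backbone is frozen, the same parameters end up being updated either way):

    # Option A: pass everything; parameters frozen with requires_grad=False never
    # receive a .grad, so the optimizer leaves them untouched.
    optimizer_conv = optim.SGD(model_conv.parameters(), lr=0.001, momentum=0.9)

    # Option B: pass only the trainable parameters explicitly.
    optimizer_conv = optim.SGD(
        filter(lambda p: p.requires_grad, model_conv.parameters()),
        lr=0.001, momentum=0.9)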
See here: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
This helper function sets the .requires_grad attribute of the parameters in the model to False when we are feature extracting. By default, when we load a pretrained model all of the parameters have .requires_grad=True, which is fine if we are training from scratch or finetuning. However, if we are feature extracting and only want to compute gradients for the newly initialized layer then we want all of the other parameters to not require gradients. This will make more sense later.
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
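A quick usage sketch, following the tutorial's pattern (the feature_extract flag name is just illustrative):

    feature_extract = True  # True = freeze the backbone, train only the new head

    model_conv = torchvision.models.resnet18(pretrained=True)
    set_parameter_requires_grad(model_conv, feature_extract)  # freezes every parameter

    # the newly constructed fc layer defaults to requires_grad=True
    model_conv.fc = nn.Linear(model_conv.fc.in_features, 2)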
As for: "if I just delete # 1 and leave # 2 alone?"
You could do that too, but imagine you had to fine-tune multiple new layers: it would be redundant to pass model_conv.new_layer.parameters() to the optimizer for every new layer. So the first way, the one you described and used, seems the better approach in that case (see the sketch below).
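A small sketch of that idea: instead of listing each new module by hand, collect whatever still requires gradients (params_to_update is just an illustrative name, mirroring the tutorial):

    # gather every parameter that still requires gradients, whichever module it lives in
    params_to_update = [p for p in model_conv.parameters() if p.requires_grad]
    optimizer_conv = optim.SGD(params_to_update, lr=0.001, momentum=0.9)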
Answered By - Mahmood Hussain