Issue
In the torch.optim documentation, it is stated that model parameters can be grouped and optimized with different hyperparameters. It says:

"For example, this is very useful when one wants to specify per-layer learning rates:"

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

"This means that model.base's parameters will use the default learning rate of 1e-2, model.classifier's parameters will use a learning rate of 1e-3, and a momentum of 0.9 will be used for all parameters."
I was wondering how to define such groups so that they have a parameters() attribute. What came to my mind was something of the form:
from torch import nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base()
        self.classifier()
        self.relu = nn.ReLU()

    def base(self):
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)

    def classifier(self):
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)
How should I modify the snippet above to be able to get model.base.parameters()? Is the only way to define an nn.ParameterList and explicitly add the weights and biases of the desired layers to that list? What is the best practice?
Solution
I will show three approaches to solving this. In the end though, it comes down to personal preference.
- Grouping parameters with nn.ModuleDict.

I noticed here an answer using nn.Sequential to group the layers, which allows targeting different sections of the model via the parameters() method of nn.Sequential. Indeed, base and classifier might be more than just sequential layers. I believe a more general approach is to leave the module as is and instead initialize an additional nn.ModuleDict that contains all parameters, ordered by optimization group, in separate nn.ModuleLists:
from torch import nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)
        self.relu = nn.ReLU()

        # group layers by optimization group; the ModuleDict holds
        # references to the same modules, not copies
        self.params = nn.ModuleDict({
            'base': nn.ModuleList([self.fc1, self.fc2]),
            'classifier': nn.ModuleList([self.fc3, self.fc4])})

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)
Then you can define your optimizer with:
optim.SGD([
    {'params': model.params.base.parameters()},
    {'params': model.params.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
Do note that MyModel's parameters() generator won't yield duplicate parameters: although each layer is registered both as a direct attribute and inside the ModuleDict, parameters() deduplicates shared parameters.
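As a quick sanity check (a hypothetical snippet, assuming the MyModel class defined above), you can verify that the two groups partition the model's parameters without duplication:

model = MyModel()

# 8 tensors in total: weight and bias for each of the four Linear layers,
# even though every layer is also reachable through model.params
assert len(list(model.parameters())) == 8

n_base = len(list(model.params.base.parameters()))              # 4
n_classifier = len(list(model.params.classifier.parameters()))  # 4
assert n_base + n_classifier == len(list(model.parameters()))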
- Creating an interface for accessing parameter groups.
A different solution is to provide an interface in the nn.Module
to separate the parameters into groups:
from itertools import chain

from torch import nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)
        self.relu = nn.ReLU()

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)

    def base_params(self):
        # chain the per-layer parameters() generators into one iterator
        return chain(*(m.parameters() for m in [self.fc1, self.fc2]))

    def classifier_params(self):
        return chain(*(m.parameters() for m in [self.fc3, self.fc4]))
Here chain is itertools.chain, imported at the top of the snippet. Note that the per-layer parameters() generators must be unpacked into chain; passing the generator expression directly would yield the generators themselves rather than the individual parameters.
Then define your optimizer with:
optim.SGD([
    {'params': model.base_params()},
    {'params': model.classifier_params(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
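Optionally, here is a quick check (a hypothetical snippet, assuming the class above) that the two groups together cover the whole model:

from torch import optim

model = MyModel()
opt = optim.SGD([
    {'params': model.base_params()},
    {'params': model.classifier_params(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

# the optimizer materializes each generator into a list at construction,
# so the one-shot iterators returned by chain are consumed exactly once
assert sum(len(g['params']) for g in opt.param_groups) == len(list(model.parameters()))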
- Using child nn.Modules.

Lastly, you can define your module sections as submodules. With nn.Sequential subclasses, as here, this effectively comes down to the nn.Sequential approach mentioned above, yet the idea generalizes to arbitrary submodules.
from torch import nn

class Base(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(1, 512),
                         nn.ReLU(),
                         nn.Linear(512, 264),
                         nn.ReLU())

class Classifier(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(264, 128),
                         nn.ReLU(),
                         nn.Linear(128, 964))

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = Base()
        self.classifier = Classifier()

    def forward(self, y0):
        features = self.base(y0)
        out = self.classifier(features)
        return out
Here you can use the exact interface from the documentation example:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
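To confirm each group picked up the intended hyperparameters (a hypothetical check, assuming the classes above), you can inspect the optimizer's param_groups:

from torch import optim

model = MyModel()
opt = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

# per-group settings; the defaults fill in anything a group doesn't override
print([(g['lr'], g['momentum']) for g in opt.param_groups])
# [(0.01, 0.9), (0.001, 0.9)]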
I would argue this is the best practice. However, it forces you to define each of your components as a separate nn.Module, which can be a hassle when experimenting with more complex models.
Answered By - Ivan