Issue
The definition of Stochastic depth was first mentioned in this paper. In short, it's similar to dropout, but instead of dropping individual nodes, it randomly drops the connection through a residual block (the skip-connection structure from the ResNet paper).
My question: is there a fast, easy way to implement Stochastic depth for transfer learning in PyTorch, similar to dropout (where you simply add torch.nn.Dropout(p) to the classifier block)?
Solution
Basic StochasticDepth
Yes, one could do something like this pretty easily:
import torch


class StochasticDepth(torch.nn.Module):
    def __init__(self, module: torch.nn.Module, p: float = 0.5):
        super().__init__()
        if not 0 < p < 1:
            raise ValueError(
                "Stochastic Depth p has to be between 0 and 1 but got {}".format(p)
            )
        self.module: torch.nn.Module = module
        self.p: float = p
        # One-element tensor that is refilled with a uniform sample on each call
        self._sampler = torch.Tensor(1)

    def forward(self, inputs):
        # In training, skip the module with probability 1 - p
        # (p is read here as the probability of keeping it)
        if self.training and self._sampler.uniform_().item() > self.p:
            return inputs
        return self.p * self.module(inputs)
Please notice that:
- inputs shape has to be the same as self.module(inputs) shape
- You can pass any block inside this function (see below)
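The random skip decision amounts to a Bernoulli draw. Below is a minimal sketch of that idea, assuming p is read as the probability of keeping the block; the helper name keep_block is hypothetical and not part of PyTorch. A uniform sample in [0, 1) is compared against p, so the block should be kept in roughly a fraction p of the training steps.

```python
import torch


def keep_block(p: float, generator: torch.Generator = None) -> bool:
    """Return True with probability p (hypothetical helper, for illustration)."""
    # torch.rand draws from the uniform distribution on [0, 1).
    sample = torch.rand(1, generator=generator).item()
    return sample < p


# Empirical check: the fraction of kept blocks should be close to p.
gen = torch.Generator().manual_seed(0)
p = 0.5
trials = 10_000
kept = sum(keep_block(p, gen) for _ in range(trials))
keep_rate = kept / trials
print(f"kept {keep_rate:.3f} of the time for p={p}")
```

A bare truthiness check on the sample would not work here, because any non-zero sample counts as True; the explicit comparison with p is what makes the decision depend on p.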
Example usage:
layer = StochasticDepth(
    torch.nn.Sequential(
        torch.nn.Linear(10, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 10),
        torch.nn.ReLU(),
    ),
    p=0.5,
)
Adding to existing models
First, you should print the model that you want and analyze the weights and outputs.
What makes this module easiest to apply (in the case of Conv{1,2,3}d layers):
- the same number of in_channels and out_channels within the block
- for a different number of in_channels and out_channels, some kind of projection would be needed
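Instead of reading the printed model by eye, one can probe a block with a dummy input. The following is a rough sketch under assumptions: needs_projection is a hypothetical helper, and the two small blocks and the 56x56 input size are made up for illustration rather than taken from a real model.

```python
import torch


def needs_projection(block: torch.nn.Module, sample: torch.Tensor) -> bool:
    """Return True if `block` changes the shape of `sample` (hypothetical helper)."""
    was_training = block.training
    block.eval()  # avoid updating BatchNorm statistics during the probe
    with torch.no_grad():
        output = block(sample)
    block.train(was_training)  # restore the original mode
    return output.shape != sample.shape


# Block that keeps channels and spatial size: can be skipped as-is.
same = torch.nn.Sequential(
    torch.nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    torch.nn.BatchNorm2d(64),
    torch.nn.ReLU(),
)
# Block that doubles channels and halves the image: needs a projection.
down = torch.nn.Sequential(
    torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    torch.nn.BatchNorm2d(128),
    torch.nn.ReLU(),
)

sample = torch.randn(1, 64, 56, 56)
same_needs = needs_projection(same, sample)
down_needs = needs_projection(down, sample)
print(same_needs, down_needs)  # False True
```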
StochasticDepth with projection
Version of StochasticDepth with projection:
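import typing

import torch


class StochasticDepth(torch.nn.Module):
    def __init__(
        self,
        module: torch.nn.Module,
        p: float = 0.5,
        projection: typing.Optional[torch.nn.Module] = None,
    ):
        super().__init__()
        if not 0 < p < 1:
            raise ValueError(
                "Stochastic Depth p has to be between 0 and 1 but got {}".format(p)
            )
        self.module: torch.nn.Module = module
        self.p: float = p
        self.projection: typing.Optional[torch.nn.Module] = projection
        # One-element tensor that is refilled with a uniform sample on each call
        self._sampler = torch.Tensor(1)

    def forward(self, inputs):
        # In training, skip the module with probability 1 - p
        # (p is read here as the probability of keeping it)
        if self.training and self._sampler.uniform_().item() > self.p:
            # When skipping, the projection (if any) matches the module's output shape
            if self.projection is not None:
                return self.projection(inputs)
            return inputs
        return self.p * self.module(inputs)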
projection could be Conv2d(256, 512, kernel_size=1, stride=2) in the case of resnet modules, as it would increase the number of channels and make the image smaller via stride=2, as in the original paper.
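As a quick sanity check of that shape arithmetic, here is a small sketch. The 14x14 input size is an assumption chosen for illustration; the 3x3 variant with padding=1 is included only to show that it produces the same output shape as the 1x1 projection.

```python
import torch

x = torch.randn(1, 256, 14, 14)  # assumed feature-map size, for illustration

# 1x1 projection with stride 2: doubles channels, halves height and width.
proj_1x1 = torch.nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False)
# 3x3 alternative with padding 1 gives the same output shape.
proj_3x3 = torch.nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1, bias=False)

with torch.no_grad():
    out_1x1 = proj_1x1(x)
    out_3x3 = proj_3x3(x)

print(tuple(out_1x1.shape))  # (1, 512, 7, 7)
print(tuple(out_3x3.shape))  # (1, 512, 7, 7)
```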
Applying StochasticDepth
If you print torchvision.models.resnet18() you would see repeating blocks like this:
(layer2): Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (downsample): Sequential(
      (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (1): BasicBlock(
    (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
Each layer is a larger colored block you may want to randomly skip. For resnet18 and its layer blocks specifically, one could do this:
import torch
import torchvision

model = torchvision.models.resnet18()
model.layer1 = StochasticDepth(model.layer1)
model.layer2 = StochasticDepth(
    model.layer2,
    projection=torch.nn.Conv2d(
        64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)
model.layer3 = StochasticDepth(
    model.layer3,
    projection=torch.nn.Conv2d(
        128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)
model.layer4 = StochasticDepth(
    model.layer4,
    projection=torch.nn.Conv2d(
        256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)
- The first block's channels stay the same, hence no projection is needed
- The second, third and fourth blocks increase the number of channels and make the image smaller via stride, hence a simple projection is used
You can modify any part of the neural network using this approach, just remember to test whether the shapes agree.
Simpler projections
One could also tie weights with the first conv layer in a specific block and use that module as a projection, see below:
model = torchvision.models.resnet18()
model.layer1 = StochasticDepth(model.layer1)
model.layer2 = StochasticDepth(model.layer2, projection=model.layer2[0].conv1)
model.layer3 = StochasticDepth(model.layer3, projection=model.layer3[0].conv1)
model.layer4 = StochasticDepth(model.layer4, projection=model.layer4[0].conv1)
Upsides:
- weights are not randomly initialized
- easier to write
Downsides:
- weights are tied and one layer will have to do two tasks:
  - be the first layer in the block (without dropping)
  - be the only layer in the block (with dropping)
- this might not end up too well as it is responsible for conflicting tasks
You may also copy this module instead of sharing weights, which is probably the best idea.
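One way to make such a copy is copy.deepcopy from the standard library. The sketch below uses a standalone Conv2d as a stand-in for a block's first conv layer (an assumption for illustration) and checks that the copy starts from the same values but owns separate parameters.

```python
import copy

import torch

# Stand-in for the first conv layer of a block (illustrative only).
conv = torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)

# deepcopy creates a new module with its own parameter tensors,
# initialised to the same values as the original.
projection = copy.deepcopy(conv)

same_values = torch.equal(projection.weight, conv.weight)
shared_storage = projection.weight.data_ptr() == conv.weight.data_ptr()

# Changing the copy leaves the original untouched.
with torch.no_grad():
    projection.weight.add_(1.0)
still_equal = torch.equal(projection.weight, conv.weight)

print(same_values, shared_storage, still_equal)  # True False False
```

With the resnet18 example this would presumably look like projection=copy.deepcopy(model.layer2[0].conv1), so the projection starts from trained weights but is free to diverge during fine-tuning.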
Answered By - Szymon Maszke