Wednesday, April 13, 2022

[FIXED] Apply stochastic depth by Pytorch

Tags: deep-learning, machine-learning, python, pytorch

Issue

The definition of stochastic depth was first given in this paper. In short, it is similar to dropout, but instead of dropping individual nodes, it randomly drops whole residual blocks, leaving only the skip connection, in the residual structure from the ResNet paper.

My question: is there a fast, easy way to implement stochastic depth for transfer learning in PyTorch, similar to dropout (where you simply add torch.nn.Dropout(p) to the classifier block)?

Solution

Basic StochasticDepth

Yes, one could do something like this pretty easily:

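import torch


class StochasticDepth(torch.nn.Module):
    def __init__(self, module: torch.nn.Module, p: float = 0.5):
        super().__init__()
        if not 0 < p < 1:
            raise ValueError(
                "Stochastic Depth p has to be between 0 and 1 but got {}".format(p)
            )
        self.module: torch.nn.Module = module
        # p is treated as the probability of keeping (not skipping) the module
        self.p: float = p

    def forward(self, inputs):
        if self.training:
            # Skip the module with probability 1 - p, otherwise run it unscaled
            if torch.rand(1).item() >= self.p:
                return inputs
            return self.module(inputs)
        # Evaluation: always run the module, scaled by its keep probability
        return self.p * self.module(inputs)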

Please note that:

inputs must have the same shape as self.module(inputs)
You can wrap any block in this module (see below)
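The shape requirement above can be checked before wrapping a block. The following is a minimal sketch; the helper name preserves_shape and the example tensors are assumptions made for illustration and are not part of the original answer.

```python
import torch


def preserves_shape(module: torch.nn.Module, example: torch.Tensor) -> bool:
    """Return True if module(example) has the same shape as example."""
    was_training = module.training
    module.eval()  # avoid updating e.g. BatchNorm statistics during the check
    with torch.no_grad():
        same = module(example).shape == example.shape
    module.train(was_training)  # restore the previous mode
    return same


# A block that keeps the feature size can be skipped without a projection
block = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.ReLU())
block_ok = preserves_shape(block, torch.randn(4, 10))

# A block that changes the feature size would need a projection
shrinking = torch.nn.Linear(10, 5)
shrinking_ok = preserves_shape(shrinking, torch.randn(4, 10))

print(block_ok, shrinking_ok)
```

If the check returns False, the skipped path needs some kind of projection so that both paths produce the same shape.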

Example usage:

layer = StochasticDepth(
    torch.nn.Sequential(
        torch.nn.Linear(10, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 10),
        torch.nn.ReLU(),
    ),
    p=0.5,
)

Adding to existing models

First, print the model you want to modify and inspect the shapes of its weights and outputs.

This module is easiest to apply in the following cases (for Conv{1,2,3}d layers):

the block has the same number of in_channels and out_channels
if in_channels and out_channels differ, some kind of projection is needed

StochasticDepth with projection

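A version of StochasticDepth with a projection:

from typing import Optional

import torch


class StochasticDepth(torch.nn.Module):
    def __init__(
        self,
        module: torch.nn.Module,
        p: float = 0.5,
        projection: Optional[torch.nn.Module] = None,
    ):
        super().__init__()
        if not 0 < p < 1:
            raise ValueError(
                "Stochastic Depth p has to be between 0 and 1 but got {}".format(p)
            )
        self.module: torch.nn.Module = module
        # p is treated as the probability of keeping (not skipping) the module
        self.p: float = p
        self.projection: Optional[torch.nn.Module] = projection

    def forward(self, inputs):
        if self.training:
            # Skip the module with probability 1 - p
            if torch.rand(1).item() >= self.p:
                # The projection makes the skipped path match the module's output shape
                if self.projection is not None:
                    return self.projection(inputs)
                return inputs
            return self.module(inputs)
        # Evaluation: always run the module, scaled by its keep probability
        return self.p * self.module(inputs)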

For ResNet modules, projection could be torch.nn.Conv2d(256, 512, kernel_size=1, stride=2), as it would increase the number of channels and make the image smaller via stride=2, as in the original paper.
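To see what such a projection does to a feature map, one can run it on a dummy tensor. This is only a sketch; the 14x14 spatial size of the input is an assumed example.

```python
import torch

# 1x1 convolution that doubles the channels and halves the spatial size
projection = torch.nn.Conv2d(256, 512, kernel_size=1, stride=2)

# Assumed example input: batch of 1, 256 channels, 14x14 feature map
features = torch.randn(1, 256, 14, 14)

with torch.no_grad():
    projected = projection(features)

projected_shape = tuple(projected.shape)
print(projected_shape)  # (1, 512, 7, 7)
```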

Applying StochasticDepth

If you print torchvision.models.resnet18() you would see repeating blocks like this:

(layer2): Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (downsample): Sequential(
      (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (1): BasicBlock(
    (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

Each layer (layer1 to layer4) is a larger block you may want to randomly skip. For resnet18 specifically, one could wrap each layer like this:

model = torchvision.models.resnet18()
model.layer1 = StochasticDepth(model.layer1)
model.layer2 = StochasticDepth(
    model.layer2,
    projection=torch.nn.Conv2d(
        64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)
model.layer3 = StochasticDepth(
    model.layer3,
    projection=torch.nn.Conv2d(
        128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)
model.layer4 = StochasticDepth(
    model.layer4,
    projection=torch.nn.Conv2d(
        256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)

The first layer keeps the number of channels the same, hence no projection is needed
The second, third and fourth layers increase the number of channels and make the image smaller via stride, hence a simple projection is used

You can modify any part of the neural network using this approach; just remember to test whether the shapes agree.

Simpler projections

One could also tie weights with the first conv layer of a specific block and use that module as the projection, see below:

model = torchvision.models.resnet18()
model.layer1 = StochasticDepth(model.layer1)
model.layer2 = StochasticDepth(model.layer2, projection=model.layer2[0].conv1)
model.layer3 = StochasticDepth(model.layer3, projection=model.layer3[0].conv1)
model.layer4 = StochasticDepth(model.layer4, projection=model.layer4[0].conv1)

Upsides:

weights are not randomly initialized
easier to write

Downsides:

weights are tied and one layer will have to do two tasks:
- be the first layer in the block (without dropping)
- be the only layer applied (with dropping)
this might not work too well, as the layer is responsible for conflicting tasks

You may also copy this module instead of sharing weights, which is probably the best option.
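Copying can be done with copy.deepcopy from the standard library. In the sketch below, conv1 is a stand-in (an assumption for illustration) for something like model.layer2[0].conv1; the copy starts from the same weights but is trained independently afterwards.

```python
import copy

import torch

# Stand-in for the first conv layer of a block, e.g. model.layer2[0].conv1
conv1 = torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)

# Independent copy to be used as the projection
projection = copy.deepcopy(conv1)

# Same initial values, but separate storage
same_values = torch.equal(projection.weight, conv1.weight)
shares_memory = projection.weight.data_ptr() == conv1.weight.data_ptr()

# Changing the copy does not affect the original layer
with torch.no_grad():
    projection.weight.add_(1.0)
still_equal = torch.equal(projection.weight, conv1.weight)

print(same_values, shares_memory, still_equal)  # True False False
```

With a real model this would look like projection=copy.deepcopy(model.layer2[0].conv1), evaluated before the layer is wrapped.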

Answered By - Szymon Maszke

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
