Issue
I'm trying to implement an autoencoder CNN. However, I have the following problem:
The last convolutional layer of my encoder is defined as follows:
Conv2d(128, 256, 3, padding=1, stride=2)
The input of this layer has shape (1, 128, 24, 24). Thus, the output has shape (1, 256, 12, 12).
After this layer, I have ReLU activation and BatchNorm. Neither of these changes the shape of the output.
Then I have a first ConvTranspose2d layer defined as:
ConvTranspose2d(256, 128, 3, padding=1, stride=2)
But the output of this layer has shape (1, 128, 23, 23).
As far as I know, if we use the same kernel size, stride, and padding in ConvTranspose2d as in the preceding Conv2d layer, then the output of this two-layer block must have the same shape as its input.
So, my question is: what is wrong with my understanding? And how can I fix this issue?
Solution
I would first like to note that the nn.ConvTranspose2d layer is not the inverse of nn.Conv2d, as explained in its documentation page:

"it is not an actual deconvolution operation as it does not compute a true inverse of convolution"
You wrote:

"As far as I know, if we use the same kernel size, stride, and padding in ConvTranspose2d as in the preceding Conv2d layer, then the output of this two-layer block must have the same shape as its input."
This is not always true! It depends on the input spatial dimensions.
In terms of spatial dimensions, the 2D convolution will output:

out = [(x + 2p - d(k - 1) - 1)/s + 1]

where [y] denotes the integer (floor) part of y, while the 2D transpose convolution will output:

out = (x - 1)s - 2p + d(k - 1) + op + 1

where x = input_dimension, out = output_dimension, k = kernel_size, s = stride, d = dilation, p = padding, and op = output_padding.
If you look at the composed convT ∘ conv operator (i.e. convT(conv(x))), then you have:

out = (out_conv - 1)s - 2p + d(k - 1) + op + 1
    = ([(x + 2p - d(k - 1) - 1)/s + 1] - 1)s - 2p + d(k - 1) + op + 1

This equals x only if [(x + 2p - d(k - 1) - 1)/s + 1] = (x + 2p - d(k - 1) - 1)/s + 1, i.e. only if s evenly divides x + 2p - d(k - 1) - 1. With your parameters (k = 3, s = 2, p = 1, d = 1) that quantity is x - 1, so the division is exact precisely when x is odd. In that case:

out = ((x + 2p - d(k - 1) - 1)/s + 1 - 1)s - 2p + d(k - 1) + op + 1
    = x + op

and out = x when op = 0.
Otherwise, if x is even, then:

out = x - 1 + op

and setting op = 1 gives out = x.
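Applied to your encoder/decoder (your spatial size 24 is even), a minimal sketch of the fix, assuming the layer definitions from your question, is simply to add output_padding=1 to the transpose convolution:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(128, 256, 3, padding=1, stride=2)
# output_padding=1 compensates for the even input size (24)
convT = nn.ConvTranspose2d(256, 128, 3, padding=1, stride=2, output_padding=1)

x = torch.rand(1, 128, 24, 24)
print(convT(conv(x)).shape)  # torch.Size([1, 128, 24, 24])
```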
Here is an example:

>>> conv = nn.Conv2d(1, 1, 3, stride=2, padding=1)
>>> convT = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1)
>>> convT(conv(torch.rand(1, 1, 25, 25))).shape  # x odd
torch.Size([1, 1, 25, 25])  # <- out = x
>>> convT = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1, output_padding=1)
>>> convT(conv(torch.rand(1, 1, 24, 24))).shape  # x even
torch.Size([1, 1, 24, 24])  # <- out = x - 1 + op
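As an aside, if you prefer not to reason about parity at all, nn.ConvTranspose2d's forward call also accepts an output_size argument, which selects the right amount of implicit output padding for you. A sketch using the same 1-channel layers as the example above:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, 3, stride=2, padding=1)
convT = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1)

for n in (24, 25):  # works for both even and odd inputs
    x = torch.rand(1, 1, n, n)
    y = convT(conv(x), output_size=x.size())
    print(y.shape)  # matches the input shape in both cases
```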
Answered By - Ivan