Issue
I am trying to implement the U-Net architecture for image segmentation. While implementing the crop and concatenation step in the expansive path, I cannot understand how feature maps with unequal numbers of channels are concatenated.
According to the architecture, the output of the first upsampling step has to be concatenated with the corresponding output from the contracting path. The problem is that the number of channels in the contracting path is 512, while after the upsampling step it is 1024, so how are they supposed to be concatenated? My code for the crop and concatenate step is:
def crop_and_concat(self, upsampled, bypass, crop=False):
    if crop:
        # Negative padding crops the encoder feature map (bypass) so its
        # spatial size matches the upsampled decoder feature map.
        c = (bypass.size()[2] - upsampled.size()[2]) // 2
        bypass = F.pad(bypass, (-c, -c, -c, -c))
    # Concatenate along the channel dimension (dim=1).
    return torch.cat((upsampled, bypass), 1)
The error I am receiving:
RuntimeError: Given groups=1, weight of size 128 256 5 5, expected input[4, 384, 64, 64] to have 256 channels, but got 384 channels instead
Where am I going wrong?
Solution
First of all, you don't have to be so strict when it comes to U-Net-like architectures; many derivatives came afterwards (see, for example, the fastai variation with PixelShuffle).
In the basic version of the encoder, your channels go (per block):
1 - 64 - 128 - 256 - 512
A standard convolutional encoder. After that comes a shared bottleneck layer of 1024 channels.
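As a minimal sketch of that channel progression (assumed layer choices; the real U-Net blocks are two 3x3 convs plus ReLU followed by 2x2 max pooling, which I omit here for brevity):

import torch.nn as nn

# Hypothetical sketch: one conv per block, channel counts only.
encoder_channels = [1, 64, 128, 256, 512]
encoder_blocks = nn.ModuleList(
    [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
     for c_in, c_out in zip(encoder_channels, encoder_channels[1:])]
)
bottleneck = nn.Conv2d(512, 1024, kernel_size=3, padding=1)  # the shared 1024 layer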
In the decoder, the channel count goes back down, but each block has more input channels because you are concatenating the encoder feature maps from the corresponding block.
It would be:
1024 -> 512 -> 512 (decoder) + 512 (encoder), 1024 total -> 512
512 -> 256 -> 256 (decoder) + 256 (encoder), 512 total -> 256
and so on.
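Here is a sketch of one such decoder step (the 1024 -> 512 case), with assumed layer choices (a 2x2 transposed conv for upsampling and a single 3x3 conv standing in for the block's conv pair):

import torch
import torch.nn as nn

up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)  # 1024 -> 512
conv = nn.Conv2d(512 + 512, 512, kernel_size=3, padding=1)   # concat doubles the input channels

x = torch.randn(1, 1024, 32, 32)    # bottleneck output
skip = torch.randn(1, 512, 64, 64)  # matching encoder feature map
x = up(x)                           # -> (1, 512, 64, 64)
x = torch.cat((x, skip), dim=1)     # -> (1, 1024, 64, 64)
x = conv(x)                         # -> (1, 512, 64, 64)

Note that torch.cat along dim=1 simply sums the channel counts, which is why the conv after it must expect 512 + 512 input channels.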
You were at the case where the 256 channels from the decoder were taken into account, but the 128 added from the encoder weren't. Just set your input channels to 256 + 128 and follow the above scheme for each block of your U-Net.
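Reconstructing the failing layer from the error message (an assumption on my part; "weight of size 128 256 5 5" means out_channels=128, in_channels=256, kernel 5x5):

import torch
import torch.nn as nn

# broken = nn.Conv2d(256, 128, kernel_size=5)    # what the error implies was built
fixed = nn.Conv2d(256 + 128, 128, kernel_size=5)  # in_channels must be 384 after concat

x = torch.randn(4, 384, 64, 64)  # the input shape from the error message
out = fixed(x)                   # works: -> (4, 128, 60, 60)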
Answered By - Szymon Maszke