Issue
After reviewing the LeNet-5 architecture description: when the pooling layer S2 (with 6 feature maps) is connected to the Conv2D layer C3 (16 feature maps), a special filter mapping is required, in the following way:
- Six C3 maps take inputs from every contiguous subset of 3 feature maps
- Six take inputs from every contiguous subset of 4 feature maps
- Three take inputs from discontinuous subsets of 4 feature maps
- One takes inputs from all six feature maps
[Image: a slightly annotated figure from the LeNet paper, courtesy of TowardsAI]
Note that this mapping is quite specialized to the 6 <-> 16 case, whereas TensorFlow/PyTorch layers are quite flexible. How exactly do TensorFlow/PyTorch handle it?
Solution
I don't know about the actual implementation of this network in those frameworks. However, here's one way you can implement such an operation in PyTorch.
You can look at this operation as a change of basis: going from feature maps in the 'S2' space to feature maps in the 'C3' space using a transform matrix M. The whole objective is to construct that matrix. It is composed of ones and zeros, where the ones are positioned such that you construct vectors in C3 space using components of vectors in S2 space.
For instance, let's look at the discontinuous subsets of 4 in the table: column #12 requires maps n°0, 1, 3, and 4. The corresponding column of M for vector #12 will therefore be [1, 1, 0, 1, 1, 0]. Essentially, the 1s here correspond to the crosses shown in the figure. For this particular portion of the transition, you would define M as:
>>> M = torch.tensor([[1., 0., 1.],
...                   [1., 1., 0.],
...                   [0., 1., 1.],
...                   [1., 0., 1.],
...                   [1., 1., 0.],
...                   [0., 1., 1.]])
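As a quick check, the nonzero rows of each column of M recover the subsets described above. For instance, for the first column (map #12):

>>> M[:, 0].nonzero().flatten()  # S2 maps feeding C3 map #12
tensor([0, 1, 3, 4])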
To actually perform the matrix multiplication, you can use torch.einsum:
torch.einsum('bchw,cd->bdhw', x, M)
Here's an example: starting from a 6-channel 2x2 map and transitioning to a 3-channel 2x2 map (defined by columns #12, #13, and #14 of Table I):
>>> x = torch.rand(1,6,2,2)
>>> x
tensor([[[[0.3134, 0.2468],
          [0.2759, 0.4971]],

         [[0.4150, 0.8735],
          [0.6726, 0.0463]],

         [[0.9547, 0.5338],
          [0.0654, 0.7458]],

         [[0.4099, 0.1984],
          [0.0930, 0.8054]],

         [[0.1695, 0.1586],
          [0.7961, 0.3894]],

         [[0.5535, 0.0678],
          [0.1484, 0.7735]]]])
>>> torch.einsum('bchw,cd->bdhw', x, M)
tensor([[[[1.3077, 1.4773],
          [1.8377, 1.7382]],

         [[2.0926, 1.6338],
          [1.6825, 1.9550]],

         [[2.2315, 1.0467],
          [0.5828, 2.8219]]]])
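Since this contraction only mixes channels pointwise, the very same mapping can also be expressed as a 1x1 convolution whose fixed binary weight is the transpose of M reshaped to (out_channels, in_channels, 1, 1). This is just a sketch of the equivalence, not necessarily how either framework implements such layers:

>>> import torch.nn.functional as F
>>> weight = M.t().reshape(3, 6, 1, 1)  # fixed binary 1x1 kernel
>>> torch.allclose(F.conv2d(x, weight), torch.einsum('bchw,cd->bdhw', x, M))
True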
You can of course expand this operation to the whole of Table I; this would result in a matrix M of size 6x16.
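For reference, here is one way to build that full 6x16 matrix from the connection scheme of Table I. The subsets below are transcribed from the paper's table; treat this as a sketch to adapt rather than a canonical implementation:

import torch

# Input subsets for each of the 16 C3 maps, read off Table I of the
# LeNet paper: columns 0-5 are contiguous triples, columns 6-11 are
# contiguous quadruples (wrapping around), columns 12-14 are the
# discontinuous quadruples, and column 15 takes all six S2 maps.
subsets = [
    {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {0, 4, 5}, {0, 1, 5},
    {0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}, {0, 3, 4, 5}, {0, 1, 4, 5},
    {0, 1, 2, 5}, {0, 1, 3, 4}, {1, 2, 4, 5}, {0, 2, 3, 5},
    {0, 1, 2, 3, 4, 5},
]

M = torch.zeros(6, 16)
for d, maps in enumerate(subsets):
    for c in maps:
        M[c, d] = 1.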
Answered By - Ivan