Issue
I am confused about how to replicate Keras (TensorFlow) convolutions in PyTorch.
In Keras, I can do something like this (the input size is (256, 237, 1, 21) and the output size is (256, 237, 1, 1024)):
import tensorflow as tf
x = tf.random.normal((256,237,1,21))
y = tf.keras.layers.Conv1D(filters=1024, kernel_size=5,padding="same")(x)
print(y.shape)
(256, 237, 1, 1024)
However, in PyTorch, when I try to do the same thing I get a different output size:
import torch
import torch.nn as nn
x = torch.randn(256,237,1,21)
m = nn.Conv1d(in_channels=237, out_channels=1024, kernel_size=(1,5))
y = m(x)
print(y.shape)
torch.Size([256, 1024, 1, 17])
I want PyTorch to give me the same output size that Keras does. This previous question seems to imply that Keras filters are PyTorch's out_channels, but that's what I have. I tried adding padding in PyTorch with padding=(0,503), but that gives me torch.Size([256, 1024, 1, 1023]), which is still not correct. This also takes much longer than Keras does, so I feel that I have incorrectly assigned a parameter.
How can I replicate what Keras did with convolution in PyTorch?
Solution
In TensorFlow, tf.keras.layers.Conv1D takes in a tensor of shape (batch_shape + (steps, input_dim)). This means that what is commonly known as channels appears on the last axis. For instance, in 2D convolution you would have (batch, height, width, channels). This is different from PyTorch, where the channel dimension is right after the batch axis: torch.nn.Conv1d takes in shapes of (batch, channels, length). So you will need to permute two axes.
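For example, a minimal sketch with a tiny arbitrary tensor (the shapes here are purely illustrative), showing the axis swap with permute:
>>> import torch
>>> x = torch.randn(2, 10, 3)     # TF-style layout: (batch, steps, channels)
>>> x.permute(0, 2, 1).shape      # PyTorch-style layout: (batch, channels, length)
torch.Size([2, 3, 10])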
For torch.nn.Conv1d:
- in_channels is the number of channels in the input tensor,
- out_channels is the number of filters, i.e. the number of channels the output will have,
- stride is the step size of the convolution,
- padding is the zero-padding added to both sides.
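To make the mapping concrete, a minimal sketch (shapes chosen arbitrarily): the channel axis of the input must match in_channels, and the channel axis of the output equals out_channels:
>>> import torch
>>> import torch.nn as nn
>>> conv = nn.Conv1d(in_channels=4, out_channels=16, kernel_size=3, stride=1, padding=0)
>>> conv(torch.randn(8, 4, 32)).shape   # (batch, in_channels, length) -> (batch, out_channels, new length)
torch.Size([8, 16, 30])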
In PyTorch there is no option for padding='same', so you will need to choose padding correctly. Here stride=1, so padding must equal kernel_size//2 (i.e. padding=2) in order to maintain the length of the tensor.
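As a quick check, continuing the session above with arbitrary shapes: for stride=1 the output length is L_in + 2*padding - kernel_size + 1, so padding=2 with kernel_size=5 leaves the length unchanged:
>>> conv = nn.Conv1d(in_channels=4, out_channels=16, kernel_size=5, padding=2)
>>> conv(torch.randn(8, 4, 100)).shape   # length 100 is preserved
torch.Size([8, 16, 100])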
In your example, since x has a shape of (256, 237, 1, 21), in TensorFlow's terminology it will be considered as an input with:
- a batch shape of (256, 237),
- steps=1, so the length of your 1D input is 1,
- 21 input channels.
Whereas in PyTorch, x of shape (256, 237, 1, 21) would be:
- a batch shape of (256, 237),
- 1 input channel,
- a length of 21.
I have kept the input in both examples below (TensorFlow vs. PyTorch) as x.shape=(256, 237, 21), assuming 256 is the batch size, 237 is the length of the input sequence, and 21 is the number of channels (i.e. the input dimension, what I see as the dimension on each timestep).
In TensorFlow:
>>> import tensorflow as tf
>>> x = tf.random.normal((256, 237, 21))
>>> m = tf.keras.layers.Conv1D(filters=1024, kernel_size=5, padding="same")
>>> y = m(x)
>>> y.shape
TensorShape([256, 237, 1024])
In PyTorch:
>>> import torch
>>> import torch.nn as nn
>>> x = torch.randn(256, 237, 21)
>>> m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding=2)
>>> y = m(x.permute(0, 2, 1))
>>> y.permute(0, 2, 1).shape
torch.Size([256, 237, 1024])
So in the latter, you would simply work with x = torch.randn(256, 21, 237)...
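For instance, a minimal sketch with the same shapes, where the tensor is created channels-first from the start so no permutes are needed:
>>> x = torch.randn(256, 21, 237)    # (batch, channels, length)
>>> m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding=2)
>>> m(x).shape
torch.Size([256, 1024, 237])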
Answered By - Ivan