Issue
I converted the MNIST images, which are 28x28 pixels, into tensors with
dataset = MNIST(root='data/', train=True, transform=transforms.ToTensor())
and when I run
img_tensor, label = dataset[0]
print(img_tensor.shape, label)
It says the shape is torch.Size([1, 28, 28]).
Why is it 1x28x28? What does the first dimension mean, and what is the point of 1x28x28 as opposed to 28x28?
Solution
An image stored as a tensor has three dimensions: channels, height and width. PyTorch orders them as (C, H, W), so the two 28s are the height and width, and the 1 is the number of channels. So what's a channel? In a color image, every pixel is represented by three colors: red, green and blue. Each color gets its own channel, so a color image normally has 3 channels (RGB), which makes its shape (3, H, W). So why do you have a 1 there? Because the MNIST images are grayscale, each pixel is a single intensity value and doesn't need three color channels; one channel is enough. For grayscale images the shape is therefore (1, H, W).
This picture shows an RGB image separated into its channels: https://commons.wikimedia.org/wiki/File:RGB_channels_separation.png
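So you see, for grayscale images you only need one channel. In principle you could drop the dimension of size 1 and store the image as 28x28, but PyTorch keeps the channel dimension so that grayscale and color images share the same layout, and layers such as nn.Conv2d expect an explicit channel dimension in their input.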
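If you do want a plain 28x28 tensor (for plotting, say), you can drop and restore the channel dimension yourself. A small sketch, using a random tensor as a stand-in for an MNIST image and an arbitrary nn.Conv2d layer just to show which shape it accepts:

```python
import torch
import torch.nn as nn

# Stand-in for one MNIST image as returned by ToTensor(): (C, H, W) = (1, 28, 28).
img_tensor = torch.rand(1, 28, 28)

# Drop the channel dimension, e.g. before passing the image to a plotting function.
plain = img_tensor.squeeze(0)
print(plain.shape)     # torch.Size([28, 28])

# Put the channel dimension back.
restored = plain.unsqueeze(0)
print(restored.shape)  # torch.Size([1, 28, 28])

# Add a batch dimension to get (N, C, H, W) for a convolution layer.
batch = img_tensor.unsqueeze(0)
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)  # arbitrary example layer
out = conv(batch)
print(out.shape)       # torch.Size([1, 8, 26, 26])
```

Note that in_channels=1 has to match the size of the channel dimension; for RGB images it would be 3.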
Answered By - Theodor Peifer