Issue
I'm reading the source code of TorchVision's GoogLeNet and found these lines strange; I can't figure them out.
def _transform_input(self, x: Tensor) -> Tensor:
    if self.transform_input:
        x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
        x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
        x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
        x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
    return x
I know that the ImageNet dataset has mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225], and this looks like some kind of "normalization", but it is obviously not (x - mean) / std; it looks more like x * std + mean. I also don't understand where the 0.5 comes from.
Can anyone explain this code?
Solution
This was done to match TensorFlow's preprocessing of the input image. In the pull request that added GoogLeNet to TorchVision, the author explains that the processing was matched to TensorFlow's, and the commit that introduced the normalization shown in the question says the same. The author who contributed GoogLeNet to TorchVision wrote:
I've updated the code to match the structure required for the TensorFlow weights. Also added the input normalization used for the Inception v3 model.
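To see why the expression has this form: if the input x has already been normalized with ImageNet's mean and std, then x * std + mean recovers the original [0, 1] pixel values, and dividing by 0.5 after subtracting 0.5 re-normalizes them to the [-1, 1] range used by TensorFlow's Inception preprocessing. Below is a minimal sketch (not part of TorchVision; the tensor shapes are illustrative and it uses broadcasting instead of the per-channel unsqueeze/cat in the original) that checks this equivalence:

# Sketch: verify that TorchVision's transform equals "undo ImageNet
# normalization, then re-normalize with mean=0.5, std=0.5".
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

img = torch.rand(1, 3, 224, 224)   # pixels in [0, 1] (illustrative input)
x = (img - mean) / std             # ImageNet-normalized input

# TorchVision's per-channel expression, written with broadcasting:
transformed = x * (std / 0.5) + (mean - 0.5) / 0.5

# Equivalent: un-normalize, then apply TensorFlow-style (img - 0.5) / 0.5
expected = ((x * std + mean) - 0.5) / 0.5

print(torch.allclose(transformed, expected, atol=1e-6))  # True

In other words, the 0.5 values are the mean and std of the TensorFlow-style normalization that maps [0, 1] pixels to [-1, 1], which is what the original Inception weights expect.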
Answered By - jkr