Issue
I've gone through the official documentation, but I'm having a hard time understanding what this function is used for and how it works. Can someone explain it in layman's terms?
I get an error for the example they provide, although the PyTorch version I'm using matches the documentation. Perhaps fixing the error, which I did, is supposed to teach me something? The snippet given in the documentation is:
import torch
from torch import nn
fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
input = torch.randn(1, 3 * 2 * 2, 1)  # last dimension (number of blocks) is 1
output = fold(input)
output.size()
and the fixed snippet is:
import torch
from torch import nn
fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
input = torch.randn(1, 3 * 2 * 2, 3 * 2 * 2)  # last dimension (number of blocks) is now 12
output = fold(input)
output.size()
Thanks!
Solution
unfold and fold are used to facilitate "sliding window" operations (like convolutions). Suppose you want to apply a function foo to every 5x5 window in a feature map/image:
from torch.nn import functional as f
# x is assumed to be a tensor of shape (batch, channels, height, width)
windows = f.unfold(x, kernel_size=5)
Now windows has size batch-(5*5*x.size(1))-num_windows, and you can apply foo to windows:
processed = foo(windows)
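To make the shape of windows concrete, here is a minimal sketch. The input size (batch of 2, 3 channels, 10x12 map) is an assumption chosen only for illustration, and unfold is called with its default stride of 1 and no padding.

```python
import torch
from torch.nn import functional as f

# Assumed toy input: batch=2, channels=3, height=10, width=12
x = torch.randn(2, 3, 10, 12)

# Default stride is 1 and default padding is 0
windows = f.unfold(x, kernel_size=5)

# A 5x5 window fits at (10 - 5 + 1) * (12 - 5 + 1) = 6 * 8 positions
n_windows = (10 - 5 + 1) * (12 - 5 + 1)

# Dimension 1 holds one flattened window: channels * 5 * 5 = 75 values
print(windows.shape)  # torch.Size([2, 75, 48])
```

Each column windows[b, :, i] is one flattened 5x5 window over all channels, so foo is expected to work on those columns.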
Now you need to "fold" processed back to the original size of x:
out = f.fold(processed, x.shape[-2:], kernel_size=5)
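fold only accepts an input whose last dimension equals the number of windows implied by the output size, kernel size, stride, padding and dilation. The sketch below computes that number with the formula given in the PyTorch documentation; num_blocks is a hypothetical helper written for this example, not part of PyTorch.

```python
import math

def num_blocks(output_size, kernel_size, stride=1, padding=0, dilation=1):
    """Number of sliding windows per sample (hypothetical helper, not a PyTorch API)."""
    def pair(v):
        # Accept either a single int or a (height, width) pair
        return (v, v) if isinstance(v, int) else tuple(v)

    out, k, s, p, d = map(pair, (output_size, kernel_size, stride, padding, dilation))
    return math.prod(
        (out[i] + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i in range(2)
    )

print(num_blocks((4, 5), (2, 2)))  # 12
print(num_blocks((10, 12), 5))     # 48
```

This is presumably also why the snippet in the question only runs once the last dimension of the input is 12 rather than 1: an output of size (4, 5) with a (2, 2) kernel has 3 * 4 = 12 window positions.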
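You need to take care of padding and kernel_size, which may affect your ability to "fold" processed back to the size of x. Moreover, fold sums over overlapping elements, so you might want to divide the output of fold by the number of windows covering each position (this equals the patch size for interior positions at stride 1, and is smaller near the borders when there is no padding).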
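One way to get the right divisor at every position, sketched here under the assumption that foo is the identity and using an arbitrary 8x8 toy input, is to fold a tensor of ones through the same unfold/fold pair and divide by the result:

```python
import torch
from torch.nn import functional as f

x = torch.randn(1, 3, 8, 8)  # assumed toy input
k = 3                        # assumed kernel size

# With foo as the identity, fold returns x with overlapping values summed
windows = f.unfold(x, kernel_size=k)
summed = f.fold(windows, x.shape[-2:], kernel_size=k)

# Folding windows of ones counts how many windows cover each position
ones = torch.ones_like(x)
counts = f.fold(f.unfold(ones, kernel_size=k), x.shape[-2:], kernel_size=k)

restored = summed / counts
print(torch.allclose(restored, x))  # True
print(counts[0, 0, 0, 0].item(), counts[0, 0, 4, 4].item())  # 1.0 9.0
```

The corner is covered by a single window while an interior position is covered by k*k of them, which is why dividing by a constant patch size is only exact away from the borders.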
Answered By - Shai