Issue
I am reading the "Deep Learning for Coders with fastai & PyTorch" book and I'm still a bit confused about what the Embedding module does. It seems like a short and simple network, except I can't seem to wrap my head around what Embedding does differently than Linear without a bias. I know it does some computationally faster version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix. It does this to, in effect, select a piece of data? Please point out where I am wrong. Here is one of the simple networks shown in the book.
class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        return (users * movies).sum(dim=1)
Solution
Embedding
[...] what Embedding does differently than Linear without a bias.
Essentially everything. torch.nn.Embedding
is a lookup table; it stores its weights in a torch.Tensor
and simply indexes into them, with a few twists (such as the option of sparse gradients, or a padding index whose entry is not updated during training).
For example:
import torch
embedding = torch.nn.Embedding(3, 4)
print(embedding.weight)
print(embedding(torch.tensor([1])))
Would output:
Parameter containing:
tensor([[ 0.1420, -0.1886, 0.6524, 0.3079],
[ 0.2620, 0.4661, 0.7936, -1.6946],
[ 0.0931, 0.3512, 0.3210, -0.5828]], requires_grad=True)
tensor([[ 0.2620, 0.4661, 0.7936, -1.6946]], grad_fn=<EmbeddingBackward>)
So we took the row at index 1 (the second row) of the embedding's weight matrix. The layer does nothing more than that.
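If it helps, here is a quick check (my own addition, not part of the original answer) that the call above is nothing more than row indexing into embedding.weight:

import torch

# Calling the embedding with an index tensor returns the same values
# as indexing the weight matrix directly.
embedding = torch.nn.Embedding(3, 4)
idx = torch.tensor([1])
print(torch.allclose(embedding(idx), embedding.weight[idx]))  # True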
Where is it used?
Usually when we want to encode some meaning for each row and possibly train those representations, as in word2vec (words that are semantically close end up close in Euclidean space).
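As a small illustrative sketch (mine, with made-up sizes): once trained, you could compare two rows with cosine similarity; with a freshly initialised embedding the value is of course meaningless.

import torch
import torch.nn.functional as F

# Hypothetical vocabulary of 10 "words", each mapped to an 8-dimensional vector.
embedding = torch.nn.Embedding(10, 8)

# Compare two rows. After training, semantically related words would be expected
# to score higher here; with random initialisation the number means nothing.
a = embedding(torch.tensor(2))
b = embedding(torch.tensor(7))
print(F.cosine_similarity(a, b, dim=0))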
Linear
torch.nn.Linear
(without bias) also holds a torch.Tensor
(its weight), but it performs an operation on it (and on the input) which is essentially:
output = input.matmul(weight.t())
every time you call the layer (see the source code and the functional definition of this layer).
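A quick sanity check of that claim (my own snippet, relying only on the documented nn.Linear behaviour):

import torch

# nn.Linear without bias: calling the layer is just input.matmul(weight.t())
linear = torch.nn.Linear(4, 3, bias=False)
x = torch.randn(2, 4)
print(torch.allclose(linear(x), x.matmul(linear.weight.t())))  # True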
Code snippet
The layer in your code snippet does this:
- creates two lookup tables in __init__
- the layer is called with input of shape (batch_size, 2):
  - the first column contains indices of user embeddings
  - the second column contains indices of movie embeddings
- these embeddings are multiplied element-wise and summed, returning shape (batch_size,) (so it is different from nn.Linear, which would return (batch_size, out_features) and perform a matrix multiplication against its weight instead of the element-wise multiplication followed by summation used here)
This is probably used to train both representations (of users and movies) for some recommender-like system.
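A minimal runnable sketch of that forward pass (my own adaptation: it uses plain torch.nn.Module instead of fastai's Module, and made-up sizes):

import torch
from torch import nn

class DotProductPlain(nn.Module):
    # Same structure as the book's module, but plain torch.nn.Module requires
    # an explicit super().__init__() (fastai's Module handles that for you).
    def __init__(self, n_users, n_movies, n_factors):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:, 0])    # (batch_size, n_factors)
        movies = self.movie_factors(x[:, 1])  # (batch_size, n_factors)
        return (users * movies).sum(dim=1)    # (batch_size,)

# Dummy batch of 5 (user index, movie index) pairs
model = DotProductPlain(n_users=100, n_movies=50, n_factors=8)
x = torch.stack([torch.randint(0, 100, (5,)), torch.randint(0, 50, (5,))], dim=1)
print(model(x).shape)  # torch.Size([5])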
Other stuff
I know it does some faster computational version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix.
No, it doesn't. torch.nn.Embedding
performs no matrix multiplication at all; it simply looks up rows by index. The result is the same as a one-hot encoded matrix multiplication would give, and the embedding's gradients can also be made sparse, but whether any of this yields a performance boost depends on the algorithms involved (and whether they support sparsity).
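To make the distinction concrete, here is a small comparison (my own sketch): the one-hot matrix multiplication described in the question yields the same numbers, but the Embedding layer obtains them by plain indexing rather than by multiplying anything.

import torch
import torch.nn.functional as F

embedding = torch.nn.Embedding(3, 4)
idx = torch.tensor([1, 0, 2])

# The one-hot matmul described in the question produces the same values...
one_hot = F.one_hot(idx, num_classes=3).float()
via_matmul = one_hot.matmul(embedding.weight)

# ...but nn.Embedding never builds a one-hot matrix; it just indexes the rows.
via_lookup = embedding(idx)

print(torch.allclose(via_matmul, via_lookup))  # True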
Answered By - Szymon Maszke