Issue
I'd like to tie the embedding layers between two parts of my neural network: one which embeds tokens where order matters (i.e., nn.Embedding) and one which embeds tokens where order doesn't matter (i.e., nn.EmbeddingBag). I ran into numerical stability issues when creating my own EmbeddingBag-like object and doing the reduction myself, so I'd like to use the officially supported nn.EmbeddingBag; however, my attempt to tie the weights (below) doesn't work:
#!/usr/bin/env python3
import torch
import torch.nn as nn

if __name__ == "__main__":
    V, max_seq, padding_idx, emb_dim, B = 10, 100, 1, 512, 32

    # Create an embedding layer and initialize an EmbeddingBag with those weights
    emb_layer = nn.Embedding(V, emb_dim, padding_idx=padding_idx)
    emb_bag = nn.EmbeddingBag.from_pretrained(emb_layer.weight, freeze=False, padding_idx=padding_idx)

    tokens = torch.randint(0, V, (B, max_seq))
    y = torch.randn((B, emb_dim))
    loss = nn.MSELoss()

    # backprop through the embedding bag
    y_ = emb_bag(tokens)
    l = loss(y_, y)
    assert emb_bag.weight.grad is None
    l.backward()
    assert emb_bag.weight.grad is not None

    # if we're tying weights, backpropping through the emb bag should
    # yield the same gradients in the embedding layer, but... the following assertion fails
    assert emb_layer.weight.grad is not None and \
        torch.allclose(emb_bag.weight.grad, emb_layer.weight.grad)
Is there some way to tie the weights of both layers, or do I need to get creative with how I emulate the EmbeddingBag behavior?
Solution
from_pretrained() copies the weights into a new Parameter, so emb_layer.weight is emb_bag.weight will be False, and the two layers accumulate gradients independently.
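A quick check makes the mismatch visible (reusing emb_layer and emb_bag from the script above):

# Distinct Parameter objects: autograd treats them as separate leaves,
# so a backward pass through one never populates the other's .grad.
print(emb_layer.weight is emb_bag.weight)  # False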
You can just set the weight attribute directly:
emb_bag.weight = emb_layer.weight
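As a minimal sanity check (a sketch reusing tokens, y, and loss from the question's script), a backward pass through the bag after the re-tie populates the gradient on the single shared Parameter, so it is visible from both modules:

# Re-tie: assigning a Parameter to a module attribute registers it, so
# both modules now reference the exact same Parameter object.
emb_bag.weight = emb_layer.weight
assert emb_bag.weight is emb_layer.weight

emb_layer.weight.grad = None  # clear any gradient left over from earlier backward passes
l2 = loss(emb_bag(tokens), y)
l2.backward()
# One shared leaf, one gradient: both attributes point at the same tensor.
assert emb_layer.weight.grad is not None
assert emb_bag.weight.grad is emb_layer.weight.grad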
Answered By - Soul Donut