Issue
According to the docs, the CrossEntropyLoss criterion combines the LogSoftmax function and the NLLLoss criterion.
That is all fine and well, but testing it doesn't seem to substantiate this claim (i.e., the assertion below fails):
import torch
import torch.nn as nn

model_nll = nn.Sequential(nn.Linear(3072, 1024),
                          nn.Tanh(),
                          nn.Linear(1024, 512),
                          nn.Tanh(),
                          nn.Linear(512, 128),
                          nn.Tanh(),
                          nn.Linear(128, 2),
                          nn.LogSoftmax(dim=1))

model_ce = nn.Sequential(nn.Linear(3072, 1024),
                         nn.Tanh(),
                         nn.Linear(1024, 512),
                         nn.Tanh(),
                         nn.Linear(512, 128),
                         nn.Tanh(),
                         nn.Linear(128, 2),
                         nn.LogSoftmax(dim=1))

loss_fn_ce = nn.CrossEntropyLoss()
loss_fn_nll = nn.NLLLoss()

t = torch.rand(1, 3072)
target = torch.tensor([1])

with torch.no_grad():
    loss_nll = loss_fn_nll(model_nll(t), target)
    loss_ce = loss_fn_ce(model_ce(t), target)

assert torch.eq(loss_nll, loss_ce)
I'm obviously missing something basic here.
Solution
As you noticed, the weights are initialized randomly. One way to get two modules sharing the same weights is to export the state of one with state_dict and set it on the other with load_state_dict. This is a one-liner:
>>> model_ce.load_state_dict(model_nll.state_dict())
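For completeness, here is a minimal sketch of the full check (the make_model helper is just an illustrative convenience, not part of the original post). After copying the state dict across, the two losses agree. Note that CrossEntropyLoss applies log_softmax again to the already log-normalized outputs; mathematically that is a no-op, but it can introduce floating-point rounding, so torch.allclose is a safer comparison than torch.eq:

import torch
import torch.nn as nn

# Hypothetical helper: builds the same stack both models use in the question.
def make_model():
    return nn.Sequential(nn.Linear(3072, 1024),
                         nn.Tanh(),
                         nn.Linear(1024, 512),
                         nn.Tanh(),
                         nn.Linear(512, 128),
                         nn.Tanh(),
                         nn.Linear(128, 2),
                         nn.LogSoftmax(dim=1))

model_nll = make_model()
model_ce = make_model()

# Copy the randomly initialized weights of one model into the other,
# so the two forward passes are computed with identical parameters.
model_ce.load_state_dict(model_nll.state_dict())

loss_fn_ce = nn.CrossEntropyLoss()
loss_fn_nll = nn.NLLLoss()

t = torch.rand(1, 3072)
target = torch.tensor([1])

with torch.no_grad():
    loss_nll = loss_fn_nll(model_nll(t), target)
    loss_ce = loss_fn_ce(model_ce(t), target)

# Compare with a tolerance rather than exact equality, to allow for
# the floating-point rounding mentioned above.
assert torch.allclose(loss_nll, loss_ce)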
Answered By - Ivan