Issue
To my understanding, the output of the skip-gram model must be compared against many training labels (the number depending on the window size).
My question is: Does the final output of the skip-gram model look like the description in this picture?
P.S. The most similar question I could find: [1] What does the multiple outputs in skip-gram mean?
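For concreteness, here is a minimal sketch (my own toy example, not from any particular implementation) of how one sentence is turned into multiple (center, context) training pairs, which is where the "many training labels per output" come from:

```python
# Minimal sketch: turning a sentence into skip-gram (center, context) pairs.
# With window size 2, each center word is paired with up to 4 context labels.
sentence = ["the", "quick", "brown", "fox", "jumps"]
window = 2

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs)
# e.g. ('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ('brown', 'jumps'), ...
```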
Solution
It's hard to say what "should" happen in degenerate/toy/artificial cases, especially given how much randomness is used in the actual initialization/training.
Both the model's internal weights and the 'projection layer' (aka 'input vectors' or just 'word vectors') are changed by backpropagation. So you can't say what the internal weights should be without also considering the initialization of, and updates to, the projection weights. And nothing is meaningful with only two training examples, as opposed to "many many more examples than could be approximated by the model's state".
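To illustrate that point, here is a minimal sketch of a single skip-gram update, assuming plain numpy and a full-softmax formulation (not the negative-sampling or hierarchical-softmax variants used in practice); the sizes and learning rate are arbitrary placeholders. Both the projection matrix and the output matrix receive gradient updates:

```python
import numpy as np

# Toy skip-gram step showing that backpropagation updates BOTH the
# projection ("input vector") matrix W_in and the output matrix W_out.
rng = np.random.default_rng(0)
vocab_size, dim, lr = 5, 3, 0.1
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # projection layer
W_out = rng.normal(scale=0.1, size=(dim, vocab_size))  # hidden -> output scores

center, context = 2, 4          # indices of one (center, context) pair
h = W_in[center]                # projection of the center word
scores = h @ W_out              # one score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()            # softmax over the vocabulary

grad_scores = probs.copy()
grad_scores[context] -= 1.0     # d(cross-entropy)/d(scores)

grad_h = W_out @ grad_scores                 # gradient flowing back to the projection
W_out -= lr * np.outer(h, grad_scores)       # the output weights move...
W_in[center] -= lr * grad_h                  # ...and so does the center word's input vector
```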
If you think you've constructed a tiny case that's informative when run, I'd suggest trying it against actual implementations to see what happens.
But beware: tiny models & training sets are likely to be weird, or allow for multiple/overfit/idiosyncratic end-states, in ways that don't reveal much about how the algorithm behaves when used in its intended fashion – on large varied amounts of training data.
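If you do want to run such an experiment, a minimal sketch against a real implementation might look like the following (assuming gensim ≥ 4.0; the corpus, window, and vector size are arbitrary placeholders, and results will be noisy and differ run to run):

```python
from gensim.models import Word2Vec

# Tiny experiment with gensim's Word2Vec in skip-gram mode (sg=1).
corpus = [["a", "b", "c", "d"], ["a", "c", "b", "d"]] * 50
model = Word2Vec(corpus, vector_size=8, window=2, min_count=1,
                 sg=1, epochs=20, seed=1)

print(model.wv["a"])               # the 'input'/projection vector for "a"
print(model.wv.most_similar("a"))  # nearest neighbours in the tiny vocabulary
```

Even with a fixed seed, such a tiny setup can settle into idiosyncratic end-states, which is exactly the caveat above.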
Answered By - gojomo