Overall architecture:
![](https://img.haomeiwen.com/i5300364/346c8e5e2e616391.png)
The objective of the Skip-gram model is to maximize:
![](https://img.haomeiwen.com/i5300364/717d01538b3daa8a.png)
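For reference, assuming the image shows the standard formulation from the original word2vec paper, the objective is the average log probability over a corpus of T words with context size c:

```latex
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t)
```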
For Skip-gram, a larger context window generates more training samples and yields more accurate representations, at the cost of longer training time.
![](https://img.haomeiwen.com/i5300364/07317351fd0f213e.png)
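To make the window-size trade-off concrete, here is a minimal sketch in plain Python (the `window` parameter and pair-extraction loop are illustrative, not the paper's exact sampling scheme) showing how the number of (center, context) training pairs grows with the context window:

```python
def skipgram_pairs(tokens, window):
    """Generate (center, context) training pairs for Skip-gram."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context is every word within `window` positions of the center.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
print(len(skipgram_pairs(sentence, window=2)))  # 30 pairs
print(len(skipgram_pairs(sentence, window=4)))  # 52 pairs: more samples, more compute
```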
Tricks:
1).Hierarchical Softmax
The main advantage is that instead of evaluating W output nodes in the neural network to obtain the probability distribution, only about log2(W) nodes need to be evaluated.
In short, it arranges the vocabulary as a binary tree, which reduces the amount of computation; a sketch follows.
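A minimal sketch of the idea (the `path` encoding here is a hypothetical simplification, not the paper's exact notation): each word is a leaf of a binary tree, and its probability is a product of roughly log2(W) sigmoid decisions along the root-to-leaf path, instead of a softmax over all W outputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_prob(h, path):
    """P(word | h) as a product of binary decisions along the tree path.

    h    -- hidden vector for the input word
    path -- list of (node_vector, direction) pairs from root to the
            word's leaf; direction is +1 (left) or -1 (right)
    """
    p = 1.0
    for node_vec, direction in path:
        # Each inner node is a binary classifier; choosing left/right
        # contributes sigmoid(+/- node_vec . h) to the probability.
        p *= sigmoid(direction * np.dot(node_vec, h))
    return p

# With a balanced tree over W words, the path has ~log2(W) nodes,
# so evaluating one probability touches ~log2(W) vectors, not W.
```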
2).Negative Sampling
Instead of updating all W output vectors per step, each observed (input, context) pair is trained against k sampled "noise" words (k = 5-20 for small datasets, 2-5 for large ones). The model learns to score the real pair high and the noise pairs low, turning the expensive softmax into a handful of binary classification problems; the noise words are drawn from the unigram distribution raised to the 3/4 power.
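A minimal sketch of this loss (variable names are hypothetical; a real implementation would also draw the negatives from the unigram^(3/4) distribution and backpropagate through this loss):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_in, v_pos, v_negs):
    """Negative-sampling objective for one (input, context) pair.

    v_in   -- input word vector
    v_pos  -- output vector of the observed context word (positive)
    v_negs -- output vectors of k sampled noise words (negatives)
    """
    # Maximize sigma(v_pos . v_in) for the real pair ...
    loss = -np.log(sigmoid(np.dot(v_pos, v_in)))
    # ... and sigma(-v_neg . v_in) for each noise pair.
    for v_neg in v_negs:
        loss -= np.log(sigmoid(-np.dot(v_neg, v_in)))
    return loss
```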
3).Subsampling of Frequent Words
Each word in the training set is discarded with probability:
![](https://img.haomeiwen.com/i5300364/d141a858b75a65be.png)
where f is the word's frequency and t is a chosen threshold, typically around 10^-5; a sketch follows.
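A minimal sketch of that rule, assuming the image above shows the paper's discard probability P(w) = 1 - sqrt(t / f(w)) (the frequencies in the example are made up for illustration):

```python
import random

def keep_word(freq, t=1e-5):
    """Return False with probability P(w) = 1 - sqrt(t / f(w)),
    so very frequent words are discarded more often."""
    p_discard = max(0.0, 1.0 - (t / freq) ** 0.5)
    return random.random() >= p_discard

# A word covering 1% of the corpus is kept only ~3% of the time,
# while a word at or below the threshold t is always kept.
print(keep_word(0.01))   # usually False
print(keep_word(1e-5))   # always True
```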