
Component 1:

Component 2:

Building the Transformer model:

The encoder network consists mainly of self-attention layers:
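A minimal sketch of one encoder block, assuming a standard PyTorch implementation (the hyperparameters d_model, n_heads, and d_ff are illustrative, not values from the lecture): multi-head self-attention followed by a position-wise feed-forward layer, each with a residual connection and layer normalization.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One Transformer encoder block: multi-head self-attention plus a
    feed-forward layer, each followed by a residual connection + layer norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention: queries, keys, and values all come from the same sequence.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Position-wise feed-forward network applied to every position.
        x = self.norm2(x + self.ffn(x))
        return x

x = torch.randn(2, 10, 512)         # (batch, sequence length, d_model)
print(EncoderBlock()(x).shape)      # -> torch.Size([2, 10, 512])
```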

The decoder network:
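The slide content is not reproduced here; as a rough sketch following the original Transformer paper (not the lecture itself), one decoder block combines masked self-attention, cross-attention over the encoder output, and a feed-forward layer.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One Transformer decoder block: masked self-attention, cross-attention
    over the encoder output, then a feed-forward layer (residual + layer norm)."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, y, enc_out):
        # Causal mask so position i can only attend to positions <= i.
        t = y.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        self_out, _ = self.self_attn(y, y, y, attn_mask=causal)
        y = self.norm1(y + self_out)
        # Cross-attention: queries from the decoder, keys/values from the encoder.
        cross_out, _ = self.cross_attn(y, enc_out, enc_out)
        y = self.norm2(y + cross_out)
        y = self.norm3(y + self.ffn(y))
        return y

enc_out = torch.randn(2, 10, 512)        # encoder output
y = torch.randn(2, 7, 512)               # decoder input (shifted target sequence)
print(DecoderBlock()(y, enc_out).shape)  # -> torch.Size([2, 7, 512])
```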


BERT (Bidirectional Encoder Representations from Transformers):
BERT pre-trains the encoder part of the Transformer network.
Task 1: predict the masked word. Randomly mask words in a sentence and train the network to predict the words that were masked out;
Task 2: predict whether two sentences are adjacent in the original text.
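A toy sketch of how training data for the two tasks can be generated (a simplified illustration; real BERT also keeps or randomly replaces some of the selected tokens, which is omitted here):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15):
    """Task 1 data: randomly replace about 15% of the tokens with [MASK];
    the original tokens at those positions become the prediction targets."""
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)      # the network must recover this word
        else:
            inputs.append(tok)
            targets.append(None)     # no loss at unmasked positions
    return inputs, targets

def make_sentence_pair(doc, corpus, i):
    """Task 2 data: with probability 0.5 take the true next sentence (label 1),
    otherwise a random sentence from the corpus (label 0)."""
    if random.random() < 0.5:
        return doc[i], doc[i + 1], 1
    return doc[i], random.choice(corpus), 0

print(mask_tokens("the cat sat on the mat".split()))
```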

Combining the two tasks: each training example contains both masked words and a sentence pair, and the two prediction losses are optimized jointly.
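A sketch of the joint objective (the shapes, vocabulary size, and variable names are assumptions, not values from the lecture): the masked-word loss and the next-sentence loss are both cross-entropy terms and are simply summed.

```python
import torch
import torch.nn as nn

vocab_size = 30000
mlm_logits = torch.randn(4, 16, vocab_size, requires_grad=True)  # word prediction at every position
nsp_logits = torch.randn(4, 2, requires_grad=True)               # "is next sentence" classification
mlm_targets = torch.randint(0, vocab_size, (4, 16))              # in practice, unmasked positions are set to -100
nsp_targets = torch.randint(0, 2, (4,))

mlm_loss = nn.CrossEntropyLoss(ignore_index=-100)(
    mlm_logits.reshape(-1, vocab_size), mlm_targets.reshape(-1))
nsp_loss = nn.CrossEntropyLoss()(nsp_logits, nsp_targets)

loss = mlm_loss + nsp_loss   # one backward pass trains both objectives jointly
loss.backward()
```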


The advantage of BERT: the training data requires no manual labeling, so massive amounts of unlabeled text can be used.
References:
- Video: https://www.youtube.com/watch?v=aButdUV0dxI&list=PLvOO0btloRntpSWSxFbwPIjIum3Ub4GSC
- Slides: https://github.com/wangshusen/DeepLearning
- Paper: Transformer ("Attention Is All You Need")