Week 1: Recurrent Sequence Models
![](https://img.haomeiwen.com/i14831306/439f9333d005000a.png)
![](https://img.haomeiwen.com/i14831306/1f0a4f709a529c65.png)
For now the words are still represented with one-hot vectors~
![](https://img.haomeiwen.com/i14831306/4a8f6d049713d2d5.png)
Why not use a standard neural network? A few reasons:
![](https://img.haomeiwen.com/i14831306/542dd361cada93cb.png)
The output at each step depends not only on the current input but also on the earlier inputs. A unidirectional RNN like this only uses information from the past, though, and ignores what comes later; that is addressed by the bidirectional RNN introduced later~
![](https://img.haomeiwen.com/i14831306/05268a5929c0cfbe.png)
Forward propagation
![](https://img.haomeiwen.com/i14831306/b0fd7aefd6d3c8e2.png)
![](https://img.haomeiwen.com/i14831306/b0f59f1b20d95faa.png)
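A minimal numpy sketch of one forward step in the course's notation, a&lt;t&gt; = tanh(Wa[a&lt;t-1&gt;, x&lt;t&gt;] + ba) and y&lt;t&gt; = softmax(Wy a&lt;t&gt; + by); the dimensions (10,000-word vocabulary, 100 hidden units) are just illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_step_forward(x_t, a_prev, Wa, ba, Wy, by):
    """One step: a<t> = tanh(Wa @ [a<t-1>; x<t>] + ba), y<t> = softmax(Wy @ a<t> + by)."""
    concat = np.vstack([a_prev, x_t])   # stack hidden state and input, as in Wa = [Waa, Wax]
    a_t = np.tanh(Wa @ concat + ba)
    y_t = softmax(Wy @ a_t + by)
    return a_t, y_t

# illustrative sizes: 10k-word vocabulary, 100-unit hidden state
n_a, n_x = 100, 10000
Wa = np.random.randn(n_a, n_a + n_x) * 0.01
ba = np.zeros((n_a, 1))
Wy = np.random.randn(n_x, n_a) * 0.01
by = np.zeros((n_x, 1))
a, y = rnn_step_forward(np.zeros((n_x, 1)), np.zeros((n_a, 1)), Wa, ba, Wy, by)
```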
Backpropagation (through time):
![](https://img.haomeiwen.com/i14831306/90d05a9bdceedc3e.png)
In many sequence models the input and output lengths differ:
![](https://img.haomeiwen.com/i14831306/b57c95cd3fb4f495.png)
Among the many-to-many architectures is machine translation, which brought up the Attention model, as well as the Transformer model a senior classmate presented last week.
![](https://img.haomeiwen.com/i14831306/934b14e56fae50e8.png)
![](https://img.haomeiwen.com/i14831306/0a32bfa0e3248b4f.png)
Building a language model with an RNN; a little abstract:
![](https://img.haomeiwen.com/i14831306/a188f7b28c49c4ca.png)
![](https://img.haomeiwen.com/i14831306/5fb969efb422bd4e.png)
![](https://img.haomeiwen.com/i14831306/df7811ef6afd6be2.png)
Sampling: drawing novel sequences at random from the RNN
![](https://img.haomeiwen.com/i14831306/773bc3a470adc5e9.png)
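A sketch of the sampling procedure, reusing rnn_step_forward from above: at each step, sample a word from the softmax output and feed it back in as the next input. vocab_size and eos_idx (the assumed index of the &lt;EOS&gt; token) are placeholders:

```python
import numpy as np

def sample(rnn_step, Wa, ba, Wy, by, vocab_size, eos_idx, max_len=50):
    """Sample a novel sequence: feed each sampled word back in as the next input."""
    a = np.zeros((Wa.shape[0], 1))
    x = np.zeros((vocab_size, 1))            # the first input is the zero vector
    indices = []
    for _ in range(max_len):
        a, y = rnn_step(x, a, Wa, ba, Wy, by)
        idx = np.random.choice(vocab_size, p=y.ravel())  # sample from the softmax distribution
        indices.append(idx)
        if idx == eos_idx:                   # stop once <EOS> is sampled
            break
        x = np.zeros((vocab_size, 1))        # one-hot encode the sampled word as the next input
        x[idx] = 1
    return indices
```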
A character-level language model can also be used. The advantage is that unknown words like "mau" never become UNK tokens; the drawbacks are the heavier computational burden and the difficulty of capturing relationships between words.
![](https://img.haomeiwen.com/i14831306/04aab60d8037b505.png)
How to fix vanishing gradients; exploding gradients are a problem too (handled with gradient clipping, sketched after the slide). A deep RNN also struggles to capture long-range dependencies.
![](https://img.haomeiwen.com/i14831306/8c81e535cacbeed2.png)
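The course's remedy for exploding gradients is gradient clipping; a minimal sketch, where the 5.0 threshold is an arbitrary illustrative choice:

```python
import numpy as np

def clip_gradients(gradients, max_value=5.0):
    """Clip every gradient elementwise into [-max_value, max_value] to prevent explosion."""
    return {name: np.clip(g, -max_value, max_value) for name, g in gradients.items()}
```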
GRU (gated recurrent unit)
![](https://img.haomeiwen.com/i14831306/15c798af3966b955.png)
It can capture long-term dependencies better~
![](https://img.haomeiwen.com/i14831306/bb5574f844f4ceaa.png)
![](https://img.haomeiwen.com/i14831306/0f118440e26aaa61.png)
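A sketch of one step of the full GRU in the course's notation (with a&lt;t&gt; = c&lt;t&gt;); gamma_u is the update gate Γu and gamma_r the relevance gate Γr. When Γu stays near 0, c&lt;t&gt; stays close to c&lt;t-1&gt;, which is how the memory cell carries information over long ranges:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gru_step(x_t, c_prev, Wu, bu, Wr, br, Wc, bc):
    """Full GRU step; in the course's notation a<t> = c<t>."""
    concat = np.vstack([c_prev, x_t])
    gamma_u = sigmoid(Wu @ concat + bu)      # update gate: how much to overwrite the memory
    gamma_r = sigmoid(Wr @ concat + br)      # relevance gate: how much of c<t-1> enters the candidate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)
    c_t = gamma_u * c_tilde + (1 - gamma_u) * c_prev  # gate near 0 keeps the old memory
    return c_t
```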
LSTM (long short-term memory) networks [In some downtime I browsed Wang's personal homepage and saw how vast the gap between people can be; I'm no longer sure I'm really cut out for this field. Sigh.] use separate update and forget gates (plus an output gate), and, unlike the GRU, treat a and c as distinct values.
GRU: simpler structure, so it is easier to build bigger networks and cheaper to compute; LSTM: more powerful and flexible.
![](https://img.haomeiwen.com/i14831306/f958ecd636986dc0.png)
![](https://img.haomeiwen.com/i14831306/82d1d919cd98620a.png)
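A matching sketch of one LSTM step; compared with the GRU above, the forget and update gates are separate (instead of Γu and 1-Γu), there is an output gate, and a&lt;t&gt; and c&lt;t&gt; are kept distinct:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, Wf, bf, Wu, bu, Wc, bc, Wo, bo):
    """LSTM step: separate forget/update/output gates; a<t> and c<t> differ."""
    concat = np.vstack([a_prev, x_t])
    gamma_f = sigmoid(Wf @ concat + bf)      # forget gate
    gamma_u = sigmoid(Wu @ concat + bu)      # update gate
    gamma_o = sigmoid(Wo @ concat + bo)      # output gate
    c_tilde = np.tanh(Wc @ concat + bc)
    c_t = gamma_f * c_prev + gamma_u * c_tilde
    a_t = gamma_o * np.tanh(c_t)
    return a_t, c_t
```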
Bidirectional RNN (BRNN)
![](https://img.haomeiwen.com/i14831306/da5f73adb86843e0.png)
Deep RNNs: three layers is usually enough
![](https://img.haomeiwen.com/i14831306/90cd42e9314d4866.png)
Week 2: Natural Language Processing and Word Embeddings
Word representation: with one-hot vectors you cannot capture relationships between words, because the inner product of any two of them is 0. Word embeddings represent each word by features instead, and t-SNE can be used to visualize the 300-dimensional vectors.
![](https://img.haomeiwen.com/i14831306/2596e1f5a7b48536.png)
![](https://img.haomeiwen.com/i14831306/eedc5817efc4a8ab.png)
![](https://img.haomeiwen.com/i14831306/6d79833d859553f8.png)
NLP and word embeddings: you can use transfer learning, reusing embeddings pretrained on large online corpora
![](https://img.haomeiwen.com/i14831306/cef5dea294966b7f.png)
![](https://img.haomeiwen.com/i14831306/25b648425fb334e1.png)
Difference from image recognition: an image can be any previously unseen picture, while word vectors come from a fixed vocabulary
![](https://img.haomeiwen.com/i14831306/21338fb00287c906.png)
Properties of word embeddings: they support analogy reasoning, implemented by computing vector distances (see the code sketch after the similarity slides below)
![](https://img.haomeiwen.com/i14831306/832149a553e852ec.png)
![](https://img.haomeiwen.com/i14831306/304ad288ae00bd64.png)
Computing similarity:
![](https://img.haomeiwen.com/i14831306/6614dc0c92f07441.png)
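A sketch of both ideas: cosine similarity sim(u, v) = u·v / (||u|| ||v||), and the analogy "a is to b as c is to ?" solved by maximizing sim(e_b - e_a, e_w - e_c). The embeddings dict (word to vector) is a placeholder:

```python
import numpy as np

def cosine_similarity(u, v):
    """sim(u, v) = u.v / (||u|| * ||v||)"""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, embeddings):
    """Solve a : b :: c : ? by finding argmax_w sim(e_b - e_a, e_w - e_c)."""
    target = embeddings[b] - embeddings[a]
    best, best_sim = None, -2.0              # cosine similarity is always >= -1
    for w, e_w in embeddings.items():
        if w in (a, b, c):                   # exclude the input words themselves
            continue
        s = cosine_similarity(target, e_w - embeddings[c])
        if s > best_sim:
            best, best_sim = w, s
    return best
```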
Embedding matrix: multiplying E by a one-hot vector picks out one column, giving the final 300-dimensional representation.
![](https://img.haomeiwen.com/i14831306/e30191379662e79a.png)
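A toy check that multiplying E by a one-hot vector just selects one column; in practice you would use a direct lookup (E[:, j]) rather than the wasteful matrix product. Sizes are illustrative:

```python
import numpy as np

vocab_size, emb_dim = 10000, 300
E = np.random.randn(emb_dim, vocab_size)     # embedding matrix (learned in practice)

j = 1234                                     # index of some word in the vocabulary
o = np.zeros((vocab_size, 1))                # its one-hot vector
o[j] = 1

e = E @ o                                    # (300, 1) embedding of word j
assert np.allclose(e, E[:, [j]])             # E @ o just selects column j
```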
Learning word embeddings:
![](https://img.haomeiwen.com/i14831306/1b0b978b27f54939.png)
Word2Vec:
![](https://img.haomeiwen.com/i14831306/dcd149c840d50435.png)
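A sketch of the skip-gram softmax p(t | c) = exp(theta_t · e_c) / sum_j exp(theta_j · e_c); the sum over the whole vocabulary in the denominator is exactly the cost that negative sampling and hierarchical softmax avoid:

```python
import numpy as np

def skipgram_prob(theta, e_c):
    """Skip-gram softmax: p(t | c) for every target word t given context embedding e_c.
    theta: (vocab_size, emb_dim) output weights; e_c: (emb_dim,) context embedding."""
    logits = theta @ e_c
    logits -= logits.max()                   # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()                       # the vocabulary-wide sum is the expensive part
```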
Negative sampling:
![](https://img.haomeiwen.com/i14831306/630aeacfe79de883.png)
![](https://img.haomeiwen.com/i14831306/b8e731a5cc77f8b1.png)
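A sketch of the sampling heuristic from the lecture, P(w_i) proportional to f(w_i)^(3/4), a compromise between uniform sampling and raw word frequency; k negatives are drawn per positive (context, target) pair:

```python
import numpy as np

def negative_sampling_dist(word_freqs):
    """P(w_i) ~ f(w_i)^(3/4): between uniform and raw-frequency sampling."""
    f = np.asarray(word_freqs, dtype=float) ** 0.75
    return f / f.sum()

def sample_negatives(probs, k, rng=None):
    """Draw k negative word indices for one positive (context, target) pair."""
    rng = rng or np.random.default_rng()
    return rng.choice(len(probs), size=k, p=probs)
```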
GloVe word vectors: minimize this objective
![](https://img.haomeiwen.com/i14831306/f9657b998515c3d9.png)
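A naive sketch of the GloVe objective sum_ij f(X_ij)(theta_i·e_j + b_i + b'_j - log X_ij)^2, where X_ij counts co-occurrences; the specific weighting f below is the one from the GloVe paper, an assumption here since the slide only requires f(0) = 0:

```python
import numpy as np

def glove_loss(theta, e, b, b_prime, X):
    """GloVe objective; skipping X_ij = 0 plays the role of f(0) = 0 (log never blows up)."""
    loss = 0.0
    n = X.shape[0]
    for i in range(n):
        for j in range(n):
            if X[i, j] > 0:
                f = min(1.0, (X[i, j] / 100.0) ** 0.75)   # weighting from the GloVe paper
                diff = theta[i] @ e[j] + b[i] + b_prime[j] - np.log(X[i, j])
                loss += f * diff ** 2
    return loss
```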
Sentiment classification: sum (or average) the embeddings, but that fails on examples like the one at the bottom left, so use an RNN to capture phrases like "not good".
![](https://img.haomeiwen.com/i14831306/7267287234bfe1a3.png)
![](https://img.haomeiwen.com/i14831306/44f7141d498a3879.png)
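A sketch of the simple averaging model; because it throws away word order, "not good" averages out to look much like "good", which is what motivates the RNN version. W, b, and the embeddings dict are placeholders:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def avg_embedding_classifier(words, embeddings, W, b):
    """Average the word embeddings, then softmax over ratings; word order is ignored."""
    avg = np.mean([embeddings[w] for w in words], axis=0)
    return softmax(W @ avg + b)              # probabilities over star ratings
```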
Debiasing word embeddings:
![](https://img.haomeiwen.com/i14831306/e496587a8174baaa.png)
![](https://img.haomeiwen.com/i14831306/543a7564afc9bc38.png)
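A sketch of the neutralize step: project a gender-neutral word's embedding onto the bias direction g (e.g. e_he - e_she) and subtract that component:

```python
import numpy as np

def neutralize(e, g):
    """Remove the component of embedding e along the bias direction g."""
    e_bias_component = (e @ g) / (g @ g) * g  # projection of e onto g
    return e - e_bias_component               # result is orthogonal to g
```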
Week 3: Sequence Models and the Attention Mechanism
Sequence models for machine translation and image captioning
![](https://img.haomeiwen.com/i14831306/ac63bdd2bdea9a9f.png)
![](https://img.haomeiwen.com/i14831306/2cc0f49daa2f6a83.png)
Machine translation can be viewed as a conditional language model: find the output sequence with the highest probability
![](https://img.haomeiwen.com/i14831306/9e1650f995c8b6b6.png)
![](https://img.haomeiwen.com/i14831306/7bcae692e5ab0f1d.png)
Greedy search does not work well here
![](https://img.haomeiwen.com/i14831306/2cdb96225c737bfc.png)
Beam search: consider multiple candidate outputs
![](https://img.haomeiwen.com/i14831306/190c54a9d33be61a.png)
![](https://img.haomeiwen.com/i14831306/0ed3cdce756b71b1.png)
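A sketch of beam search with beam width B (B = 1 reduces to greedy search). log_prob_next is an assumed model hook that returns log P(next word | input sentence, prefix) over the vocabulary:

```python
import numpy as np

def beam_search(log_prob_next, vocab_size, eos_idx, B=3, max_len=30):
    """Keep the B most likely partial translations at each step."""
    beams = [([], 0.0)]                      # (prefix, cumulative log-probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            logp = log_prob_next(prefix)     # shape (vocab_size,)
            top = np.argsort(logp)[-B:]      # expand only the B best continuations
            for w in top:
                candidates.append((prefix + [int(w)], score + logp[w]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:B]:
            (completed if prefix[-1] == eos_idx else beams).append((prefix, score))
        if not beams:                        # all surviving hypotheses ended with <EOS>
            break
    return max(completed + beams, key=lambda c: c[1])
```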
Refining beam search: length normalization, because the raw log-probability objective above favors short sentences
![](https://img.haomeiwen.com/i14831306/a446546ed2395ac2.png)
![](https://img.haomeiwen.com/i14831306/30142dcf87e77712.png)
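A sketch of the length-normalized objective (1 / T_y^alpha) * sum_t log P(y&lt;t&gt; | x, y&lt;1&gt;..y&lt;t-1&gt;); without the 1/T_y^alpha factor every extra word adds another negative log term, which is why the plain sum prefers short outputs. alpha = 0.7 is the heuristic value from the lecture:

```python
def normalized_score(log_probs, alpha=0.7):
    """Length-normalized beam-search objective over a sequence's per-step log-probabilities."""
    T_y = len(log_probs)
    return sum(log_probs) / (T_y ** alpha)   # alpha = 1 fully normalizes, alpha = 0 not at all
```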
Error analysis on beam search: determine whether the RNN model or the beam search is at fault
![](https://img.haomeiwen.com/i14831306/d5c37ebca94e8edc.png)
![](https://img.haomeiwen.com/i14831306/555b6f8ae16beb87.png)
![](https://img.haomeiwen.com/i14831306/ee0935f9f9bf090a.png)
BLEU score (bilingual evaluation understudy): measures the quality of a machine translation, which is needed because a sentence usually has several equally good translations
![](https://img.haomeiwen.com/i14831306/0e5e7a3cbf6ed846.png)
![](https://img.haomeiwen.com/i14831306/5979feac746d5717.png)
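A sketch of the modified n-gram precision p_n: each candidate n-gram is credited at most the maximum number of times it appears in any single reference (clipped counting). With the lecture's example, candidate "the the the the the the the" against reference "the cat is on the mat" gives p_1 = 2/7:

```python
from collections import Counter

def modified_precision(candidate, references, n):
    """Clipped n-gram precision over a candidate (list of words) and reference sentences."""
    def ngrams(words):
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    cand = ngrams(candidate)
    max_ref = Counter()                      # per n-gram, the max count in any one reference
    for ref in references:
        for g, c in ngrams(ref).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0
```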
To penalize translations that are too short: the brevity penalty (BP)
![](https://img.haomeiwen.com/i14831306/6063fb4d1044aa34.png)
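A sketch of the brevity penalty and the combined score BP * exp((1/4) sum_n log p_n); r is the reference length and c the candidate length:

```python
import numpy as np

def brevity_penalty(cand_len, ref_len):
    """BP = 1 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if cand_len > ref_len:
        return 1.0
    return float(np.exp(1 - ref_len / cand_len))

def bleu(precisions, cand_len, ref_len):
    """Combined BLEU from the four modified precisions p_1..p_4."""
    return brevity_penalty(cand_len, ref_len) * float(np.exp(np.mean(np.log(precisions))))
```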
An intuitive feel for the Attention model: That's it, Attention!
![](https://img.haomeiwen.com/i14831306/08a51a22dd04a6ff.png)
![](https://img.haomeiwen.com/i14831306/a348ee20c37b8107.png)
The attention model:
![](https://img.haomeiwen.com/i14831306/0b81cc77f21ce6ad.png)
![](https://img.haomeiwen.com/i14831306/9ed0b0f7b2aa2de3.png)
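A sketch of the attention weights: the energies e&lt;t,t'&gt; (produced by a small network on [s&lt;t-1&gt;, a&lt;t'&gt;]) are softmaxed into alpha&lt;t,t'&gt;, which weight the encoder activations into the context vector for decoder step t:

```python
import numpy as np

def attention_context(energies, a):
    """energies: (Tx,) alignment scores for decoder step t; a: (Tx, n_a) encoder activations."""
    e = np.exp(energies - energies.max())    # stabilized softmax over the input positions t'
    alphas = e / e.sum()                     # attention weights, sum to 1
    return alphas, alphas @ a                # weights and the context vector fed to the decoder
```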
Speech recognition
![](https://img.haomeiwen.com/i14831306/4f4982ce96c86908.png)
![](https://img.haomeiwen.com/i14831306/f17100cca259d2b0.png)
Trigger word detection:
![](https://img.haomeiwen.com/i14831306/e2a4b03a50a10726.png)
![](https://img.haomeiwen.com/i14831306/78e060b04f69fe94.png)
At last: this is the beginning of deep learning, not the end. Next I will work through the course's quizzes and programming assignments and put the code on my GitHub page~