2019-07-04

Author: 人工智能人话翻译官 | Published 2019-07-04 22:18

Each convolution kernel

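$W \in \mathbb{R}^{2d \times kd}$: the weight of each convolution kernel, where 2d is the number of output channels, k is the kernel size, and d is the embedding dimension.
$b_w \in \mathbb{R}^{2d}$: the shape of the bias.
$X \in \mathbb{R}^{k \times d}$: the kernel's input (conv_input), a concatenation of k input elements, each embedded in d dimensions.
The output is $Y \in \mathbb{R}^{2d}$, twice the dimensionality of a single input element; the GLU then halves it back to d.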
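To make the shapes concrete, here is a minimal pure-Python sketch of a single kernel application. It assumes, as in the convolutional sequence-to-sequence (ConvS2S) model these shapes come from, that the k rows of X are concatenated into one vector of length kd and that $Y = W x + b_w$. The function name conv_kernel_step and the sizes k = 3, d = 4 are illustrative choices, not part of the original; in a framework, a 1-D convolution with d input channels, 2d output channels and kernel size k performs this for every window at once.

```python
import random


def conv_kernel_step(X, W, b_w):
    """One ConvS2S-style kernel application (illustrative sketch).

    X:   k lists of d floats (k input elements, each embedded in d dims)
    W:   2d lists of k*d floats (the kernel weight, shape 2d x kd)
    b_w: 2d floats (the bias)
    Returns Y, a list of 2d floats.
    """
    k, d = len(X), len(X[0])
    if len(W) != 2 * d or any(len(row) != k * d for row in W) or len(b_w) != 2 * d:
        raise ValueError("expected W: 2d x kd and b_w: 2d")
    # Concatenate the k input vectors into a single vector of length k*d.
    x_flat = [value for element in X for value in element]
    # Y = W x_flat + b_w: each of the 2d output channels is one dot product.
    return [sum(w * x for w, x in zip(row, x_flat)) + b for row, b in zip(W, b_w)]


k, d = 3, 4  # kernel size and embedding dim (arbitrary example values)
rng = random.Random(0)
X = [[rng.random() for _ in range(d)] for _ in range(k)]
W = [[rng.random() for _ in range(k * d)] for _ in range(2 * d)]
b_w = [0.0] * (2 * d)
Y = conv_kernel_step(X, W, b_w)
print(len(Y))  # 8, i.e. 2 * d
```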

GLU (gated linear units)

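Each convolutional layer contains a GLU, which gives the network a gating mechanism similar to the gates an LSTM uses to manage long- and short-term memory.
The formula is $v([A\ B]) = A \otimes \sigma(B)$, where $A, B \in \mathbb{R}^d$ are the two halves of the kernel output $Y \in \mathbb{R}^{2d}$, $\sigma$ is the sigmoid function, and $\otimes$ is element-wise multiplication. The gate $\sigma(B)$ controls which elements of A are passed on.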
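Applied to the 2d-dimensional kernel output, the formula takes the first d values as A and the last d values as B. A minimal pure-Python sketch of the formula (the helper names glu and sigmoid are illustrative, not from the original):

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(v):
    """v([A B]) = A ⊗ sigmoid(B) on a flat list of even length 2d."""
    if len(v) % 2 != 0:
        raise ValueError("GLU input length must be even")
    d = len(v) // 2
    A, B = v[:d], v[d:]
    # sigmoid(B) is the gate: values near 0 block A, values near 1 let A through.
    return [a * sigmoid(b) for a, b in zip(A, B)]


print(glu([1.0, 2.0, 0.0, 0.0]))  # [0.5, 1.0]
```

With B = 0 the gate is exactly 0.5, so each element of A is halved; the output has d values, half the length of the input.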

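torch.nn.functional.glu(input, dim=-1):

Computes $\mathrm{GLU}(a, b) = a \otimes \sigma(b)$, where input is split in half along dim to form a and b, $\sigma$ is the sigmoid function, and $\otimes$ is the element-wise product between matrices.

dim (int): the dimension on which to split the input, i.e. it decides the direction of the cut. Default: -1.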
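The effect of dim is easiest to see on a small matrix. The pure-Python helper below (glu_2d, a hypothetical stand-in, not PyTorch's implementation) treats one sample as a [channels, positions] matrix. Splitting along the channel axis, which is what dim = 1 does on a batched [batch size, 2*hid dim, src sent len] tensor, halves the channels and keeps every position; splitting along the other axis halves the positions instead.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu_2d(M, dim=-1):
    """Mimic torch.nn.functional.glu on a 2-D nested list (illustrative only).

    dim=0 (or -2): split the rows into a top half a and a bottom half b.
    dim=1 (or -1): split every row into a left half a and a right half b.
    """
    if dim in (0, -2):
        if len(M) % 2 != 0:
            raise ValueError("size along dim 0 must be even")
        h = len(M) // 2
        return [[a * sigmoid(b) for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(M[:h], M[h:])]
    if dim in (1, -1):
        if len(M[0]) % 2 != 0:
            raise ValueError("size along dim 1 must be even")
        h = len(M[0]) // 2
        return [[a * sigmoid(b) for a, b in zip(row[:h], row[h:])] for row in M]
    raise ValueError("dim must be one of 0, 1, -1, -2")


# Rows = 4 channels (2 * hid_dim with hid_dim = 2), columns = 2 positions.
M = [[1.0, 2.0],
     [3.0, 4.0],
     [0.0, 0.0],
     [0.0, 0.0]]
print(glu_2d(M, dim=0))  # [[0.5, 1.0], [1.5, 2.0]]: channels halved, positions kept
out = glu_2d(M, dim=1)
print(len(out), len(out[0]))  # 4 1: positions halved instead
```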


Note the output shape of GLU: the size along the split dimension is halved.

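```python
import torch
import torch.nn.functional as F

# conved = [batch size, 2 * hid dim, src sent len]; random values (sizes 2, 2*4, 5) stand in for a real conv output
conved = torch.randn(2, 2 * 4, 5)
# pass through the GLU activation; dim=1 splits the channel axis, i.e. a "vertical" cut
conved = F.glu(conved, dim=1)
# conved = [batch size, hid dim, src sent len]
```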

The result plays the role of the hidden state h we wanted earlier.
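Putting the kernel and the GLU together gives one layer: slide a window of k embeddings along the sentence, map each window to $Y \in \mathbb{R}^{2d}$, and gate it down to a d-dimensional h. The sketch below assumes an odd kernel size with (k-1)/2 zero vectors of padding on each side so that the output keeps the input length; conv_glu_layer and the toy weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv_glu_layer(seq, W, b_w, k):
    """One conv + GLU layer over a sequence (illustrative sketch).

    seq: list of n vectors, each of length d
    W:   2d x kd kernel weight, b_w: 2d bias, k: odd kernel size
    Returns n hidden vectors h, each of length d.
    """
    d = len(seq[0])
    pad = [[0.0] * d] * ((k - 1) // 2)  # zero padding keeps the output length n
    padded = pad + seq + pad
    hidden = []
    for i in range(len(seq)):
        window = [v for element in padded[i:i + k] for v in element]  # length k*d
        y = [sum(w * x for w, x in zip(row, window)) + b for row, b in zip(W, b_w)]
        a, gate = y[:d], y[d:]  # Y in R^{2d} -> A, B in R^d
        hidden.append([ai * sigmoid(gi) for ai, gi in zip(a, gate)])
    return hidden


d, k = 2, 3
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.0]]
W = [[0.1] * (k * d) for _ in range(2 * d)]
b_w = [0.0] * (2 * d)
h = conv_glu_layer(seq, W, b_w, k)
print(len(h), len(h[0]))  # 5 2: one d-dimensional hidden state per position
```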

Residual network

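Finally, the hidden states produced by the convolutions are combined with the original embedded output: just as in a residual network, the two are added element-wise and the sum is processed further. The motivation is the same as for residual networks in general: because we stack many convolutional layers, this shortcut links the initial input more directly to the final output.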
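A minimal sketch of this residual step, assuming conved has already been projected back to the embedding size so the two can be added element by element. The optional √0.5 factor mirrors the scaling the ConvS2S paper applies to residual sums to keep their variance stable; residual_add is a hypothetical helper name.

```python
import math


def residual_add(conved, embedded, scale=math.sqrt(0.5)):
    """Element-wise residual sum of a layer's output and its input.

    Both arguments are lists of n vectors of the same length. The
    sqrt(0.5) default follows the ConvS2S paper; pass scale=1.0 for a plain sum.
    """
    if len(conved) != len(embedded):
        raise ValueError("sequence lengths must match")
    return [[(c + e) * scale for c, e in zip(cv, ev)]
            for cv, ev in zip(conved, embedded)]


embedded = [[1.0, 0.0], [0.0, 1.0]]
conved = [[0.5, 0.5], [1.0, -1.0]]
print(residual_add(conved, embedded, scale=1.0))  # [[1.5, 0.5], [1.0, 0.0]]
```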

Article link: https://www.haomeiwen.com/subject/ckllhctx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

2019-07-04

Each convolution kernel

GLU (gated linear units)

残差网络

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

工作生活