About Attention

Author: zhnidj | Published 2018-06-23 12:43

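In fact, convolutional neural networks already have a built-in form of attention. In a classification task, for example, the pixels activated in the high-level feature maps tend to concentrate on exactly the regions that matter for the class. This is what a saliency map shows.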

I highly recommend the paper "Learning Deep Features for Discriminative Localization" by Bolei Zhou, an MIT PhD and a real expert in this area.

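In that paper, W_i is the weight that connects the i-th feature map, after global average pooling, to a class score.

- If W_i is positive, the pattern learned by that kernel is evidence for the class. An example is a kernel that has learned dog heads when the class is "dog".
- If W_i is negative, the pattern is evidence against the class. An example is a kernel that has learned beds or windows when the class is an outdoor scene.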
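To make the role of W_i concrete, here is a minimal NumPy sketch of the class activation map (CAM) from that paper. It assumes the network ends in global average pooling followed by a single linear layer. The map for class c is then the W_i-weighted sum of the last conv layer's feature maps. The function name and array shapes are my own illustrative choices, not code from the paper.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Class activation map M_c(x, y) = sum_k W[c, k] * f_k(x, y).

    feature_maps: (K, H, W) output of the last conv layer
    fc_weights:   (num_classes, K) weights of the linear layer after global average pooling
    class_idx:    the class c to visualize
    """
    w_c = fc_weights[class_idx]                     # (K,): one weight W_i per feature map
    return np.tensordot(w_c, feature_maps, axes=1)  # weighted sum over k -> (H, W)

# Toy example: two 2x2 feature maps and one class.
f = np.array([[[1.0, 0.0], [0.0, 0.0]],   # map 0 fires at the top-left
              [[0.0, 0.0], [0.0, 1.0]]])  # map 1 fires at the bottom-right
W = np.array([[2.0, -1.0]])               # map 0 supports the class, map 1 opposes it
cam = class_activation_map(f, W, 0)       # [[2, 0], [0, -1]]
```

Global average pooling and the linear layer commute, so the mean of the CAM equals the class score (ignoring the bias). That is why kernels with positive W_i light up the regions that support the class.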

Several forms of attention:

The visual attention mechanism may have at least the following basic components [Tsotsos et al., 1995]:

(1) the selection of a region of interest in the visual field;

(2) the selection of feature dimensions and values of interest;

(3) the control of information flow through the network of neurons that constitutes the visual system; and

(4) the shifting from one selected region to the next in time.

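Attention models fall into two kinds: soft attention and hard attention.

- Soft attention can be trained with backpropagation.
- Hard attention uses a sampling strategy to pick some components from the attention distribution. It is optimized with methods such as reinforcement learning.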
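The difference between the two fits in a few lines of NumPy. This sketch assumes the attention scores are already given. Soft attention returns the softmax-weighted average of all values, while hard attention samples a single index from the same distribution. The helper names are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def soft_attention(scores, values):
    """Differentiable: the expected value under the attention distribution."""
    alpha = softmax(scores)  # (N,) weights that sum to 1
    return alpha @ values    # weighted average over all N values

def hard_attention(scores, values, rng):
    """Non-differentiable: sample ONE value from the attention distribution."""
    alpha = softmax(scores)
    idx = rng.choice(len(alpha), p=alpha)  # stochastic selection of a single index
    return values[idx], idx

values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = np.array([2.0, 0.5, -1.0])
print(soft_attention(scores, values))
print(hard_attention(scores, values, np.random.default_rng(0)))
```

The sampling step in `hard_attention` has no gradient with respect to `scores`. That is why hard attention needs estimators such as REINFORCE, while `soft_attention` is differentiable end to end.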

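Attention is essentially a weighting, and it can be applied over the spatial dimensions to give different spatial regions different weights. Recommended paper: "Residual Attention Network for Image Classification" (CVPR 2017).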
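Here is a minimal sketch of spatial weighting for a (C, H, W) feature map. One score per location becomes a sigmoid mask M, which is applied as (1 + M) * F, the "attention residual learning" form used in Residual Attention Network. In the paper, the mask comes from a bottom-up top-down branch and has the same size as F. As a simplifying assumption, this sketch uses a single (H, W) mask produced by a hypothetical 1x1 projection `w`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(features, w):
    """features: (C, H, W) trunk features F(x); w: (C,) hypothetical 1x1 projection.

    Returns (1 + M) * F, where M is an (H, W) mask in (0, 1) shared by all channels.
    """
    logits = np.tensordot(w, features, axes=1)  # (H, W): one score per spatial location
    mask = sigmoid(logits)                      # spatial weights M(x) in (0, 1)
    return (1.0 + mask)[None, :, :] * features  # broadcast the mask over channels

F = np.arange(12, dtype=float).reshape(3, 2, 2)
out = spatial_attention(F, np.array([0.1, -0.2, 0.3]))
```

With the 1 + M form, a mask near zero leaves the trunk features unchanged instead of wiping them out. Stacking several such modules therefore does not keep shrinking the features.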

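The weighting can also be applied along the channel dimension, so that features in different channels get different weights. Recommended paper: "Squeeze-and-Excitation Networks" (winner of ILSVRC 2017 image classification; CVPR 2018 Oral).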
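Here is a channel-weighting sketch in the style of an SE block, for a single (C, H, W) feature map in NumPy. It works in three steps:

1. Squeeze with global average pooling.
2. Excite with two fully connected layers (ReLU, then sigmoid) using a reduction ratio r.
3. Rescale each channel.

The weights are plain arrays standing in for learned parameters. The bias terms are my own addition for generality.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on one feature map x of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,), w2: (C, C // r), b2: (C,) for reduction ratio r.
    """
    z = x.mean(axis=(1, 2))                   # squeeze: global average pool -> (C,)
    s = np.maximum(0.0, w1 @ z + b1)          # excitation FC 1 + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))  # FC 2 + sigmoid -> (C,) weights in (0, 1)
    return x * s[:, None, None]               # scale: reweight each whole channel

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.normal(size=(C, 5, 5))
y = se_block(x, rng.normal(size=(C // r, C)), np.zeros(C // r),
             rng.normal(size=(C, C // r)), np.zeros(C))
```

Unlike the spatial mask, every location in a channel gets the same weight, so the block decides "which features" rather than "where".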

For sequential data such as text, audio, and video, the weighting can also be applied along the time dimension.

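Attention in seq2seq: see the explanatory blog post on "Attention Is All You Need", Google's landmark paper.

- An RNN has to recur step by step to gather global information, which is why a bidirectional RNN usually works better.
- A CNN can only capture local information and enlarges its receptive field by stacking layers.
- Attention takes the bluntest approach and captures global information in a single step.

Non-local feels like the video version of this paper!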
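The "global information in one step" point is easiest to see in the scaled dot-product attention from that paper, softmax(Q K^T / sqrt(d_k)) V. Here is a minimal NumPy sketch for a single head, without masking or batching.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output (n_q, d_v) and weights (n_q, n_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # every query scored against every key
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

X = np.random.default_rng(0).normal(size=(4, 8))  # a toy sequence of 4 tokens
out, attn = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
```

With Q = K = V = X this is self-attention. The (n, n) weight matrix links every position to every other position in a single matrix multiply, whereas an RNN needs n sequential steps to carry information across the sequence.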

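I also recommend Facebook's CVPR 2018 paper "Non-local Neural Networks". It is the most elegant attention design I have seen in the video domain so far. Future video papers will probably all have to compare against it, and video competition entries will likely end up as 3D convolution + optical flow + audio + non-local + ensemble...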
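For comparison, here is a minimal NumPy sketch of a non-local block in its embedded-Gaussian form:

y_i = (1 / C(x)) * sum_j exp(theta(x_i)^T phi(x_j)) g(x_j), followed by z = W_z y + x.

The 1x1x1 convolutions are written as plain matrices, and there is no batch norm or subsampling. This is a simplification, not the paper's implementation. Normalizing by C(x) amounts to a softmax over all T*H*W positions, which is why the paper itself notes that this form is equivalent to self-attention.

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block on a video feature map.

    x: (C, T, H, W); w_theta, w_phi, w_g: (C', C); w_z: (C, C').
    Each weight matrix stands in for a 1x1x1 convolution.
    """
    C, T, H, W = x.shape
    flat = x.reshape(C, -1).T                # (N, C) with N = T*H*W space-time positions
    theta = flat @ w_theta.T                 # (N, C')
    phi = flat @ w_phi.T                     # (N, C')
    g = flat @ w_g.T                         # (N, C')
    f = theta @ phi.T                        # (N, N) pairwise affinities theta_i . phi_j
    f -= f.max(axis=1, keepdims=True)        # numerical stability
    attn = np.exp(f)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over j, i.e. division by C(x)
    y = attn @ g                             # each position aggregates all positions
    z = y @ w_z.T + flat                     # W_z projection plus residual connection
    return z.T.reshape(C, T, H, W)
```

With W_z set to zero the block reduces to the identity. That means it can be inserted into a pretrained network without changing the network's initial behavior.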

J. K. Tsotsos et al., Modeling visual attention via selective tuning, Artificial Intelligence, 1995, 78: 507-545.