Sequence Modeling (4): CV_Attention

Author: emm_simon | Source: published 2020-01-22 14:26

[Show and Tell: Chinese blog reference link]
[Show, Attend and Tell: Chinese blog reference link]

This post mainly draws on two CV papers from 2014 and 2015:

arXiv ID (YYMM.number)	Paper link
1411.4555	CV_Image_Caption : Show and Tell
1502.03044	CV_Attention_Image_Caption : Show, Attend and Tell

-1- CV_Image_Caption : Show and Tell

-1.1- CV model structure after introducing the Encoder-Decoder architecture

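The Show and Tell model was among the first to bring the Encoder-Decoder structure to neural image captioning, proposing an end-to-end model for the image captioning problem called NIC (Neural Image Caption). The NIC network consists of one Encoder and one Decoder:
Encoder:
The Encoder is a convolutional neural network (CNN) made up of many deep CNN layers.
Decoder:
The Decoder is a long short-term memory (LSTM) network built from a single LSTM_cell that is unrolled over the time steps of the caption.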

[Figure: Show and Tell model structure]
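* In the paper, the authors propose using Inception v2, pretrained on an image classification task, as the encoder. The features from its last hidden layer initialize the decoder: the image embedding is fed to the LSTM once, as the input at the very first step, which sets the LSTM state before any word is generated.
* The publicly released neuraltalk source code uses a pretrained VGG16 as the encoder and takes the features from Layer FC-4096 to initialize the LSTM hidden layer (see neuraltalk/py_caffe_feat_extract.py line160).
* The neuraltalk2 source code likewise uses VGG 16 as the encoder to extract image features (see neuraltalk2/train.lua line27).
* zsdonghao's TensorFlow implementation of this method uses Inception v3 as the encoder (see zsdonghao/Image-Captioning/inception_v3(for TF 0.10).py).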
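To make the Encoder-to-Decoder hand-off concrete, here is a minimal NumPy sketch of caption generation. It is an illustration under several assumptions, not the paper's or neuraltalk's code: the CNN is replaced by a random 4096-dimensional vector standing in for an FC-4096 feature, all weights are random and untrained, the helper names `lstm_step` and `greedy_caption` are hypothetical, and greedy decoding is used for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [x; h] to the four stacked gate pre-activations."""
    H = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell values
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def greedy_caption(image_feat, params, start_id, end_id, max_len=10):
    """Feed the image once, then feed back the most likely word at each step."""
    W_img, W_emb, W_lstm, b_lstm, W_out = params
    H = W_out.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    # First step: project the CNN feature into the embedding space and feed it.
    h, c = lstm_step(W_img @ image_feat, h, c, W_lstm, b_lstm)
    word, caption = start_id, []
    for _ in range(max_len):
        h, c = lstm_step(W_emb[word], h, c, W_lstm, b_lstm)
        word = int(np.argmax(W_out @ h))   # greedy choice over the vocabulary
        if word == end_id:
            break
        caption.append(word)
    return caption

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
feat_dim, emb_dim, hid_dim, vocab = 4096, 32, 64, 20
params = (
    rng.normal(0, 0.01, (emb_dim, feat_dim)),            # W_img: image projection
    rng.normal(0, 0.1, (vocab, emb_dim)),                # W_emb: word embeddings
    rng.normal(0, 0.1, (4 * hid_dim, emb_dim + hid_dim)),  # W_lstm: gate weights
    np.zeros(4 * hid_dim),                               # b_lstm: gate biases
    rng.normal(0, 0.1, (vocab, hid_dim)),                # W_out: output projection
)
image_feat = rng.normal(size=feat_dim)   # stand-in for a CNN feature vector
caption = greedy_caption(image_feat, params, start_id=0, end_id=1)
print(caption)   # a list of word ids; meaningless here because weights are random
```

Swapping the random `image_feat` for real features from VGG16 or Inception, and the random weights for trained ones, is what turns this skeleton into an actual captioner.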

-1.2- Trainable parameter matrices in the Show and Tell model

Parameter matrices in the Encoder:
the parameter matrices of the Deep_CNN_layers
Parameter matrices in the Decoder:
the parameter matrices of the LSTM_cell
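For a rough sense of how large these matrices are, the sketch below counts the parameters of a standard LSTM cell (four gates, each with a weight matrix over the concatenated input and hidden state plus a bias) and of a single convolution layer. The function names and the example sizes are assumptions chosen for illustration, not values taken from the paper. A complete captioning model typically also has a word embedding matrix and an output projection, which are not counted here.

```python
def lstm_cell_param_count(input_dim, hidden_dim, bias=True):
    """Parameters of a standard LSTM cell: 4 gates over [x; h], plus biases."""
    per_gate = hidden_dim * (input_dim + hidden_dim)
    if bias:
        per_gate += hidden_dim
    return 4 * per_gate

def conv_layer_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Parameters of one 2D convolution layer with a square kernel."""
    count = out_channels * in_channels * kernel_size * kernel_size
    if bias:
        count += out_channels
    return count

# Example sizes (assumed): 512-dim input and hidden state, a 3x3 conv from 64 to 128 channels.
print(lstm_cell_param_count(512, 512))      # 2099200
print(conv_layer_param_count(64, 128, 3))   # 73856
```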

-2- CV_Attention_Image_Caption : Show, Attend and Tell

-2.1- Model structure after introducing the Attention mechanism

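The Show, Attend and Tell model was the first to bring the Attention mechanism to neural image captioning. It consists of one Encoder, one Decoder and one Attention Model:
Encoder:
The Encoder is a convolutional neural network (CNN) made up of many deep CNN layers.
Decoder:
The Decoder is a long short-term memory (LSTM) network built from a single LSTM_cell.
Attention Model:
The Attention Model is a multilayer perceptron (MLP) made up of several fully connected layers.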
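The role of the Attention Model can be sketched in a few lines of NumPy. This is a simplified illustration of the soft (deterministic) attention variant, assuming a one-hidden-layer MLP that scores each image location against the current decoder state. The function name `soft_attention`, the weight names `W_f`, `W_h`, `v`, and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def soft_attention(features, h, W_f, W_h, v):
    """Score each image location with a small MLP, then average the features.

    features: (L, D) array, one D-dim feature vector per image location
    h:        (H,) current decoder hidden state
    returns:  context vector of shape (D,) and attention weights of shape (L,)
    """
    hidden = np.tanh(features @ W_f + h @ W_h)   # (L, A): MLP hidden layer
    scores = hidden @ v                          # (L,): one score per location
    scores = scores - scores.max()               # for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over locations
    context = alpha @ features                   # (D,): weighted average
    return context, alpha

# Illustrative sizes (assumed): a 14x14 grid of 512-dim CNN features.
rng = np.random.default_rng(0)
L, D, H, A = 196, 512, 64, 32
features = rng.normal(size=(L, D))
h = rng.normal(size=H)
W_f = rng.normal(0, 0.05, (D, A))
W_h = rng.normal(0, 0.05, (H, A))
v = rng.normal(0, 0.05, A)

context, alpha = soft_attention(features, h, W_f, W_h, v)
print(context.shape, alpha.shape, alpha.sum())
```

At each decoding step the context vector would be passed to the LSTM_cell together with the previous word, so the decoder can look at a different part of the image for each word it generates.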