Reading Notes: "Deep Keyphrase Generation"

Author: best___me | Published 2017-10-24 20:29

Paper source: ACL 2017. Link: http://www.aclweb.org/anthology/P/P17/P17-1054.pdf

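Keyphrase: a short, highly condensed piece of text that summarizes a document and helps a reader grasp it quickly.

Previous approaches: split the content to be summarized into multiple text chunks, then select the most meaningful ones.

These approaches cannot find keyphrases that do not appear in the text, and they cannot capture the real semantic meaning of the text.

This paper's approach: a generative model for keyphrase prediction with an encoder-decoder framework.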

Previous approaches in more detail: 1. Prepare keyphrase candidates; researchers use n-grams or noun phrases with certain part-of-speech patterns to identify potential candidates. 2. Rank these candidates with respect to the document.
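The two-step extractive pipeline can be sketched roughly as follows. This is only an illustration, not any specific baseline from the paper: the stopword list, the `max_n` limit, and the use of raw frequency as the ranking score are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "is", "on", "with"}

def candidate_ngrams(tokens, max_n=3):
    """Step 1: collect n-grams (n <= max_n) that neither start nor end with a stopword."""
    candidates = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            candidates.append(" ".join(gram))
    return candidates

def rank_candidates(candidates, top_k=5):
    """Step 2: rank candidates; raw frequency stands in for a real scoring model."""
    counts = Counter(candidates)
    return [phrase for phrase, _ in counts.most_common(top_k)]

tokens = "keyphrase generation with a deep model for keyphrase generation".split()
top = rank_candidates(candidate_ngrams(tokens), top_k=3)
print(top)
```

Whatever the scoring function is, every returned phrase is a span of the source text, which is exactly the limitation the paper targets.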

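Drawbacks of previous approaches: they can only extract phrases that appear in the source text, so they cannot predict a keyphrase whose wording or word order differs even slightly from the text.

In this paper: keyphrases that do not match any part of the source text exactly are called absent keyphrases, and those that match the source text exactly are called present keyphrases. The paper imitates how humans assign keyphrases: first understand the whole text, then generate.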
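The present/absent distinction can be made concrete with a small check. This is a minimal sketch that assumes lowercased whitespace tokenization and exact contiguous matching; the function names are made up for the example, and any preprocessing used in the paper (such as stemming) is not reproduced here.

```python
def is_present(keyphrase, source_tokens):
    """Return True if the keyphrase's tokens occur as a contiguous run in the source."""
    phrase = keyphrase.lower().split()
    n = len(phrase)
    return any(source_tokens[i:i + n] == phrase
               for i in range(len(source_tokens) - n + 1))

def split_present_absent(keyphrases, source_text):
    """Partition keyphrases into (present, absent) with respect to the source text."""
    tokens = source_text.lower().split()
    present = [k for k in keyphrases if is_present(k, tokens)]
    absent = [k for k in keyphrases if not is_present(k, tokens)]
    return present, absent

source = "we propose a generative model for keyphrase prediction"
present, absent = split_present_absent(
    ["generative model", "prediction of keyphrase", "deep learning"], source)
print(present, absent)
```

Note that "prediction of keyphrase" counts as absent even though both content words occur in the source, because the word order differs.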

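An RNN is used to compress the semantic information of the text, and the paper adds a copying mechanism so that the model can locate important parts of the text based on positional information.

Contributions of this paper (important!): 1. an RNN-based generative model with a copying mechanism; 2. recall of up to 20% of absent keyphrases; 3. experiments.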

Encoder-Decoder Model:

It was first used for machine translation. It can model sequences of different lengths and is trained end to end.

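This paper's method:

The task is recast as learning a mapping from a source sequence to each of its keyphrases, i.e. (source text, keyphrase) pairs.

[Figure: problem definition]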
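One way to read this problem definition is that a document with M keyphrases becomes M separate training pairs. The sketch below assumes a simple `(source_text, [keyphrases])` data layout and whitespace tokenization; both are assumptions for illustration, not the paper's preprocessing.

```python
def to_pairs(samples):
    """Turn (source_text, [keyphrase_1, ..., keyphrase_M]) into M (source, keyphrase) pairs."""
    pairs = []
    for source_text, keyphrases in samples:
        source_tokens = source_text.split()
        for phrase in keyphrases:
            # Each pair is an ordinary sequence-to-sequence training example.
            pairs.append((source_tokens, phrase.split()))
    return pairs

samples = [
    ("a generative model for keyphrase prediction",
     ["generative model", "keyphrase generation"]),
]
pairs = to_pairs(samples)
print(len(pairs), pairs[1])
```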

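Encoder RNN:

It converts a variable-length input sequence x = (x1, x2, ..., xT) into a sequence of hidden states h = (h1, h2, ..., hT) by iterating the following over time:

$h_t = f(x_t, h_{t-1})$

where f is a non-linear function.

The context vector c is computed from the hidden states by another non-linear function q:

$c = q(h_1, h_2, \ldots, h_T)$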
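The encoder recurrence can be illustrated with a tiny pure-Python RNN. This is a sketch under stated assumptions: f is taken to be a plain tanh update, q simply returns the last hidden state, and the weight matrices `W_x` and `W_h` are made-up numbers rather than trained parameters.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h):
    """One application of f: h_t = tanh(W_x x_t + W_h h_prev)."""
    return [
        math.tanh(sum(wx * x for wx, x in zip(W_x[i], x_t)) +
                  sum(wh * h for wh, h in zip(W_h[i], h_prev)))
        for i in range(len(h_prev))
    ]

def encode(xs, W_x, W_h, hidden_size):
    """Run the recurrence over the whole input and return (hidden states, context)."""
    h = [0.0] * hidden_size
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h)
        states.append(h)
    context = states[-1]  # assumption: simplest choice of q, the last hidden state
    return states, context

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy input vectors
W_x = [[0.5, -0.5], [0.3, 0.8]]            # illustrative weights only
W_h = [[0.1, 0.0], [0.0, 0.1]]
states, context = encode(xs, W_x, W_h, hidden_size=2)
print(states, context)
```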

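Decoder RNN:

It decompresses the context vector and generates a variable-length sequence y = (y1, y2, ..., yT') word by word, through a conditional language model:

$s_t = f(y_{t-1}, s_{t-1}, c)$
$p(y_t \mid y_{1,\ldots,t-1}, x) = g(y_{t-1}, s_t, c)$  (Equation 3)

Here s_t is the hidden state of the decoder RNN at time t. The non-linear function g is a softmax classifier that outputs a probability for every word in the vocabulary. y_t is the word predicted at time t, taken as the word with the highest probability under g(...).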
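The softmax-then-argmax step of the decoder can be sketched as below. The vocabulary and the raw scores are invented for the example; in a real model the scores would come from the decoder state.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(scores, vocab):
    """Return the most probable word and the full distribution over the vocabulary."""
    probs = softmax(scores)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs

vocab = ["keyphrase", "model", "neural", "<eos>"]  # toy vocabulary
word, probs = greedy_pick([2.0, 0.5, 1.0, -1.0], vocab)
print(word, probs)
```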

The encoder and decoder networks are trained jointly to maximize the conditional probability of the target sequence given a source sequence.

After training, beam search is used to generate phrases, and a max heap is maintained to get the predicted word sequences with the highest probabilities.
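Beam search with a heap-based top-k selection can be sketched as follows. The `toy_model` lookup table is a hypothetical stand-in for the trained decoder, and `heapq.nlargest` plays the role of the max heap; the paper's actual implementation details are not given in these notes.

```python
import heapq
import math

def beam_search(next_log_probs, beam_size=2, max_len=4, eos="<eos>"):
    """Return finished sequences as (log_prob, tokens), best first."""
    beams = [(0.0, [])]
    finished = []
    for _ in range(max_len):
        expansions = []
        for score, tokens in beams:
            for token, logp in next_log_probs(tokens).items():
                candidate = (score + logp, tokens + [token])
                if token == eos:
                    finished.append(candidate)
                else:
                    expansions.append(candidate)
        if not expansions:
            break
        # Keep only the beam_size highest-scoring partial sequences.
        beams = heapq.nlargest(beam_size, expansions, key=lambda c: c[0])
    return sorted(finished, key=lambda c: c[0], reverse=True)

def toy_model(prefix):
    """Hypothetical next-word log-probabilities, keyed by the prefix so far."""
    table = {
        (): {"neural": math.log(0.6), "deep": math.log(0.4)},
        ("neural",): {"network": math.log(0.9), "<eos>": math.log(0.1)},
        ("deep",): {"learning": math.log(0.8), "<eos>": math.log(0.2)},
    }
    return table.get(tuple(prefix), {"<eos>": 0.0})

results = beam_search(toy_model)
print(results[0])
```

Because several sequences survive at each step, one pass yields a ranked list of phrases rather than a single output, which suits keyphrase prediction where a document has multiple keyphrases.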

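Details of the encoder and decoder:

The encoder is a bidirectional GRU. Previous studies indicate that a GRU generally gives better language-modeling performance than a simple RNN and has a simpler structure than an LSTM, so the non-linear function f is replaced by a GRU. (This confused me at first: why "replace" f with a GRU? The reading that makes sense is that f is just the generic recurrent update, and the GRU cell is the concrete choice for it.) The decoder is a forward GRU. In addition, an attention mechanism is used: the context vector c_i is computed as a weighted sum of the encoder hidden states:

$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$
$\alpha_{ij} = \frac{\exp(a(s_{i-1}, h_j))}{\sum_{k=1}^{T} \exp(a(s_{i-1}, h_k))}$

where a(s_{i-1}, h_j) is a soft alignment function that measures the similarity between s_{i-1} and h_j.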
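The attention-weighted context vector can be illustrated with a few lines of pure Python. This sketch assumes a dot product as the alignment function a, which is a simplification chosen for the example; in the model the alignment function is learned.

```python
import math

def attention_context(s_prev, encoder_states):
    """Compute attention weights over encoder states and the weighted-sum context."""
    # Assumption: dot product stands in for the learned alignment function a.
    scores = [sum(s * h for s, h in zip(s_prev, h_j)) for h_j in encoder_states]
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(a * h_j[d] for a, h_j in zip(alphas, encoder_states))
               for d in range(dim)]
    return alphas, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy hidden states h_1..h_3
alphas, context = attention_context([2.0, 0.0], encoder_states)
print(alphas, context)
```

The encoder state most similar to the previous decoder state receives the largest weight, so the context vector changes at every decoding step instead of being one fixed summary.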

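Copying mechanism:

To keep learning quality high and the vocabulary small, the RNN only considers frequent words, and most long-tail words are ignored. As a result, the RNN cannot recall any keyphrase that contains an out-of-vocabulary word. In practice, important phrases can also be identified by positional and syntactic information in their context. The copying mechanism lets the RNN predict out-of-vocabulary words by selecting appropriate words from the source text.

The probability of predicting each new word y_t therefore consists of two parts: the first is the probability of generating the term (Equation 3), and the second is the probability of copying it from the source text:

$p(y_t \mid y_{1,\ldots,t-1}, x) = p_g(y_t \mid y_{1,\ldots,t-1}, x) + p_c(y_t \mid y_{1,\ldots,t-1}, x)$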
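The two-part probability can be sketched as below. The notes do not spell out how the two parts are normalized, so this example assumes a single shared normalizer over generation and copy scores, as in CopyNet-style models; the scores, vocabulary, and function name are all invented for illustration.

```python
import math

def generate_and_copy(gen_scores, copy_scores, vocab, source_tokens):
    """Combine generation and copy scores into one distribution over vocab + source words."""
    # Assumption: one shared normalizer Z over both score sets.
    z = sum(math.exp(s) for s in gen_scores) + sum(math.exp(s) for s in copy_scores)
    probs = {}
    for word, score in zip(vocab, gen_scores):       # generation part p_g
        probs[word] = probs.get(word, 0.0) + math.exp(score) / z
    for word, score in zip(source_tokens, copy_scores):  # copy part p_c, summed over positions
        probs[word] = probs.get(word, 0.0) + math.exp(score) / z
    return probs

vocab = ["model", "neural", "<unk>"]                 # toy output vocabulary
source_tokens = ["keyphrase", "model", "keyphrase"]  # "keyphrase" is out of vocabulary
probs = generate_and_copy([1.0, 0.5, 0.0], [2.0, 0.5, 1.0], vocab, source_tokens)
print(max(probs, key=probs.get))
```

Here "keyphrase" is not in the vocabulary, yet it ends up with the highest probability because both of its source positions contribute copy mass.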