《Exploring the Syntactic Abiliti

Author: best___me | Published: 2017-12-07 17:57


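Paper link: http://www.aclweb.org/anthology/K/K17/K17-1003.pdf

RNNs handle syntax well in many common cases but fail on complex sentences. This paper tests whether those errors come from inherent limitations of the architecture or from the indirect supervision that agreement dependencies in a corpus provide.

Findings of the paper: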

1. Multi-task training led to significantly lower error rates, in particular on complex sentences, suggesting that RNNs have the ability to evolve more sophisticated syntactic representations than shown before.

2. Easily available agreement training data can improve performance on other syntactic tasks, in particular when only a limited amount of training data is available for those tasks.

3. The multi-task paradigm can also be leveraged to inject grammatical knowledge into language models.

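Introduction:

RNNs have been applied to natural language processing without explicit linguistic representations such as dependency parses or logical forms, so methods are needed to find out which linguistic generalizations they capture. One approach, based on behavioral psychology, tests the network on carefully selected cases.

Earlier work used the agreement prediction task to test how a trained RNN captures sentence structure. An English verb often depends on its subject, and identifying a verb's subject requires sensitivity to sentence structure.

Are the limitations of RNNs due to the architecture? Can they be mitigated by stronger supervision?

This paper finds that multi-task learning, with language modeling (LM) or CCG supertagging as the additional task, improves RNN performance on agreement prediction. Jointly training a language model with agreement prediction did not improve the model's overall perplexity, but it made the model more syntax-aware: grammatically appropriate verb forms had higher probability than grammatically inappropriate ones.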

Background and related work:

1. Agreement prediction

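In English, third-person present-tense verbs agree with their subjects: a singular subject takes a singular verb and a plural subject takes a plural verb. Complex sentences often contain several subjects, each corresponding to a different verb.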

All we need is a tagger that can identify such verbs and determine whether they are plural or singular.

Heuristics can also be used, for example picking the noun closest to the verb, but they can be wrong.
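A minimal sketch of the nearest-noun heuristic and one way it goes wrong. The function name `nearest_noun_number`, the hand-written tags, and the example sentence are illustrative assumptions, not taken from the paper.

```python
# Hypothetical hand-tagged input: (word, tag) pairs, where NN marks a
# singular noun and NNS a plural noun. Other tags are ignored.
def nearest_noun_number(tagged_prefix):
    """Guess the verb's number from the noun closest to the verb position."""
    for _word, tag in reversed(tagged_prefix):
        if tag == "NNS":
            return "plural"
        if tag == "NN":
            return "singular"
    return None  # no noun appears before the verb


# "The keys to the cabinet ___": the subject "keys" is plural, but the
# nearest noun "cabinet" is singular, so the heuristic picks the wrong form.
prefix = [("The", "DT"), ("keys", "NNS"), ("to", "IN"),
          ("the", "DT"), ("cabinet", "NN")]
guess = nearest_noun_number(prefix)
print(guess)
```

The intervening noun is what makes such sentences hard: a model has to track the subject across it rather than rely on proximity.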

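What does the agreement task do?

The learner is given the sentence up to a verb and predicts whether that verb should be in the singular or the plural form.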
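As a rough illustration, one training example for this task can be built from a sentence whose verb position and number are already known. The helper `make_agreement_example` and the 1 = plural, 0 = singular encoding are assumptions made for this sketch.

```python
def make_agreement_example(tokens, verb_index, verb_is_plural):
    """Return (prefix, label): the words before the verb and a binary label.

    The label encoding (1 = plural, 0 = singular) is an assumption.
    """
    if not 0 <= verb_index < len(tokens):
        raise ValueError("verb_index must point at a token in the sentence")
    prefix = tokens[:verb_index]  # the verb itself is hidden from the learner
    label = 1 if verb_is_plural else 0
    return prefix, label


tokens = ["the", "keys", "to", "the", "cabinet", "are", "on", "the", "table"]
prefix, label = make_agreement_example(tokens, verb_index=5, verb_is_plural=True)
print(prefix, label)
```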

2. CCG supertagging

Combinatory categorial grammar (CCG) is a syntactic formalism that relies on a large inventory of lexical categories, for example intransitive verbs (smile), transitive verbs (build), and raising verbs (seem).

Each word in the sentence is associated with a tag.

3. Language modeling

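A language model learns a distribution: given the first j-1 words of a sentence, it predicts the probability of the j-th word, P(w_j | w_1, ..., w_{j-1}).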

Training minimizes the mean negative log-likelihood over all sentences:

[Equation 1: loss function]
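The formula itself is not reproduced in these notes. One form consistent with the description (mean negative log-likelihood over the set of sentences S) might look like the following; the per-sentence normalization by |s| and the symbol names are assumptions, not confirmed by the source.

```latex
\[
\mathcal{L}_{LM} = -\frac{1}{|S|} \sum_{s \in S} \frac{1}{|s|} \sum_{j=1}^{|s|}
  \log P\left(w_j \mid w_1, \ldots, w_{j-1}\right)
\]
```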

4. Multitask learning

By training a network on a simple task for which large quantities of data are available, we can encourage it to evolve representations that would help its performance on the primary task.

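The paper's method:

A standard single-layer LSTM: the first layer is an embedding layer, the second is a D-dimensional LSTM, and the third depends on the task.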

For agreement, the output layers consisted of a linear layer with a one-dimensional output and a sigmoid activation; for language modeling, a linear layer with an N-dimensional output, where N is the size of the lexicon, and a softmax activation; and for supertagging, a linear layer with an S-dimensional output, where S is the number of possible tags, followed by a softmax activation.
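The task-specific output layers can be sketched in plain Python on top of a toy LSTM state. Everything here is illustrative: the helper names, the made-up weights, and reading the sigmoid output as the probability of "plural" are assumptions.

```python
import math


def linear(hidden, weights, bias):
    """Linear layer: one output value per row of `weights`."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def agreement_head(hidden, weights, bias):
    """One-dimensional linear output followed by a sigmoid."""
    return sigmoid(linear(hidden, weights, bias)[0])


def softmax_head(hidden, weights, bias):
    """N-dimensional (words) or S-dimensional (supertags) output with softmax."""
    return softmax(linear(hidden, weights, bias))


hidden = [0.5, -1.0]  # toy LSTM state with D = 2
p_plural = agreement_head(hidden, [[1.0, 1.0]], [0.0])
tag_dist = softmax_head(hidden, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                        [0.0, 0.0, 0.0])
print(p_plural, tag_dist)
```

Only the head changes between tasks; the embedding and LSTM layers are what the tasks can share.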

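The language modeling loss is given in Equation 1. The agreement loss is the mean cross-entropy of a binary classifier:

[Equation: the loss for agreement]

Here q is the estimated distribution of the verb's number, S is the set of sentences, num(s) is the correct number (singular or plural) of the verb in s, and s:vd is the sentence up to the verb.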
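One form consistent with these definitions, assuming one target verb per sentence and writing s:vd as a subscripted prefix, might be the following. It is a reconstruction from the symbol descriptions, not a quotation of the paper's formula.

```latex
\[
\mathcal{L}_{AG}(q) = -\frac{1}{|S|} \sum_{s \in S}
  \log q\left(\mathrm{num}(s) \mid s_{:v_d}\right)
\]
```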

The loss for CCG supertagging:

[Equation: the loss for CCG supertagging]

tag(wj) is the correct tag of word wj in sentence s, and s:wj is the sentence s up to and including wj.
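A mean cross-entropy over the words of each sentence that matches these definitions could be written as follows. The normalization terms are assumptions, and q here stands for the estimated distribution over tags.

```latex
\[
\mathcal{L}_{ST}(q) = -\frac{1}{|S|} \sum_{s \in S} \frac{1}{|s|} \sum_{w_j \in s}
  \log q\left(\mathrm{tag}(w_j) \mid s_{:w_j}\right)
\]
```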

Joint training

[Equation: global loss]
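The notes do not say how the global loss combines the task losses. A common choice, shown here purely as an assumption, is a weighted sum of the two; the weighting factor `r` and the function name are hypothetical.

```python
def joint_loss(primary_loss, auxiliary_loss, r=1.0):
    """Weighted sum of two task losses.

    `r` is a hypothetical weighting factor; the source does not specify
    how the global loss is formed.
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    return primary_loss + r * auxiliary_loss


total = joint_loss(0.40, 1.20, r=0.5)
print(total)
```

With r = 0 this reduces to single-task training on the primary task.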

Pre-training:

The network is trained on one task, and its final weights are then used as the initial weights for the second task.
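A minimal sketch of that hand-over, with a model represented as a plain dictionary of weight lists. Replacing the output layer is an inference from the architecture (the third layer is task-specific), and the names used here are hypothetical.

```python
import copy


def init_from_pretrained(pretrained, fresh_output_layer):
    """Start the second task from the first task's final weights.

    Shared layers are copied; the task-specific output layer is replaced,
    since its shape depends on the task (an assumption of this sketch).
    """
    model = copy.deepcopy(pretrained)
    model["output"] = fresh_output_layer
    return model


# Toy weights standing in for a network trained on the first task.
first_task = {"embedding": [[0.1, 0.2]], "lstm": [[0.3]], "output": [[0.9]]}
second_task = init_from_pretrained(first_task, [[0.0], [0.0]])
print(second_task)
```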

Training:

The AdaGrad optimizer is used, with a batch size of 128.
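For reference, the standard AdaGrad update and a simple batching helper can be sketched in plain Python. The learning rate and epsilon are placeholder values, since the notes only give the optimizer and the batch size.

```python
import math


def adagrad_step(params, grads, accum, lr=0.01, eps=1e-8):
    """One in-place AdaGrad update.

    Each parameter's step is divided by the square root of its accumulated
    squared gradients. `lr` and `eps` are assumed values.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)


def minibatches(examples, batch_size=128):
    """Yield consecutive batches; 128 is the batch size given in the notes."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


params, accum = [1.0], [0.0]
adagrad_step(params, grads=[2.0], accum=accum)
sizes = [len(batch) for batch in minibatches(list(range(300)))]
print(params, accum, sizes)
```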

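Experimental results:

With multi-task learning, supertagging improves accuracy on the agreement task compared with single-task training, whether through pre-training or joint training. The reverse does not hold: agreement does not improve supertagging accuracy.

Effect of the training data ratio:

Even with multi-task learning, the syntactic representations produced by RNNs did not improve, but stronger inductive biases might.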


Article link: https://www.haomeiwen.com/subject/hcvcixtx.html
