Neural Dependency Parsing


Author: 一梦换须臾_ | Published 2018-10-11 11:09

Preface

  1. There are currently two mainstream approaches to parsing: Constituency Parsing and Dependency Parsing. This article focuses on Dependency Parsing.
  2. For background, see 阿衡学姐's notes: Dependency Parsing and Treebank
  3. Stanford CS224N lecture notes: Dependency Parsing
  4. Basic paper: A Fast and Accurate Dependency Parser using Neural Networks, which uses greedy transition-based parsing and combines word embeddings with neural networks
  5. Advanced papers:
    Globally Normalized Transition-Based Neural Networks
    Universal Dependencies: A cross-linguistic typology
    Incrementality in Deterministic Dependency Parsing
    Main improvements: deeper neural networks, handling non-projectivity, and graph-based parsing
  6. Base project: Neural Dependency Parsing

Neural transition-based Dependency Parsing

Conventional transition-based Dependency Parsing

  1. Structure: a stack s, a buffer b, and a set A that records the dependency arcs found so far
  2. At each step, a discriminative classifier (such as an SVM) decides which transition to apply next (see the sketch after the figure below)
(Figure: transition-based parsing)
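A minimal sketch of the greedy arc-standard transition loop, to make the stack/buffer/arc-set mechanics concrete. `predict_transition` is a hypothetical stand-in for the classifier and is assumed to propose only legal transitions:

```python
def parse(sentence, predict_transition):
    """Greedy arc-standard parsing.
    sentence: list of tokens (1-based indices refer into it);
    predict_transition(stack, buffer, arcs) -> 'SHIFT', 'LEFT-ARC', or 'RIGHT-ARC'."""
    stack = [0]                                   # index 0 stands for the ROOT token
    buffer = list(range(1, len(sentence) + 1))    # remaining input words
    arcs = []                                     # A: (head, dependent) pairs

    while buffer or len(stack) > 1:
        action = predict_transition(stack, buffer, arcs)
        if action == 'SHIFT' and buffer:
            stack.append(buffer.pop(0))           # move next buffer word onto the stack
        elif action == 'LEFT-ARC' and len(stack) >= 2:
            dependent = stack.pop(-2)             # second-to-top takes the top as its head
            arcs.append((stack[-1], dependent))
        elif action == 'RIGHT-ARC' and len(stack) >= 2:
            dependent = stack.pop()               # top takes the second-to-top as its head
            arcs.append((stack[-1], dependent))
    return arcs
```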

Feature Extraction

  1. Conventionally, words, POS tags, and arc labels are represented as sparse one-hot indicator features, which are expensive to compute
  2. In the neural transition-based dependency parsing model, words, POS tags, and arc labels are instead represented as dense pre-trained embeddings (see the sketch after the figure below)
(Figure: feature extraction)
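A rough sketch of the dense input representation, under assumed vocabulary sizes and embedding tables (all names here are illustrative, not the paper's code). Each selected word / POS tag / arc label is looked up in an embedding table and the vectors are concatenated into one dense input:

```python
import numpy as np

d = 50                                   # embedding dimension used in the paper
word_emb  = np.random.randn(10000, d)    # pre-trained word embeddings (placeholder values)
tag_emb   = np.random.randn(50, d)       # POS-tag embeddings (learned)
label_emb = np.random.randn(40, d)       # arc-label embeddings (learned)

def input_layer(word_ids, tag_ids, label_ids):
    """word_ids / tag_ids / label_ids: indices of the selected feature positions."""
    # Concatenate all looked-up embeddings into a single dense vector,
    # replacing the sparse one-hot indicator features of conventional parsers.
    return np.concatenate([word_emb[word_ids].ravel(),
                           tag_emb[tag_ids].ravel(),
                           label_emb[label_ids].ravel()])
```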

Choice of features

The choice of Sw, St, Sl
Following (Zhang and Nivre, 2011), we pick a rich set of elements for our final parser. In detail, Sw contains nw = 18 elements: (1) the top 3 words on the stack and buffer: s1, s2, s3, b1, b2, b3; (2) the first and second leftmost / rightmost children of the top two words on the stack: lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2; (3) the leftmost of leftmost / rightmost of rightmost children of the top two words on the stack: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2.
We use the corresponding POS tags for St (nt = 18), and the corresponding arc labels of words excluding those 6 words on the stack/buffer for Sl (nl = 12). A good advantage of our parser is that we can add a rich set of elements cheaply, instead of hand-crafting many more indicator features.
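As a rough illustration of the template above, the following sketch assembles the 18 word positions of Sw from the stack, buffer, and arc set. `kth_child` and `NULL` are hypothetical helpers introduced only for this example:

```python
NULL = -1   # padding index for missing positions

def kth_child(arcs, head, k, leftmost=True):
    # k-th leftmost / rightmost child of `head`, according to the arcs found so far
    children = sorted(d for h, d in arcs if h == head)
    if leftmost:
        return children[k - 1] if len(children) >= k else NULL
    return children[-k] if len(children) >= k else NULL

def word_features(stack, buffer, arcs):
    s = lambda i: stack[-i] if len(stack) >= i else NULL        # i-th stack item from the top
    b = lambda i: buffer[i - 1] if len(buffer) >= i else NULL   # i-th buffer item
    feats = [s(1), s(2), s(3), b(1), b(2), b(3)]                # (1) top 3 of stack and buffer
    for i in (1, 2):
        # (2) first and second leftmost / rightmost children of the top two stack words
        feats += [kth_child(arcs, s(i), 1, True),  kth_child(arcs, s(i), 1, False),
                  kth_child(arcs, s(i), 2, True),  kth_child(arcs, s(i), 2, False)]
        # (3) leftmost of leftmost / rightmost of rightmost children
        feats += [kth_child(arcs, kth_child(arcs, s(i), 1, True), 1, True),
                  kth_child(arcs, kth_child(arcs, s(i), 1, False), 1, False)]
    return feats                                                # 18 word positions in total
```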

Neural Networks

After the embedding lookup, the input is fed to a hidden layer with a novel cube activation function (f(x) = x^3); a softmax layer then predicts the next transition (a sketch follows the figure below).


(Figure: neural model)
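A minimal sketch of the scoring network under assumed dimensions (weights are random placeholders here, and only 3 unlabeled transitions are scored; the actual parser scores labeled transitions):

```python
import numpy as np

input_dim, hidden_dim, num_transitions = 2400, 200, 3   # 48 features * 50 dims; SHIFT/LEFT/RIGHT

W1 = np.random.randn(hidden_dim, input_dim) * 0.01
b1 = np.zeros(hidden_dim)
W2 = np.random.randn(num_transitions, hidden_dim) * 0.01

def predict_probs(x):
    h = (W1 @ x + b1) ** 3                  # cube activation f(x) = x^3
    scores = W2 @ h
    exp = np.exp(scores - scores.max())     # numerically stable softmax
    return exp / exp.sum()                  # probability of each next transition
```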
