NLU+_7

Author: yannanoo | Published 2019-02-17 01:56

    LECTURE 7

    What: Structural Alignment Biases
    Why: Attention is not alignment
    What: Attention Scores
    How: Conditioning Attention on Past Decisions


    P9: how to read the picture?
    Attention is not alignment.
    P2: the formulation.
    P13: papers to read on incorporating structural alignment biases.


    Structural Alignment Biases

    The attentional model, as described above, provides a powerful and elegant model of translation in which alignments between source and target words are learned through the implicit conditioning context afforded by the attention mechanism. Despite its elegance, the attentional model omits several key components of traditional alignment models such as the IBM models (Brown et al., 1993) and Vogel's hidden Markov Model (Vogel et al., 1996) as implemented in the GIZA++ toolkit (Och and Ney, 2003). Combining the strengths of this highly successful body of research into a neural model of machine translation holds potential to further improve the modelling accuracy of neural techniques.

    [Figure: word alignment grid; blue cells are the IBM model alignments, green cells are the attention weights]

    The blue cells are the alignments produced by the IBM model and the green cells are the weights produced by attention. They are not identical --> attention is not alignment.

    How can an NMT model still translate the text well even when its attention is off (i.e., does not line up with the true word alignment)?

    Three ways to obtain attention scores


    [Figure: the three attention scoring functions]
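    The figure is not reproduced here, but the three scoring functions usually presented in this context are the dot-product, multiplicative (bilinear), and additive (MLP) variants; the notation below is one standard way to write them, and the slides may differ slightly. Here s is the decoder state (the query) and h_i is an encoder state (a value):

        e_i = s^\top h_i                              (dot-product attention)
        e_i = s^\top W h_i                            (multiplicative / bilinear attention)
        e_i = v^\top \tanh(W_1 h_i + W_2 s)           (additive / MLP attention)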

    A general definition of attention:

    Given a set of vectors (the values) and a single vector (the query), the attention mechanism computes a weighted sum of the values according to the query.
    The crux of attention is how the "weight" of each value in the set is computed.
    This is sometimes described as the query's output attending to (or taking into account) different parts of the source. (The query attends to the values.)
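    In symbols (a standard formulation; the notation is chosen here rather than taken from the slides): given values v_1, ..., v_n and a query q,

        e_i = \mathrm{score}(q, v_i), \qquad
        \alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}, \qquad
        c = \sum_{i=1}^{n} \alpha_i v_i

    where score(., .) is any of the scoring functions listed above, \alpha is the attention distribution over the values, and c is the attention output (the weighted sum).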

    Example: in the seq2seq model just discussed, which part is the query and which are the values?
    Each decoder hidden state attends to the encoder hidden states (the decoder's hidden state at step t, s_t, is the query; the encoder hidden states are the values).
    An intuitive reading of the definition:

    The weighted sum is a selective summary of the information contained in the values, where the query determines which values to focus on.
    In other words, attention is a method that, based on some rule or some extra piece of information (the query), picks out particular vectors from a set of vector representations (the values) and combines them with weights (the attention). Put simply, whenever we compute a weighted sum over a set of vectors, we are using attention. A minimal code sketch follows below.
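    To make this concrete, here is a minimal numpy sketch of dot-product attention, assuming the generic setup above (a query vector attends to a matrix of value vectors); the names attend, query and values are illustrative, not from any particular library.

        import numpy as np

        def softmax(x):
            # shift by the max for numerical stability before exponentiating
            e = np.exp(x - x.max())
            return e / e.sum()

        def attend(query, values):
            """query: (d,) vector; values: (n, d) matrix of n value vectors.
            Returns the attention weights and the weighted sum (context vector)."""
            scores = values @ query      # one dot-product score per value
            weights = softmax(scores)    # normalise the scores into a distribution
            context = weights @ values   # weighted sum of the values
            return weights, context

        # Example: the decoder state s_t (query) attends to the encoder states (values).
        encoder_states = np.random.randn(5, 8)   # 5 source positions, hidden size 8
        s_t = np.random.randn(8)
        alpha, c_t = attend(s_t, encoder_states)
        print(alpha.sum())   # ~1.0 -- the weights form a distribution over the source

    The context vector c_t is exactly the "selective summary" described above: source positions whose encoder states score higher against s_t contribute more to the sum.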

    Reference for the attention definition above: https://blog.csdn.net/hahajinbu/article/details/81940355
