tf.keras.layers.AdditiveAttention


Author: 又双叒叕苟了一天 | Published on 2021-01-02 13:14

    The additive attention layer, also known as Bahdanau-style attention.

    tf.keras.layers.AdditiveAttention(
        use_scale=True, **kwargs
    )
    

    Given a query of shape [batch_size, Tq, dim], a value of shape [batch_size, Tv, dim], and a key of shape [batch_size, Tv, dim], the computation proceeds as follows (a runnable sketch of these steps follows the list):

    1. Reshape query and key to [batch_size, Tq, 1, dim] and [batch_size, 1, Tv, dim] respectively, so they broadcast against each other.
    2. Compute attention scores of shape [batch_size, Tq, Tv]: scores = tf.reduce_sum(tf.tanh(query + key), axis=-1)
    3. Apply softmax: distribution = tf.nn.softmax(scores)
    4. Compute a weighted sum of value: tf.matmul(distribution, value), giving an output of shape [batch_size, Tq, dim]
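
    A minimal sketch of these four steps on toy tensors (shapes and values are made up for illustration; this is not the layer's actual implementation, which also applies the optional scale variable):

    import tensorflow as tf

    batch_size, Tq, Tv, dim = 2, 3, 4, 5
    query = tf.random.normal([batch_size, Tq, dim])
    key = tf.random.normal([batch_size, Tv, dim])
    value = tf.random.normal([batch_size, Tv, dim])

    # 1. Reshape query and key so they broadcast against each other.
    q = tf.expand_dims(query, axis=2)   # [batch_size, Tq, 1, dim]
    k = tf.expand_dims(key, axis=1)     # [batch_size, 1, Tv, dim]

    # 2. Additive (Bahdanau-style) scores, shape [batch_size, Tq, Tv].
    scores = tf.reduce_sum(tf.tanh(q + k), axis=-1)

    # 3. Softmax over the Tv axis.
    distribution = tf.nn.softmax(scores)

    # 4. Weighted sum of value, shape [batch_size, Tq, dim].
    output = tf.matmul(distribution, value)
    print(output.shape)  # (2, 3, 5)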
    Arguments
    use_scale: If True, creates a scalar variable that scales the attention scores.
    causal: Boolean. Set to True for decoder self-attention; it adds a mask so that position i cannot attend to future positions.
    dropout: Float between 0 and 1. Fraction of the attention scores to drop (see the construction sketch below).
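
    A minimal construction sketch using these arguments (the values are illustrative; note that in recent TensorFlow releases causal has been deprecated in favor of a use_causal_mask call argument, so this assumes a 2.x version that still accepts it):

    import tensorflow as tf

    # use_scale adds a trainable scalar that rescales the scores;
    # dropout drops a fraction of the attention weights during training;
    # causal masks out future positions for decoder self-attention.
    self_attn = tf.keras.layers.AdditiveAttention(
        use_scale=True, causal=True, dropout=0.1)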

    Call arguments:

    inputs:

    • query: [batch_size, Tq, dim]
    • value: [batch_size, Tv, dim]
    • key: [batch_size, Tv, dim]. If not given, key defaults to value.

    mask:

    • query_mask: [batch_size, Tq]. If given, the output is zero at positions where mask==False.
    • value_mask: [batch_size, Tv]. If given, values at positions where mask==False do not contribute to the result.

    training: whether dropout is applied (training mode) or not (see the call sketch below).
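
    A minimal call sketch showing the inputs list and the mask list (tensor contents are made up; key is omitted, so it defaults to value):

    import tensorflow as tf

    attn = tf.keras.layers.AdditiveAttention()

    query = tf.random.normal([2, 3, 8])   # [batch_size, Tq, dim]
    value = tf.random.normal([2, 4, 8])   # [batch_size, Tv, dim]
    query_mask = tf.constant([[True, True, False],
                              [True, True, True]])         # [batch_size, Tq]
    value_mask = tf.constant([[True, True, True, False],
                              [True, False, True, True]])  # [batch_size, Tv]

    # Positions where query_mask is False produce zero outputs; positions where
    # value_mask is False are ignored when computing the weighted sum.
    output = attn([query, value], mask=[query_mask, value_mask], training=False)
    print(output.shape)  # (2, 3, 8)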

    Example:

    import tensorflow as tf

    # Variable-length int sequences.
    query_input = tf.keras.Input(shape=(None,), dtype='int32')
    value_input = tf.keras.Input(shape=(None,), dtype='int32')
    
    # Vocabulary size and embedding dimension (example values; adjust as needed).
    max_tokens = 1000
    dimension = 64

    # Embedding lookup.
    token_embedding = tf.keras.layers.Embedding(max_tokens, dimension)
    # Query embeddings of shape [batch_size, Tq, dimension].
    query_embeddings = token_embedding(query_input)
    # Value embeddings of shape [batch_size, Tv, dimension].
    value_embeddings = token_embedding(value_input)
    
    # CNN layer.
    cnn_layer = tf.keras.layers.Conv1D(
        filters=100,
        kernel_size=4,
        # Use 'same' padding so outputs have the same shape as inputs.
        padding='same')
    # Query encoding of shape [batch_size, Tq, filters].
    query_seq_encoding = cnn_layer(query_embeddings)
    # Value encoding of shape [batch_size, Tv, filters].
    value_seq_encoding = cnn_layer(value_embeddings)
    
    # Query-value attention of shape [batch_size, Tq, filters].
    query_value_attention_seq = tf.keras.layers.AdditiveAttention()(
        [query_seq_encoding, value_seq_encoding])
    
    # Reduce over the sequence axis to produce encodings of shape
    # [batch_size, filters].
    query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
        query_seq_encoding)
    query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
        query_value_attention_seq)
    
    # Concatenate query and document encodings to produce a DNN input layer.
    input_layer = tf.keras.layers.Concatenate()(
        [query_encoding, query_value_attention])
    
    # Add DNN layers, and create Model.
    # ...
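
    One possible way to finish the example above (the hidden width and the sigmoid output layer are arbitrary choices, not part of the original snippet):

    hidden = tf.keras.layers.Dense(64, activation='relu')(input_layer)
    prediction = tf.keras.layers.Dense(1, activation='sigmoid')(hidden)
    model = tf.keras.Model(inputs=[query_input, value_input], outputs=prediction)
    model.summary()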
    
