Background:
1. A sequence of word indices is fed into an embedding layer and encoded into embedded representations.
2. The sequence of word embeddings is fed into an encoder built from RNNs.
So what exactly does the output of the RNN encoder look like? Many sequence models you find online use bidirectional RNNs, stacked into multi-layer bidirectional RNNs. But sometimes we also need the states of the intermediate layers, and the usual workaround is to build a separate model just to expose those outputs, which is clearly restrictive.
So this time we build a multi-layer bidirectional RNN ourselves and check exactly what it returns. The tests target TensorFlow 2.0; since 2.0 runs eagerly and builds graphs automatically, everything below is written in an object-oriented style.
1 Building the embedding layer
import tensorflow.keras as keras

class Embedding(keras.layers.Layer):
    def __init__(self, input_size,
                 output_size,
                 weights=None):
        super(Embedding, self).__init__()
        if weights is not None:
            self.embedding = keras.layers.Embedding(input_size, output_size, weights=weights, mask_zero=True)
        else:
            self.embedding = keras.layers.Embedding(input_size, output_size, mask_zero=True)

    def __call__(self, input):  # [batch, len]
        return self.embedding(input)  # [batch, len, output_size]
This embedding class mainly wraps keras.layers.Embedding(input_size, output_size, weights=weights, mask_zero=True). The parameters of this API:
- the first argument is the vocabulary size
- the second argument is the word-embedding dimension
- weights is the initial weight matrix; pretrained word embeddings can be passed in through it
- mask_zero masks the word with index 0. We usually put the pad symbol at index 0 of the vocabulary, so the layer generates a mask and propagates it downstream, which prevents the RNN from encoding the redundant pad symbols in a sentence. In TensorFlow 1.x the same effect is achieved by passing the sequence lengths (the sequence_length argument) to tf.nn.dynamic_rnn(); in PyTorch, torch.nn.utils.rnn.pack_padded_sequence serves the same purpose. Note that the mask only takes effect during RNN encoding; it does not turn the pad embeddings into all zeros, they stay whatever they were.
# Embedding layer test
import numpy as np
import tensorflow as tf

weights = [np.array([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]], dtype=np.float64)]
embedding = Embedding(3, 5, weights)  # (num_vocab, embedding_size)
word_id = tf.convert_to_tensor([[1, 2, 0], [1, 0, 0]], dtype=tf.int64)
word_embed = embedding(word_id)  # [batch, seq, embedding_size]
print(word_embed)
>>tf.Tensor(
[[[2. 3. 4. 5. 6.]
[3. 4. 5. 6. 7.]
[1. 2. 3. 4. 5.]]
[[2. 3. 4. 5. 6.]
[1. 2. 3. 4. 5.]
[1. 2. 3. 4. 5.]]], shape=(2, 3, 5), dtype=float32)
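In the output above, the pad positions (index 0) still contain the embedding row [1. 2. 3. 4. 5.] rather than zeros, confirming that mask_zero only produces a mask and does not alter the embeddings. As a minimal sketch (relying on the embedding object defined in the test above), the propagated mask itself can be inspected with compute_mask:

# True marks real tokens, False marks pad positions
mask = embedding.embedding.compute_mask(word_id)
print(mask)  # expected: [[True, True, False], [True, False, False]], shape=(2, 3), dtype=bool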
2 Building the encoder
import tensorflow.keras as keras

class Encoder(keras.layers.Layer):
    def __init__(self, rnn_type,  # RNN type: 'GRU' or 'LSTM'
                 input_size,
                 output_size,
                 num_layers,  # number of RNN layers
                 bidirectional=False):
        super(Encoder, self).__init__()
        assert rnn_type in ['GRU', 'LSTM']
        if bidirectional:
            assert output_size % 2 == 0
            self.num_directions = 2
        else:
            self.num_directions = 1
        units = int(output_size / self.num_directions)
        if rnn_type == 'GRU':
            rnnCell = [getattr(keras.layers, 'GRUCell')(units) for _ in range(num_layers)]
        else:
            rnnCell = [getattr(keras.layers, 'LSTMCell')(units) for _ in range(num_layers)]
        self.rnn = keras.layers.RNN(rnnCell, input_shape=(None, None, input_size),
                                    return_sequences=True, return_state=True)
        self.rnn_type = rnn_type
        self.num_layers = num_layers
        if bidirectional:
            self.rnn = keras.layers.Bidirectional(self.rnn, merge_mode='concat')
        self.bidirectional = bidirectional

    def __call__(self, input):  # [batch, timesteps, input_dim]
        outputs = self.rnn(input)
        output = outputs[0]
        states = outputs[1:]
        print(outputs)  # debug print used by the tests below
        print(len(outputs))  # debug print used by the tests below
        print(len(states))  # debug print used by the tests below
        return output, states
Construction:
- rnnCell = [getattr(keras.layers, 'GRUCell')(units) for _ in range(num_layers)] (or the LSTMCell version) builds a list of num_layers RNN cells.
- rnn = keras.layers.RNN(rnnCell, input_shape=(None, None, input_size), return_sequences=True, return_state=True) takes the cell list and builds a stacked multi-layer RNN. return_sequences=True means the output at every timestep is returned instead of only the last timestep's output; return_state=True means the RNN states are returned as well (with False no states are returned).
- rnn = keras.layers.Bidirectional(self.rnn, merge_mode='concat') makes the stack bidirectional; merge_mode='concat' means the forward and backward outputs are merged by concatenation.
3 Testing the outputs
After all this preamble, let's now test the outputs. If you are not interested in the details, skip straight to the conclusions in section 4.
3.0 Experiment settings
# num_vocab = 3
# embedding_size = 5
# batch_size = 2
# encoder_len = 3
# num_units = 10
# num_layers = 2
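The tests below all follow the same pattern; as a minimal sketch of how they are wired together (the harness here is my own reconstruction, only the printed tensors below come from the actual runs):

# Hook the embedding and encoder together under the settings above
embedding = Embedding(3, 5, weights)                      # num_vocab=3, embedding_size=5
word_embed = embedding(word_id)                           # [batch_size=2, encoder_len=3, 5]
encoder = Encoder('GRU', 5, 10, 2, bidirectional=False)   # num_units=10, num_layers=2; vary rnn_type/bidirectional per test
output, states = encoder(word_embed)                      # the debug prints inside Encoder produce the dumps below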
3.1 Unidirectional multi-layer GRU
Testing with 2 layers:
outputs = rnn(input)
>>[<tf.Tensor: id=553, shape=(2, 3, 10), dtype=float32, numpy=
array([[[-3.5420116e-02, -8.9026507e-05, 2.2907217e-01, 1.9754110e-01,
-3.2863699e-02, -2.4253847e-01, 1.2058940e-01, 6.2615253e-02,
-1.8428519e-01, -2.1019778e-01],
[-7.6624170e-02, 3.7288409e-02, 3.4195143e-01, 3.2474262e-01,
-7.6712951e-02, -3.0440533e-01, 1.9677658e-01, 1.2763622e-01,
-2.7749074e-01, -3.2409826e-01],
[-7.6624170e-02, 3.7288409e-02, 3.4195143e-01, 3.2474262e-01,
-7.6712951e-02, -3.0440533e-01, 1.9677658e-01, 1.2763622e-01,
-2.7749074e-01, -3.2409826e-01]],
[[-3.5420127e-02, -8.9021691e-05, 2.2907217e-01, 1.9754107e-01,
-3.2863699e-02, -2.4253847e-01, 1.2058940e-01, 6.2615216e-02,
-1.8428519e-01, -2.1019775e-01],
[-3.5420127e-02, -8.9021691e-05, 2.2907217e-01, 1.9754107e-01,
-3.2863699e-02, -2.4253847e-01, 1.2058940e-01, 6.2615216e-02,
-1.8428519e-01, -2.1019775e-01],
[-3.5420127e-02, -8.9021691e-05, 2.2907217e-01, 1.9754107e-01,
-3.2863699e-02, -2.4253847e-01, 1.2058940e-01, 6.2615216e-02,
-1.8428519e-01, -2.1019775e-01]]], dtype=float32)>, <tf.Tensor: id=542, shape=(2, 10), dtype=float32, numpy=
array([[ 0.10095029, -0.998891 , -0.48548818, -0.00963031, -0.97031355,
-0.12160255, 0.999949 , -0.10839747, -0.18006183, -0.17532544],
[ 0.03464954, -0.9603172 , -0.53084654, 0.00194323, -0.8031896 ,
-0.07652862, 0.9911491 , -0.06364062, -0.11014236, -0.14036107]],
dtype=float32)>, <tf.Tensor: id=543, shape=(2, 10), dtype=float32, numpy=
array([[-7.6624170e-02, 3.7288409e-02, 3.4195143e-01, 3.2474262e-01,
-7.6712951e-02, -3.0440533e-01, 1.9677658e-01, 1.2763622e-01,
-2.7749074e-01, -3.2409826e-01],
[-3.5420127e-02, -8.9021691e-05, 2.2907217e-01, 1.9754107e-01,
-3.2863699e-02, -2.4253847e-01, 1.2058940e-01, 6.2615216e-02,
-1.8428519e-01, -2.1019775e-01]], dtype=float32)>]
As you can see, outputs has three parts:
output[0]: [batch_size, encoder_len, num_units]
output[1]: [batch_size, num_units]
output[2]: [batch_size, num_units]
Comparing the numbers, output[2] equals the last valid timestep of output[0], so output[2] is the final h of the last layer of the two-layer RNN (and output[1] that of the first layer). Looking at output[0] we can also see the effect of the embedding mask: the encodings at the pad positions are identical to the state at the end of the sentence, simply copied forward.
So the output format is:
[output[0], output[1], output[2], ..., output[num_layers]]
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[N] is the hidden state h of layer N, [batch_size, num_units]
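If only the encoder's final state is needed (say, to initialize a decoder), the last element is the one to take; a minimal sketch following the format above (states = outputs[1:] as returned by the Encoder, names illustrative):

output, states = encoder(word_embed)  # unidirectional multi-layer GRU encoder
top_h = states[-1]                    # h of the top layer, [batch_size, num_units]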
3.2 Bidirectional multi-layer GRU
Testing with 2 layers:
outputs = rnn(input)
>>[<tf.Tensor: id=1096, shape=(2, 3, 10), dtype=float32, numpy=
array([[[-0.01417219, 0.13640611, 0.32041013, 0.00786568,
-0.03442783, 0.46687838, 0.14251477, -0.0060271 ,
-0.03813943, -0.4147334 ],
[-0.0154626 , 0.22333089, 0.49720186, -0.02729558,
-0.13843244, 0.30179217, 0.10419664, -0.0332097 ,
-0.06268977, -0.33545047],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ]],
[[-0.01417219, 0.13640611, 0.32041013, 0.00786567,
-0.03442784, 0.29038957, 0.09369997, -0.00535166,
-0.02358363, -0.31432554],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ]]], dtype=float32)>, <tf.Tensor: id=632, shape=(2, 5), dtype=float32, numpy=
array([[-0.01106511, 0.97525597, 0.38123077, 0.15792789, -0.8506844 ],
[-0.00910319, 0.8093642 , 0.2359951 , -0.14750779, -0.56568766]],
dtype=float32)>, <tf.Tensor: id=633, shape=(2, 5), dtype=float32, numpy=
array([[-0.0154626 , 0.22333089, 0.49720186, -0.02729558, -0.13843244],
[-0.01417219, 0.13640611, 0.32041013, 0.00786567, -0.03442784]],
dtype=float32)>, <tf.Tensor: id=1081, shape=(2, 5), dtype=float32, numpy=
array([[0.3142835 , 0.98540443, 0.26638144, 0.00319364, 0.98887223],
[0.36952233, 0.9663322 , 0.17328681, 0.00246616, 0.9730079 ]],
dtype=float32)>, <tf.Tensor: id=1082, shape=(2, 5), dtype=float32, numpy=
array([[ 0.46687838, 0.14251477, -0.0060271 , -0.03813943, -0.4147334 ],
[ 0.29038957, 0.09369997, -0.00535166, -0.02358363, -0.31432554]],
dtype=float32)>]
As you can see, outputs has five parts:
output[0]: [batch_size, encoder_len, num_units]
output[1]: [batch_size, num_units/2]
output[2]: [batch_size, num_units/2]
output[3]: [batch_size, num_units/2]
output[4]: [batch_size, num_units/2]
Comparing the numbers, output[2] is the final h of the forward direction of the last layer of the two-layer RNN. Looking at output[0] we can also see the effect of the embedding mask: the encodings at the pad positions are all zeros (slightly different from the unidirectional case).
So the output format is:
[output[0], output[1], output[2], ..., output[num_layers*2]]
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[1] is the forward h of layer 0, [batch_size, num_units/2]
output[2] is the forward h of layer 1, [batch_size, num_units/2]
output[3] is the backward h of layer 0, [batch_size, num_units/2]
output[4] is the backward h of layer 1, [batch_size, num_units/2]
In general:
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[1 : 1+num_layers] are the forward h of each layer, [batch_size, num_units/2]
output[1+num_layers :] are the backward h of each layer, [batch_size, num_units/2]
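To get a single final state of width num_units out of the bidirectional stack, the top layer's forward and backward h have to be concatenated by hand; a minimal sketch under the indexing derived above (states = outputs[1:] as returned by the Encoder):

num_layers = 2
output, states = encoder(word_embed)          # bidirectional multi-layer GRU encoder
h_fwd = states[num_layers - 1]                # forward h of the top layer, [batch_size, num_units/2]
h_bwd = states[2 * num_layers - 1]            # backward h of the top layer, [batch_size, num_units/2]
final_h = tf.concat([h_fwd, h_bwd], axis=-1)  # [batch_size, num_units]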
3.3 Unidirectional multi-layer LSTM
Testing with 2 layers:
outputs = rnn(input)
>>[<tf.Tensor: id=413, shape=(2, 3, 10), dtype=float32, numpy=
array([[[ 0.03599537, -0.01473989, 0.05308587, -0.00895863,
0.01214957, -0.03720263, 0.02418177, -0.01348425,
-0.01298695, -0.03001863],
[ 0.07842067, -0.03227948, 0.09026823, -0.02830549,
0.01443951, -0.07027332, 0.05110155, -0.02023602,
-0.01933629, -0.05507426],
[ 0.07842067, -0.03227948, 0.09026823, -0.02830549,
0.01443951, -0.07027332, 0.05110155, -0.02023602,
-0.01933629, -0.05507426]],
[[ 0.03599537, -0.01473989, 0.05308587, -0.00895863,
0.01214957, -0.03720263, 0.02418176, -0.01348425,
-0.01298695, -0.03001863],
[ 0.03599537, -0.01473989, 0.05308587, -0.00895863,
0.01214957, -0.03720263, 0.02418176, -0.01348425,
-0.01298695, -0.03001863],
[ 0.03599537, -0.01473989, 0.05308587, -0.00895863,
0.01214957, -0.03720263, 0.02418176, -0.01348425,
-0.01298695, -0.03001863]]], dtype=float32)>, [<tf.Tensor: id=400, shape=(2, 10), dtype=float32, numpy=
array([[ 0.03796372, -0.00646253, -0.10610048, 0.2621497 , 0.00817543,
0.08675741, 0.03996095, 0.16117425, 0.65429616, -0.07473923],
[ 0.03174995, -0.0089063 , -0.07151143, 0.1907991 , 0.01177687,
0.04312354, 0.02712633, 0.19289187, 0.51734495, -0.09216765]],
dtype=float32)>, <tf.Tensor: id=401, shape=(2, 10), dtype=float32, numpy=
array([[ 0.44051132, -0.2818818 , -0.11988518, 1.2482902 , 0.17308153,
0.69406235, 0.06025018, 1.0685071 , 0.797681 , -0.1052426 ],
[ 0.22792174, -0.23269363, -0.0844808 , 0.6085427 , 0.16032045,
0.3221852 , 0.04220397, 0.8066951 , 0.5936996 , -0.12931918]],
dtype=float32)>], [<tf.Tensor: id=402, shape=(2, 10), dtype=float32, numpy=
array([[ 0.07842067, -0.03227948, 0.09026823, -0.02830549, 0.01443951,
-0.07027332, 0.05110155, -0.02023602, -0.01933629, -0.05507426],
[ 0.03599537, -0.01473989, 0.05308587, -0.00895863, 0.01214957,
-0.03720263, 0.02418176, -0.01348425, -0.01298695, -0.03001863]],
dtype=float32)>, <tf.Tensor: id=403, shape=(2, 10), dtype=float32, numpy=
array([[ 0.15394947, -0.06263469, 0.19750515, -0.05156851, 0.02507691,
-0.14487514, 0.0979518 , -0.03745949, -0.04038396, -0.11667444],
[ 0.07117884, -0.02876606, 0.11274612, -0.01666791, 0.02163199,
-0.07605074, 0.0462449 , -0.0253415 , -0.02669653, -0.0623024 ]],
dtype=float32)>]]
As you can see, outputs has three parts:
output[0]: [batch_size, encoder_len, num_units]
output[1]: [[batch_size, num_units], [batch_size, num_units]]
output[2]: [[batch_size, num_units], [batch_size, num_units]]
Comparing the numbers, output[2][0] is the final h (it equals the last valid timestep of output[0]), so output[2] is the state of the last layer of the two-layer RNN.
So the output format is:
[output[0], output[1], output[2], ..., output[num_layers]]
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[N] is the state [h, c] of layer N, [[batch_size, num_units], [batch_size, num_units]]
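Unlike the GRU case, each per-layer state here is itself a list [h, c]; a minimal sketch of pulling out the top layer's state (states = outputs[1:] as returned by the Encoder):

output, states = encoder(word_embed)  # unidirectional multi-layer LSTM encoder
top_h, top_c = states[-1]             # hidden state and cell state of the top layer, each [batch_size, num_units]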
3.4 Bidirectional multi-layer LSTM
Testing with 2 layers:
outputs = rnn(input)
>>[<tf.Tensor: id=816, shape=(2, 3, 10), dtype=float32, numpy=
array([[[-0.06421194, -0.00754393, -0.04505453, 0.05208206,
-0.03166301, -0.0243494 , -0.00789784, 0.10367834,
0.09167746, 0.01394088],
[-0.1210794 , -0.01336129, -0.09259984, 0.08671384,
-0.06314958, -0.00972542, 0.00197651, 0.04819337,
0.05299319, -0.00179022],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ]],
[[-0.06421195, -0.00754394, -0.04505453, 0.05208206,
-0.031663 , -0.00825483, 0.00164982, 0.0411781 ,
0.04471161, -0.00124086],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. ]]], dtype=float32)>, [<tf.Tensor: id=506, shape=(2, 5), dtype=float32, numpy=
array([[ 8.3107513e-01, -6.2514983e-02, -2.2869313e-01, 2.0354016e-02,
-2.1946893e-04],
[ 6.7014122e-01, -6.0981486e-02, -1.2038765e-01, 1.5553602e-02,
-9.7971398e-04]], dtype=float32)>, <tf.Tensor: id=507, shape=(2, 5), dtype=float32, numpy=
array([[ 1.234918 , -0.3281948 , -0.28206116, 0.06127462, -0.39995325],
[ 0.8689103 , -0.22541635, -0.15223289, 0.04101423, -0.34894544]],
dtype=float32)>], [<tf.Tensor: id=508, shape=(2, 5), dtype=float32, numpy=
array([[-0.1210794 , -0.01336129, -0.09259984, 0.08671384, -0.06314958],
[-0.06421195, -0.00754394, -0.04505453, 0.05208206, -0.031663 ]],
dtype=float32)>, <tf.Tensor: id=509, shape=(2, 5), dtype=float32, numpy=
array([[-0.23299618, -0.03142868, -0.214081 , 0.20567834, -0.14606045],
[-0.12106555, -0.01712375, -0.10137994, 0.11782492, -0.07105252]],
dtype=float32)>], [<tf.Tensor: id=799, shape=(2, 5), dtype=float32, numpy=
array([[ 0.00424142, 0.3668591 , -0.5833647 , -0.03675587, 0.0019763 ],
[ 0.00434441, 0.32393652, -0.36846292, -0.01977784, 0.0016813 ]],
dtype=float32)>, <tf.Tensor: id=800, shape=(2, 5), dtype=float32, numpy=
array([[ 0.00942778, 0.5103172 , -1.1896598 , -0.48518264, 0.3304861 ],
[ 0.00973888, 0.4245502 , -0.58340454, -0.2169237 , 0.32636905]],
dtype=float32)>], [<tf.Tensor: id=801, shape=(2, 5), dtype=float32, numpy=
array([[-0.0243494 , -0.00789784, 0.10367834, 0.09167746, 0.01394088],
[-0.00825483, 0.00164982, 0.0411781 , 0.04471161, -0.00124086]],
dtype=float32)>, <tf.Tensor: id=802, shape=(2, 5), dtype=float32, numpy=
array([[-0.04924963, -0.0166596 , 0.21717109, 0.15558058, 0.02793371],
[-0.01635381, 0.00336652, 0.08587213, 0.07893328, -0.00250007]],
dtype=float32)>]]
As you can see, outputs has five parts:
output[0]: [batch_size, encoder_len, num_units]
output[1]: [[batch_size, num_units/2], [batch_size, num_units/2]]
output[2]: [[batch_size, num_units/2], [batch_size, num_units/2]]
output[3]: [[batch_size, num_units/2], [batch_size, num_units/2]]
output[4]: [[batch_size, num_units/2], [batch_size, num_units/2]]
Comparing the numbers, output[2][0] is the final forward h, so output[2] is the forward state of the last layer of the two-layer RNN.
So the output format is:
[output[0], output[1], output[2], ..., output[num_layers*2]]
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[1] is the forward [h, c] of layer 0, [[batch_size, num_units/2], [batch_size, num_units/2]]
output[2] is the forward [h, c] of layer 1, [[batch_size, num_units/2], [batch_size, num_units/2]]
output[3] is the backward [h, c] of layer 0, [[batch_size, num_units/2], [batch_size, num_units/2]]
output[4] is the backward [h, c] of layer 1, [[batch_size, num_units/2], [batch_size, num_units/2]]
In general:
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[1 : 1+num_layers] are the forward [h, c] of each layer, [[batch_size, num_units/2], [batch_size, num_units/2]]
output[1+num_layers :] are the backward [h, c] of each layer, [[batch_size, num_units/2], [batch_size, num_units/2]]
4 Conclusions
Unidirectional multi-layer GRU
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[N] is the hidden state h of layer N, [batch_size, num_units]
Bidirectional multi-layer GRU
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[1 : 1+num_layers] are the forward h of each layer, [batch_size, num_units/2]
output[1+num_layers :] are the backward h of each layer, [batch_size, num_units/2]
Unidirectional multi-layer LSTM
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[N] is the state [h, c] of layer N, [[batch_size, num_units], [batch_size, num_units]]
Bidirectional multi-layer LSTM
output[0] is the per-timestep output, [batch_size, encoder_len, num_units]
output[1 : 1+num_layers] are the forward [h, c] of each layer, [[batch_size, num_units/2], [batch_size, num_units/2]]
output[1+num_layers :] are the backward [h, c] of each layer, [[batch_size, num_units/2], [batch_size, num_units/2]]
In addition, with a unidirectional RNN the pad positions simply repeat the output of the last valid timestep, while with a bidirectional RNN the pad positions are all zeros.
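As a practical wrap-up, here is a small helper that extracts the final top-layer state for all four configurations according to the formats above. It is only a sketch built on the indexing derived in this post (the function name and signature are my own), taking states = outputs[1:] as returned by the Encoder:

def final_state(states, num_layers, bidirectional):
    # GRU: each entry of states is a tensor h; LSTM: each entry is a list [h, c]
    if not bidirectional:
        return states[num_layers - 1]              # top layer: h, or [h, c]
    fwd = states[num_layers - 1]                   # top layer, forward direction
    bwd = states[2 * num_layers - 1]               # top layer, backward direction
    if isinstance(fwd, (list, tuple)):             # LSTM: concatenate h and c separately
        return [tf.concat([f, b], axis=-1) for f, b in zip(fwd, bwd)]
    return tf.concat([fwd, bwd], axis=-1)          # GRU: [batch_size, num_units]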