Testing the output of the multi-layer bidirectional RNN API in TensorFlow 2.0

Author: 又双叒叕苟了一天 | Published 2019-10-11 16:24

    Problem background:

    1. Feed the sequence of word indices into an embedding layer to encode it into embedded representations.

    2. Feed the sequence of word embeddings into an encoder built from RNNs.

    So what does the output of the RNN encoder actually look like? Online you can find many sequence models that use bidirectional RNNs and stack several layers into a multi-layer bidirectional RNN. Sometimes, however, we also need the states of the intermediate layers, and the usual approach is to build a separate model just to expose them, which is clearly inflexible.

    So this time we build a multi-layer bidirectional RNN ourselves and check exactly what it outputs. The tests target TensorFlow 2.0; since 2.0 uses eager execution with automatic graph construction, everything below is written in an object-oriented style.

    1 Building the embedding layer

    import tensorflow.keras as keras
    
    class Embedding(keras.layers.Layer):
    
        def __init__(self, input_size,
                     output_size,
                     weights=None):
            super(Embedding, self).__init__()
    
            if weights is not None:
                self.embedding = keras.layers.Embedding(input_size, output_size, weights=weights, mask_zero=True)
            else:
                self.embedding = keras.layers.Embedding(input_size, output_size, mask_zero=True)
    
        def __call__(self, input):  # [batch, len]
    
            return self.embedding(input)  # [batch, len, output_size]
    

    This embedding-layer class simply wraps keras.layers.Embedding(input_size, output_size, weights=weights, mask_zero=True). The parameters of this API:

    1. The first argument is the vocabulary size.
    2. The second argument is the word-embedding dimension.
    3. weights is the initial weight matrix; if you have pretrained word embeddings, pass them in here.
    4. mask_zero masks the word with index=0. We usually assign the pad symbol index 0 in the vocabulary, so the layer produces a mask and propagates it downstream, preventing the RNN from encoding the extra pad symbols in a sentence. In TensorFlow 1.x the same effect was achieved by passing an encoder_len to tf.nn.dynamic_rnn() (its sequence_length argument); in PyTorch, torch.nn.utils.rnn.pack_padded_sequence plays the same role. Note that the mask only takes effect during RNN encoding; it does not zero out the pad embeddings, which stay whatever they are.
    # Embedding layer test
    import numpy as np
    import tensorflow as tf
    weights = [np.array([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]], dtype=np.float64)]
    embedding = Embedding(3, 5, weights)  # (num_vocab, embedding_size)
    
    word_id = tf.convert_to_tensor([[1, 2, 0], [1, 0, 0]], dtype=tf.int64)
    word_embed = embedding(word_id)  # [batch, seq, embedding_size]
    
    print(word_embed)
    >>tf.Tensor(
    [[[2. 3. 4. 5. 6.]
      [3. 4. 5. 6. 7.]
      [1. 2. 3. 4. 5.]]
    
     [[2. 3. 4. 5. 6.]
      [1. 2. 3. 4. 5.]
      [1. 2. 3. 4. 5.]]], shape=(2, 3, 5), dtype=float32)
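
    As claimed above, the mask only takes effect in the RNN; the pad embeddings themselves are not zeroed out. To look at the mask directly, you can call the wrapped layer's compute_mask (a small sketch, reusing the embedding and word_id from the test above):

    # Inspect the mask produced by mask_zero=True: True for real tokens, False for index-0 pads
    mask = embedding.embedding.compute_mask(word_id)
    print(mask)
    # expected: a boolean tensor of shape (2, 3), [[True, True, False], [True, False, False]]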
    

    2 Building the encoder

    import tensorflow.keras as keras
    
    class Encoder(keras.layers.Layer):
    
        def __init__(self, rnn_type,  # RNN type: 'GRU' or 'LSTM'
                     input_size,
                     output_size,
                     num_layers,  # number of RNN layers
                     bidirectional=False):
            super(Encoder, self).__init__()
            assert rnn_type in ['GRU', 'LSTM']
            if bidirectional:
                assert output_size % 2 == 0
    
            if bidirectional:
                self.num_directions = 2
            else:
                self.num_directions = 1
    
            units = int(output_size / self.num_directions)
    
            if rnn_type == 'GRU':
                rnnCell = [getattr(keras.layers, 'GRUCell')(units) for _ in range(num_layers)]
            else:
                rnnCell = [getattr(keras.layers, 'LSTMCell')(units) for _ in range(num_layers)]
    
            self.rnn = keras.layers.RNN(rnnCell, input_shape=(None, None, input_size),
                                        return_sequences=True, return_state=True)
            self.rnn_type = rnn_type
            self.num_layers = num_layers
    
            if bidirectional:
                self.rnn = keras.layers.Bidirectional(self.rnn, merge_mode='concat')
    
            self.bidirectional = bidirectional
    
    
        def __call__(self, input):  # [batch, timesteps, input_dim]
    
            outputs = self.rnn(input)
    
            output = outputs[0]
            states = outputs[1:]
    
            print(outputs)  # debug output used for the tests below
            print(len(outputs))  # debug output used for the tests below
            print(len(states))  # debug output used for the tests below
    
            return output, states
    

    Construction steps:

    1. rnnCell = [getattr(keras.layers, 'GRUCell')(units) for _ in range(num_layers)] or rnnCell = [getattr(keras.layers, 'LSTMCell')(units) for _ in range(num_layers)] builds a list of num_layers RNN cells.
    2. rnn = keras.layers.RNN(rnnCell, input_shape=(None, None, input_size),
      return_sequences=True, return_state=True) takes the list of cells and builds a multi-layer RNN. return_sequences=True means the layer returns the output at every time step rather than only the last one, and return_state=True means it also returns the RNN states (with False, no states are returned).
    3. rnn = keras.layers.Bidirectional(self.rnn, merge_mode='concat') makes it bidirectional; merge_mode='concat' means the forward and backward outputs are concatenated.

    3 Testing the outputs

    After all that preamble, let's start testing the outputs. If you are not interested in this part, you can skip straight to the conclusions in section 4.

    3.0 Experiment settings

    # num_vocab = 3
    # embedding_size = 5
    # batch_size = 2
    # encoder_len = 3
    # num_units = 10
    # num_layers = 2
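
    As a reference, here is a minimal sketch of the test harness used for the four cases below, assuming the Embedding and Encoder classes defined above (variable names are illustrative):

    # Build the encoder with the settings listed above and feed it the masked embeddings
    # from section 1. Switch rnn_type / bidirectional to reproduce cases 3.1-3.4.
    encoder = Encoder(rnn_type='GRU',       # or 'LSTM'
                      input_size=5,         # embedding_size
                      output_size=10,       # num_units
                      num_layers=2,
                      bidirectional=False)  # True for the bidirectional tests

    output, states = encoder(word_embed)    # word_embed comes from the embedding test in section 1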
    

    3.1 Unidirectional multi-layer GRU

    Testing with 2 layers:

    outputs = rnn(input)
    >>[<tf.Tensor: id=553, shape=(2, 3, 10), dtype=float32, numpy=
    array([[[-3.5420116e-02, -8.9026507e-05,  2.2907217e-01,  1.9754110e-01,
             -3.2863699e-02, -2.4253847e-01,  1.2058940e-01,  6.2615253e-02,
             -1.8428519e-01, -2.1019778e-01],
            [-7.6624170e-02,  3.7288409e-02,  3.4195143e-01,  3.2474262e-01,
             -7.6712951e-02, -3.0440533e-01,  1.9677658e-01,  1.2763622e-01,
             -2.7749074e-01, -3.2409826e-01],
            [-7.6624170e-02,  3.7288409e-02,  3.4195143e-01,  3.2474262e-01,
             -7.6712951e-02, -3.0440533e-01,  1.9677658e-01,  1.2763622e-01,
             -2.7749074e-01, -3.2409826e-01]],
    
           [[-3.5420127e-02, -8.9021691e-05,  2.2907217e-01,  1.9754107e-01,
             -3.2863699e-02, -2.4253847e-01,  1.2058940e-01,  6.2615216e-02,
             -1.8428519e-01, -2.1019775e-01],
            [-3.5420127e-02, -8.9021691e-05,  2.2907217e-01,  1.9754107e-01,
             -3.2863699e-02, -2.4253847e-01,  1.2058940e-01,  6.2615216e-02,
             -1.8428519e-01, -2.1019775e-01],
            [-3.5420127e-02, -8.9021691e-05,  2.2907217e-01,  1.9754107e-01,
             -3.2863699e-02, -2.4253847e-01,  1.2058940e-01,  6.2615216e-02,
             -1.8428519e-01, -2.1019775e-01]]], dtype=float32)>, <tf.Tensor: id=542, shape=(2, 10), dtype=float32, numpy=
    array([[ 0.10095029, -0.998891  , -0.48548818, -0.00963031, -0.97031355,
            -0.12160255,  0.999949  , -0.10839747, -0.18006183, -0.17532544],
           [ 0.03464954, -0.9603172 , -0.53084654,  0.00194323, -0.8031896 ,
            -0.07652862,  0.9911491 , -0.06364062, -0.11014236, -0.14036107]],
          dtype=float32)>, <tf.Tensor: id=543, shape=(2, 10), dtype=float32, numpy=
    array([[-7.6624170e-02,  3.7288409e-02,  3.4195143e-01,  3.2474262e-01,
            -7.6712951e-02, -3.0440533e-01,  1.9677658e-01,  1.2763622e-01,
            -2.7749074e-01, -3.2409826e-01],
           [-3.5420127e-02, -8.9021691e-05,  2.2907217e-01,  1.9754107e-01,
            -3.2863699e-02, -2.4253847e-01,  1.2058940e-01,  6.2615216e-02,
            -1.8428519e-01, -2.1019775e-01]], dtype=float32)>]
    

    As you can see, the output consists of three parts:

    output[0]:[batch_size, encoder_len, num_units]

    output[1]:[batch_size, num_units]

    output[2]:[batch_size, num_units]

    From the values we can see that output[2] is the final h, which means output[2] belongs to the last layer of the two-layer RNN. Looking at output[0] we can also see the effect of the embedding layer's mask: the encodings at the pad positions are identical to the state at the end of the sentence, i.e. the last valid state is simply copied forward.

    So the conclusion is that the output has the form:

    [output[0], output[1], output[2], ..., output[num_layers]]

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[N] is the h state of each layer: [batch_size, num_units]
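
    As a usage sketch (reusing the outputs list from the snippet above), the pieces can be unpacked like this:

    # Unidirectional multi-layer GRU: outputs = [seq_output, h_layer_1, ..., h_layer_N]
    seq_output = outputs[0]           # [batch_size, encoder_len, num_units]
    layer_states = outputs[1:]        # num_layers tensors, each [batch_size, num_units]
    last_hidden = layer_states[-1]    # h of the top layer, e.g. to initialize a decoder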

    3.2 Bidirectional multi-layer GRU

    Testing with 2 layers:

    outputs = rnn(input)
    >>[<tf.Tensor: id=1096, shape=(2, 3, 10), dtype=float32, numpy=
    array([[[-0.01417219,  0.13640611,  0.32041013,  0.00786568,
             -0.03442783,  0.46687838,  0.14251477, -0.0060271 ,
             -0.03813943, -0.4147334 ],
            [-0.0154626 ,  0.22333089,  0.49720186, -0.02729558,
             -0.13843244,  0.30179217,  0.10419664, -0.0332097 ,
             -0.06268977, -0.33545047],
            [ 0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ]],
    
           [[-0.01417219,  0.13640611,  0.32041013,  0.00786567,
             -0.03442784,  0.29038957,  0.09369997, -0.00535166,
             -0.02358363, -0.31432554],
            [ 0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ],
            [ 0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ]]], dtype=float32)>, <tf.Tensor: id=632, shape=(2, 5), dtype=float32, numpy=
    array([[-0.01106511,  0.97525597,  0.38123077,  0.15792789, -0.8506844 ],
           [-0.00910319,  0.8093642 ,  0.2359951 , -0.14750779, -0.56568766]],
          dtype=float32)>, <tf.Tensor: id=633, shape=(2, 5), dtype=float32, numpy=
    array([[-0.0154626 ,  0.22333089,  0.49720186, -0.02729558, -0.13843244],
           [-0.01417219,  0.13640611,  0.32041013,  0.00786567, -0.03442784]],
          dtype=float32)>, <tf.Tensor: id=1081, shape=(2, 5), dtype=float32, numpy=
    array([[0.3142835 , 0.98540443, 0.26638144, 0.00319364, 0.98887223],
           [0.36952233, 0.9663322 , 0.17328681, 0.00246616, 0.9730079 ]],
          dtype=float32)>, <tf.Tensor: id=1082, shape=(2, 5), dtype=float32, numpy=
    array([[ 0.46687838,  0.14251477, -0.0060271 , -0.03813943, -0.4147334 ],
           [ 0.29038957,  0.09369997, -0.00535166, -0.02358363, -0.31432554]],
          dtype=float32)>]
    

    As you can see, the output consists of five parts:

    output[0]:[batch_size, encoder_len, num_units]

    output[1]:[batch_size, num_units/2]

    output[2]:[batch_size, num_units/2]

    output[3]:[batch_size, num_units/2]

    output[4]:[batch_size, num_units/2]

    From the values we can see that output[2] is the final h, which means output[2] is the forward RNN of the last layer of the two-layer RNN. Looking at output[0] we can also see the effect of the embedding layer's mask: the encodings at the pad positions are 0 (slightly different from the unidirectional case).

    So the conclusion is that the output has the form:

    [output[0], output[1], output[2], ..., output[num_layers*2]]

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[1] is the forward h of layer 0: [batch_size, num_units/2]

    output[2] is the forward h of layer 1: [batch_size, num_units/2]

    output[3] is the backward h of layer 0: [batch_size, num_units/2]

    output[4] is the backward h of layer 1: [batch_size, num_units/2]

    And in general:

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[1: 1+num_layers] are the forward h states, one per layer: [batch_size, num_units/2]

    output[1+num_layers:] are the backward h states, one per layer: [batch_size, num_units/2]
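
    If you want a single [batch_size, num_units] state per layer, as in the unidirectional case, one option is to concatenate the matching forward and backward states (a sketch, reusing outputs, tf and num_layers from above):

    # Bidirectional multi-layer GRU: pair up the forward and backward h per layer
    seq_output = outputs[0]                        # [batch_size, encoder_len, num_units]
    forward_h = outputs[1: 1 + num_layers]         # num_layers tensors, [batch_size, num_units/2] each
    backward_h = outputs[1 + num_layers:]          # num_layers tensors, [batch_size, num_units/2] each
    layer_states = [tf.concat([f, b], axis=-1)     # [batch_size, num_units] per layer
                    for f, b in zip(forward_h, backward_h)]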

    3.3 Unidirectional multi-layer LSTM

    Testing with 2 layers:

    outputs = rnn(input)
    >>[<tf.Tensor: id=413, shape=(2, 3, 10), dtype=float32, numpy=
    array([[[ 0.03599537, -0.01473989,  0.05308587, -0.00895863,
              0.01214957, -0.03720263,  0.02418177, -0.01348425,
             -0.01298695, -0.03001863],
            [ 0.07842067, -0.03227948,  0.09026823, -0.02830549,
              0.01443951, -0.07027332,  0.05110155, -0.02023602,
             -0.01933629, -0.05507426],
            [ 0.07842067, -0.03227948,  0.09026823, -0.02830549,
              0.01443951, -0.07027332,  0.05110155, -0.02023602,
             -0.01933629, -0.05507426]],
    
           [[ 0.03599537, -0.01473989,  0.05308587, -0.00895863,
              0.01214957, -0.03720263,  0.02418176, -0.01348425,
             -0.01298695, -0.03001863],
            [ 0.03599537, -0.01473989,  0.05308587, -0.00895863,
              0.01214957, -0.03720263,  0.02418176, -0.01348425,
             -0.01298695, -0.03001863],
            [ 0.03599537, -0.01473989,  0.05308587, -0.00895863,
              0.01214957, -0.03720263,  0.02418176, -0.01348425,
             -0.01298695, -0.03001863]]], dtype=float32)>, [<tf.Tensor: id=400, shape=(2, 10), dtype=float32, numpy=
    array([[ 0.03796372, -0.00646253, -0.10610048,  0.2621497 ,  0.00817543,
             0.08675741,  0.03996095,  0.16117425,  0.65429616, -0.07473923],
           [ 0.03174995, -0.0089063 , -0.07151143,  0.1907991 ,  0.01177687,
             0.04312354,  0.02712633,  0.19289187,  0.51734495, -0.09216765]],
          dtype=float32)>, <tf.Tensor: id=401, shape=(2, 10), dtype=float32, numpy=
    array([[ 0.44051132, -0.2818818 , -0.11988518,  1.2482902 ,  0.17308153,
             0.69406235,  0.06025018,  1.0685071 ,  0.797681  , -0.1052426 ],
           [ 0.22792174, -0.23269363, -0.0844808 ,  0.6085427 ,  0.16032045,
             0.3221852 ,  0.04220397,  0.8066951 ,  0.5936996 , -0.12931918]],
          dtype=float32)>], [<tf.Tensor: id=402, shape=(2, 10), dtype=float32, numpy=
    array([[ 0.07842067, -0.03227948,  0.09026823, -0.02830549,  0.01443951,
            -0.07027332,  0.05110155, -0.02023602, -0.01933629, -0.05507426],
           [ 0.03599537, -0.01473989,  0.05308587, -0.00895863,  0.01214957,
            -0.03720263,  0.02418176, -0.01348425, -0.01298695, -0.03001863]],
          dtype=float32)>, <tf.Tensor: id=403, shape=(2, 10), dtype=float32, numpy=
    array([[ 0.15394947, -0.06263469,  0.19750515, -0.05156851,  0.02507691,
            -0.14487514,  0.0979518 , -0.03745949, -0.04038396, -0.11667444],
           [ 0.07117884, -0.02876606,  0.11274612, -0.01666791,  0.02163199,
            -0.07605074,  0.0462449 , -0.0253415 , -0.02669653, -0.0623024 ]],
          dtype=float32)>]]
    

    As you can see, the output consists of three parts:

    output[0]:[batch_size, encoder_len, num_units]

    output[1]:[[batch_size, num_units], [batch_size, num_units]]

    output[2]:[[batch_size, num_units], [batch_size, num_units]]

    From the values we can see that output[2][0] is the final h, which means output[2] is the [h, c] state of the last layer of the two-layer RNN.

    So the conclusion is that the output has the form:

    [output[0], output[1], output[2], ..., output[num_layers]]

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[N] is the [h, c] state pair of each layer: [[batch_size, num_units], [batch_size, num_units]]
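
    A small sketch of how the per-layer [h, c] pairs can be unpacked in this case (same naming as the GRU sketch above):

    # Unidirectional multi-layer LSTM: each state entry is an [h, c] pair
    seq_output = outputs[0]              # [batch_size, encoder_len, num_units]
    layer_states = outputs[1:]           # num_layers entries, each a list [h, c]
    last_h, last_c = layer_states[-1]    # h and c of the top layer, [batch_size, num_units] each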

    3.4 Bidirectional multi-layer LSTM

    Testing with 2 layers:

    outputs = rnn(input)
    >>[<tf.Tensor: id=816, shape=(2, 3, 10), dtype=float32, numpy=
    array([[[-0.06421194, -0.00754393, -0.04505453,  0.05208206,
             -0.03166301, -0.0243494 , -0.00789784,  0.10367834,
              0.09167746,  0.01394088],
            [-0.1210794 , -0.01336129, -0.09259984,  0.08671384,
             -0.06314958, -0.00972542,  0.00197651,  0.04819337,
              0.05299319, -0.00179022],
            [ 0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ]],
    
           [[-0.06421195, -0.00754394, -0.04505453,  0.05208206,
             -0.031663  , -0.00825483,  0.00164982,  0.0411781 ,
              0.04471161, -0.00124086],
            [ 0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ],
            [ 0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ,  0.        ,  0.        ,
              0.        ,  0.        ]]], dtype=float32)>, [<tf.Tensor: id=506, shape=(2, 5), dtype=float32, numpy=
    array([[ 8.3107513e-01, -6.2514983e-02, -2.2869313e-01,  2.0354016e-02,
            -2.1946893e-04],
           [ 6.7014122e-01, -6.0981486e-02, -1.2038765e-01,  1.5553602e-02,
            -9.7971398e-04]], dtype=float32)>, <tf.Tensor: id=507, shape=(2, 5), dtype=float32, numpy=
    array([[ 1.234918  , -0.3281948 , -0.28206116,  0.06127462, -0.39995325],
           [ 0.8689103 , -0.22541635, -0.15223289,  0.04101423, -0.34894544]],
          dtype=float32)>], [<tf.Tensor: id=508, shape=(2, 5), dtype=float32, numpy=
    array([[-0.1210794 , -0.01336129, -0.09259984,  0.08671384, -0.06314958],
           [-0.06421195, -0.00754394, -0.04505453,  0.05208206, -0.031663  ]],
          dtype=float32)>, <tf.Tensor: id=509, shape=(2, 5), dtype=float32, numpy=
    array([[-0.23299618, -0.03142868, -0.214081  ,  0.20567834, -0.14606045],
           [-0.12106555, -0.01712375, -0.10137994,  0.11782492, -0.07105252]],
          dtype=float32)>], [<tf.Tensor: id=799, shape=(2, 5), dtype=float32, numpy=
    array([[ 0.00424142,  0.3668591 , -0.5833647 , -0.03675587,  0.0019763 ],
           [ 0.00434441,  0.32393652, -0.36846292, -0.01977784,  0.0016813 ]],
          dtype=float32)>, <tf.Tensor: id=800, shape=(2, 5), dtype=float32, numpy=
    array([[ 0.00942778,  0.5103172 , -1.1896598 , -0.48518264,  0.3304861 ],
           [ 0.00973888,  0.4245502 , -0.58340454, -0.2169237 ,  0.32636905]],
          dtype=float32)>], [<tf.Tensor: id=801, shape=(2, 5), dtype=float32, numpy=
    array([[-0.0243494 , -0.00789784,  0.10367834,  0.09167746,  0.01394088],
           [-0.00825483,  0.00164982,  0.0411781 ,  0.04471161, -0.00124086]],
          dtype=float32)>, <tf.Tensor: id=802, shape=(2, 5), dtype=float32, numpy=
    array([[-0.04924963, -0.0166596 ,  0.21717109,  0.15558058,  0.02793371],
           [-0.01635381,  0.00336652,  0.08587213,  0.07893328, -0.00250007]],
          dtype=float32)>]]
    

    As you can see, the output consists of five parts:

    output[0]:[batch_size, encoder_len, num_units]

    output[1]:[[batch_size, num_units/2], [batch_size, num_units/2]]

    output[2]:[[batch_size, num_units/2], [batch_size, num_units/2]]

    output[3]:[[batch_size, num_units/2], [batch_size, num_units/2]]

    output[4]:[[batch_size, num_units/2], [batch_size, num_units/2]]

    From the values we can see that output[2][0] is the final h, which means output[2] is the [h, c] state of the forward RNN of the last layer of the two-layer RNN.

    So the conclusion is that the output has the form:

    [output[0], output[1], output[2], ..., output[num_layers*2]]

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[1] is the forward [h, c] of layer 0: [[batch_size, num_units/2], [batch_size, num_units/2]]

    output[2] is the forward [h, c] of layer 1: [[batch_size, num_units/2], [batch_size, num_units/2]]

    output[3] is the backward [h, c] of layer 0: [[batch_size, num_units/2], [batch_size, num_units/2]]

    output[4] is the backward [h, c] of layer 1: [[batch_size, num_units/2], [batch_size, num_units/2]]

    And in general:

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[1: 1+num_layers] are the forward [h, c] pairs, one per layer: [[batch_size, num_units/2], [batch_size, num_units/2]]

    output[1+num_layers:] are the backward [h, c] pairs, one per layer: [[batch_size, num_units/2], [batch_size, num_units/2]]
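
    As with the bidirectional GRU, the forward and backward pairs can be stitched back together per layer if a single [h, c] per layer is more convenient (a sketch, reusing outputs, tf and num_layers from above):

    # Bidirectional multi-layer LSTM: concatenate forward and backward h and c per layer
    seq_output = outputs[0]                                 # [batch_size, encoder_len, num_units]
    forward_states = outputs[1: 1 + num_layers]             # per layer: [h_fw, c_fw]
    backward_states = outputs[1 + num_layers:]              # per layer: [h_bw, c_bw]
    layer_states = [[tf.concat([fw[0], bw[0]], axis=-1),    # h: [batch_size, num_units]
                     tf.concat([fw[1], bw[1]], axis=-1)]    # c: [batch_size, num_units]
                    for fw, bw in zip(forward_states, backward_states)]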

    4 Conclusion

    Unidirectional multi-layer GRU

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[N] is the h state of each layer: [batch_size, num_units]

    Bidirectional multi-layer GRU

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[1: 1+num_layers] are the forward h states, one per layer: [batch_size, num_units/2]

    output[1+num_layers:] are the backward h states, one per layer: [batch_size, num_units/2]

    Unidirectional multi-layer LSTM

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[N] is the [h, c] state pair of each layer: [[batch_size, num_units], [batch_size, num_units]]

    Bidirectional multi-layer LSTM

    output[0] is the output at every time step: [batch_size, encoder_len, num_units]

    output[1: 1+num_layers] are the forward [h, c] pairs, one per layer: [[batch_size, num_units/2], [batch_size, num_units/2]]

    output[1+num_layers:] are the backward [h, c] pairs, one per layer: [[batch_size, num_units/2], [batch_size, num_units/2]]

    Also, with a unidirectional RNN the outputs at pad positions are copies of the last valid time step's output, while with a bidirectional RNN the outputs at pad positions are simply 0.
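
    Putting the four cases together, a small helper sketch that extracts the top-layer state in a uniform way, for example to initialize a decoder (top_layer_state is an illustrative name; it assumes outputs is the raw list returned by the encoder above and that num_layers, bidirectional and rnn_type match how it was built):

    def top_layer_state(outputs, num_layers, bidirectional, rnn_type):
        # GRU states are plain h tensors; LSTM states are [h, c] pairs.
        states = outputs[1:]
        if not bidirectional:
            return states[num_layers - 1]   # last layer: h (GRU) or [h, c] (LSTM)
        fw = states[num_layers - 1]         # forward state of the last layer
        bw = states[2 * num_layers - 1]     # backward state of the last layer
        if rnn_type == 'GRU':
            return tf.concat([fw, bw], axis=-1)
        # LSTM: concatenate h and c separately
        return [tf.concat([fw[0], bw[0]], axis=-1),
                tf.concat([fw[1], bw[1]], axis=-1)]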
