(8) Sequence to Sequence, Part 5

Author: 天生smile | Published: 2018-12-12 16:21


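Implementing a multi-layer bidirectional dynamic_lstm + beam_search

A Seq2seq implementation based on TensorFlow 1.4

The encoder is a two-layer bidirectional LSTM. Note that multi_RNN and bi_dynamic_lstm are not compatible with each other, so the two layers are stacked by hand in a loop.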

import tensorflow as tf
from tensorflow.python.util import nest
from tensorflow.contrib import seq2seq, rnn

tf.__version__

tf.reset_default_graph()
sess = tf.InteractiveSession()

PAD = 0
EOS = 1


vocab_size = 10
input_embedding_size = 20
encoder_hidden_units = 25

decoder_hidden_units = encoder_hidden_units

import helpers as data_helpers
batch_size = 10

# A generator that yields one minibatch of random samples per call

batches = data_helpers.random_sequences(length_from=3, length_to=8,
                                   vocab_lower=2, vocab_upper=10,
                                   batch_size=batch_size)

print('Generated %d sequences of varying length (min 3, max 8); the first ten are:' % batch_size)
for seq in next(batches)[:min(batch_size, 10)]:
    print(seq)

tf.reset_default_graph()
sess = tf.InteractiveSession()
mode = tf.contrib.learn.ModeKeys.TRAIN

Generated 10 sequences of varying length (min 3, max 8); the first ten are:
[6, 6, 3, 9, 7, 7, 9, 4]
[9, 3, 6, 3, 6, 6, 4, 5]
[5, 4, 2, 2, 3, 9, 8, 7]
[3, 2, 7]
[8, 5, 9, 4, 5, 2]
[6, 5, 8, 9, 4]
[3, 9, 6, 5, 2, 2]
[3, 2, 2, 3]
[8, 8, 7, 6, 8]
[5, 3, 3, 6, 8, 7, 4, 9]
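
The `helpers` module itself is not shown in this post. As a rough idea of what such a generator might look like, here is a minimal pure-Python sketch. The name `toy_random_sequences` and the `seed` parameter are hypothetical, and the length and vocabulary ranges are assumptions inferred from the sample output above (lengths 3 to 8 inclusive, token ids 2 to 9).

```python
import random

def toy_random_sequences(length_from, length_to, vocab_lower, vocab_upper,
                         batch_size, seed=None):
    """Yield minibatches of random integer sequences, forever.

    Hypothetical stand-in for helpers.random_sequences. Assumes lengths
    are drawn from [length_from, length_to] and token ids from
    [vocab_lower, vocab_upper).
    """
    if length_from > length_to:
        raise ValueError('length_from > length_to')
    rng = random.Random(seed)
    while True:
        yield [
            [rng.randrange(vocab_lower, vocab_upper)
             for _ in range(rng.randint(length_from, length_to))]
            for _ in range(batch_size)
        ]

demo_batches = toy_random_sequences(length_from=3, length_to=8,
                                    vocab_lower=2, vocab_upper=10,
                                    batch_size=10, seed=0)
demo_batch = next(demo_batches)
for seq in demo_batch[:3]:
    print(seq)
```

Token ids 0 and 1 are left out of the random range because they are reserved for PAD and EOS.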

1. Implementing the seq2seq model with the seq2seq library

with tf.name_scope('minibatch'):
    encoder_inputs = tf.placeholder(tf.int32, [None, None], name='encoder_inputs')
    
    encoder_inputs_length = tf.placeholder(tf.int32, [None], name='encoder_inputs_length')
    
    decoder_targets = tf.placeholder(tf.int32, [None, None], name='decoder_targets')
    
    decoder_inputs = tf.placeholder(shape=(None, None),dtype=tf.int32,name='decoder_inputs')
    
    # decoder_inputs_length is the same as decoder_targets_length
    decoder_inputs_length = tf.placeholder(shape=(None,),
                                            dtype=tf.int32,
                                            name='decoder_inputs_length')
    
# Build the embedding matrix; the encoder and decoder share this word-embedding matrix
embedding = tf.get_variable('embedding', [vocab_size,input_embedding_size])
encoder_inputs_embedded = tf.nn.embedding_lookup(embedding,encoder_inputs)

#fw_cell = bw_cell =  rnn.LSTMCell(encoder_hidden_units)

Define the encoder: a two-layer bidirectional LSTM

_inputs=encoder_inputs_embedded
for _ in range(2):
    # Why add a variable_scope here? I was forced to: tf insists on a scope check
    # inside rnn_cell's __call__, which is annoying. Without the scope, an error is raised.
    with tf.variable_scope(None, default_name="bidirectional-rnn"):
        # One cell object per direction, so forward and backward each get their own weights
        rnn_cell_fw = rnn.LSTMCell(encoder_hidden_units)
        rnn_cell_bw = rnn.LSTMCell(encoder_hidden_units)
        #initial_state_fw = rnn_cell_fw.zero_state(batch_size, dtype=tf.float32)
        #initial_state_bw = rnn_cell_bw.zero_state(batch_size, dtype=tf.float32)
        ((encoder_fw_outputs,encoder_bw_outputs),(encoder_fw_final_state,encoder_bw_final_state))\
        = tf.nn.bidirectional_dynamic_rnn(cell_fw=rnn_cell_fw,
                                              cell_bw=rnn_cell_bw, 
                                              inputs=_inputs, 
                                              sequence_length=encoder_inputs_length,
                                              dtype=tf.float32)
        _inputs = tf.concat((encoder_fw_outputs,encoder_bw_outputs), 2)
# Take the final_state of the last layer
encoder_final_state_h = tf.concat((encoder_fw_final_state.h, encoder_bw_final_state.h), 1)
encoder_final_state_c = tf.concat((encoder_fw_final_state.c, encoder_bw_final_state.c), 1)
encoder_final_state = rnn.LSTMStateTuple(c=encoder_final_state_c, h=encoder_final_state_h)
encoder_final_output = _inputs

    encoder_final_state

LSTMStateTuple(c=<tf.Tensor 'concat_3:0' shape=(?, 50) dtype=float32>, h=<tf.Tensor 'concat_2:0' shape=(?, 50) dtype=float32>)

    encoder_final_output

<tf.Tensor 'bidirectional-rnn_4/concat:0' shape=(?, ?, 50) dtype=float32>
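
To see why the shapes above end in 50 when `encoder_hidden_units` is 25, the stacking loop can be mimicked without TensorFlow. This is only a toy sketch: `run_direction` is a hypothetical stand-in recurrence (not an LSTM), it handles one unpadded sequence rather than a batch, and it keeps a single state vector instead of the `(c, h)` pair.

```python
import math

def run_direction(inputs, hidden_units, reverse=False):
    """Toy recurrent pass: returns one hidden vector per time step."""
    steps = list(reversed(inputs)) if reverse else list(inputs)
    state = [0.0] * hidden_units
    outputs = []
    for x in steps:
        drive = sum(x) / len(x)  # collapse the input vector to one number
        state = [math.tanh(drive + 0.5 * h + 0.01 * i)
                 for i, h in enumerate(state)]
        outputs.append(state)
    # Re-align backward outputs with the original time order.
    return list(reversed(outputs)) if reverse else outputs

def bidirectional_layer(inputs, hidden_units):
    fw = run_direction(inputs, hidden_units)
    bw = run_direction(inputs, hidden_units, reverse=True)
    # Concatenate along the feature axis, like tf.concat(..., 2).
    outputs = [f + b for f, b in zip(fw, bw)]
    # Last forward state plus last backward state (which sits at time 0).
    final_state = fw[-1] + bw[0]
    return outputs, final_state

hidden_units = 25
layer_inputs = [[0.1 * t] * 20 for t in range(4)]  # 4 time steps, 20 features
for _ in range(2):
    # Each layer consumes the concatenated outputs of the previous one.
    layer_inputs, final_state = bidirectional_layer(layer_inputs, hidden_units)

print(len(layer_inputs), len(layer_inputs[0]), len(final_state))  # 4 50 50
```

Because the second layer reads the concatenated output of the first, both directions of layer two see context from the whole sequence. The final state has width 50 for the same reason the decoder cell is built with `encoder_hidden_units*2` units.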

5. Defining the decoder

def _create_rnn_cell2():
    def single_rnn_cell(encoder_hidden_units):
        # Create a single cell. Be sure to build each cell through a function like
        # single_rnn_cell; otherwise, placing the cell directly in the MultiRNNCell
        # list leads to an error in the final model.
        single_cell = rnn.LSTMCell(encoder_hidden_units*2)
        # Add dropout
        single_cell = rnn.DropoutWrapper(single_cell, output_keep_prob=0.5)
        return single_cell
    # Every element of the list comes from a call to single_rnn_cell
    #cell = rnn.MultiRNNCell([single_rnn_cell() for _ in range(self.num_layers)])
    cell = rnn.MultiRNNCell([single_rnn_cell(encoder_hidden_units) for _ in range(1)])
    return cell

with tf.variable_scope('decoder'):
    #single_cell = rnn.LSTMCell(encoder_hidden_units)
    #decoder_cell = rnn.MultiRNNCell([single_cell for _ in range(1)])
    decoder_cell = rnn.LSTMCell(encoder_hidden_units*2)
    # Define the decoder's initial state
    decoder_initial_state = encoder_final_state

    # Define the output_layer
    output_layer = tf.layers.Dense(vocab_size,kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))
    
    decoder_inputs_embedded = tf.nn.embedding_lookup(embedding, decoder_inputs)
    
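    # Training phase: use the TrainingHelper + BasicDecoder combination. This pairing is
    # fairly standard, though you can also define your own Helper class for custom behavior.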
    training_helper = seq2seq.TrainingHelper(inputs=decoder_inputs_embedded,
                                                        sequence_length=decoder_inputs_length,
                                                        time_major=False, name='training_helper')
    training_decoder = seq2seq.BasicDecoder(cell=decoder_cell, helper=training_helper,
                                                       initial_state=decoder_initial_state,
                                                       output_layer=output_layer)
    
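    # Call dynamic_decode to decode. decoder_outputs is a namedtuple with two fields (rnn_output, sample_id):
    # rnn_output: [batch_size, decoder_targets_length, vocab_size], the logits (unnormalized scores) for every token at each decoding step; used to compute the loss
    # sample_id: [batch_size, decoder_targets_length], tf.int32, the token ids chosen at each step, i.e. the decoded answer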
    max_target_sequence_length = tf.reduce_max(decoder_inputs_length, name='max_target_len')
    decoder_outputs, _, _ = seq2seq.dynamic_decode(decoder=training_decoder,
                                                          impute_finished=True,
                                                          maximum_iterations=max_target_sequence_length)
    decoder_logits_train = tf.identity(decoder_outputs.rnn_output)
    sample_id = decoder_outputs.sample_id
    # Weight 1.0 on real tokens and 0.0 on padding, so padding does not contribute to the loss
    mask = tf.sequence_mask(decoder_inputs_length,max_target_sequence_length, dtype=tf.float32, name='masks')
    print('\t%s' % repr(decoder_logits_train))
    print('\t%s' % repr(decoder_targets))
    print('\t%s' % repr(sample_id))
    loss = seq2seq.sequence_loss(logits=decoder_logits_train,targets=decoder_targets, weights=mask)

    <tf.Tensor 'decoder/Identity:0' shape=(?, ?, 10) dtype=float32>
    <tf.Tensor 'minibatch/decoder_targets:0' shape=(?, ?) dtype=int32>
    <tf.Tensor 'decoder/decoder/transpose_1:0' shape=(?, ?) dtype=int32>
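
The loss is a cross-entropy averaged only over the positions that the mask marks as real tokens. Here is a minimal pure-Python sketch of that computation. `masked_sequence_loss` is a hypothetical helper, and it assumes `seq2seq.sequence_loss` is used with its default averaging (sum of weighted cross-entropies divided by the sum of weights).

```python
import math

def masked_sequence_loss(logits, targets, lengths):
    """Average cross-entropy over the non-padded positions of a batch.

    logits:  [batch][time][vocab] unnormalized scores
    targets: [batch][time] integer token ids
    lengths: [batch] number of valid steps per sequence
    """
    total, count = 0.0, 0
    for seq_logits, seq_targets, length in zip(logits, targets, lengths):
        for t, (scores, target) in enumerate(zip(seq_logits, seq_targets)):
            if t >= length:      # the mask is 0 here, so the step is skipped
                continue
            peak = max(scores)   # subtract the max for numerical stability
            log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += log_norm - scores[target]
            count += 1
    return total / count

vocab = 10
# Uniform logits: the model has no preference between the 10 tokens.
uniform = [[[0.0] * vocab for _ in range(4)] for _ in range(2)]
toy_targets = [[3, 7, 1, 0], [5, 1, 0, 0]]
toy_loss = masked_sequence_loss(uniform, toy_targets, lengths=[3, 2])
print(round(toy_loss, 4))  # 2.3026, i.e. ln(10)
```

An untrained model that spreads probability evenly over the 10 tokens scores ln(10) ≈ 2.30, which is close to the minibatch loss printed at batch 0 in the training log.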

with tf.variable_scope('decoder',reuse=True):
    start_tokens = tf.ones([batch_size, ], tf.int32)*EOS  # shape [batch_size], every entry is EOS (1)
    encoder_state = nest.map_structure(lambda s: seq2seq.tile_batch(s, 3),
                                                   encoder_final_state)
    inference_decoder = tf.contrib.seq2seq.BeamSearchDecoder(cell=decoder_cell, embedding=embedding,
                                                                             start_tokens=start_tokens,
                                                                             end_token=1,
                                                                             initial_state=encoder_state,
                                                                             beam_width=3,
                                                                             output_layer=output_layer)
    beam_decoder_outputs, _, _ = seq2seq.dynamic_decode(decoder=inference_decoder,maximum_iterations=10)
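
For readers new to beam search, the idea can be sketched in plain Python. This is a generic illustration of the technique, not the internals of `BeamSearchDecoder`: `beam_search` and `toy_model` are hypothetical names, the scorer is a hand-written table rather than a trained decoder, and scores are plain sums of log-probabilities with no length penalty. As in the graph above, the same token (1) serves as both start and end token.

```python
import math

def beam_search(step_log_probs, start_token, end_token, beam_width, max_steps):
    """Generic beam search; step_log_probs(prefix) -> {token: log_prob}."""
    beams = [([start_token], 0.0)]  # (tokens so far, total log prob)

    def finished(tokens):
        # The start token equals the end token, so require length > 1.
        return len(tokens) > 1 and tokens[-1] == end_token

    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            if finished(tokens):
                candidates.append((tokens, score))  # carried over unchanged
                continue
            for token, log_prob in step_log_probs(tokens).items():
                candidates.append((tokens + [token], score + log_prob))
        # Keep only the beam_width best hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(finished(tokens) for tokens, _ in beams):
            break
    return beams

def toy_model(prefix, source=(4, 7), end_token=1):
    """Hypothetical scorer that prefers to copy `source`, then emit end_token."""
    position = len(prefix) - 1  # number of tokens generated so far
    best = source[position] if position < len(source) else end_token
    probs = {best: 0.8}
    for token in (end_token, 4, 7):
        if token != best:
            probs[token] = 0.1
    return {token: math.log(p) for token, p in probs.items()}

beams = beam_search(toy_model, start_token=1, end_token=1,
                    beam_width=3, max_steps=10)
best_tokens, best_score = beams[0]
print(best_tokens, round(math.exp(best_score), 3))  # [1, 4, 7, 1] 0.512
```

Keeping several hypotheses alive at each step is what requires the encoder state to be repeated `beam_width` times per batch entry, which is the role of `seq2seq.tile_batch` in the code above.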

train_op = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(loss)
sess.run(tf.global_variables_initializer())
def next_feed():
    batch = next(batches)
    
    encoder_inputs_, encoder_inputs_length_ = data_helpers.batch(batch)
    decoder_targets_, decoder_targets_length_ = data_helpers.batch(
        [(sequence) + [EOS] for sequence in batch]
    )
    decoder_inputs_, decoder_inputs_length_ = data_helpers.batch(
        [[EOS] + (sequence) for sequence in batch]
    )
    
    # In a feed_dict, a key can be a Tensor
    return {
        encoder_inputs: encoder_inputs_.T,
        decoder_inputs: decoder_inputs_.T,
        decoder_targets: decoder_targets_.T,
        encoder_inputs_length: encoder_inputs_length_,
        decoder_inputs_length: decoder_inputs_length_
    }

x = next_feed()
print('encoder_inputs:')
print(x[encoder_inputs][0,:])
print('encoder_inputs_length:')
print(x[encoder_inputs_length][0])
print('decoder_inputs:')
print(x[decoder_inputs][0,:])
print('decoder_inputs_length:')
print(x[decoder_inputs_length][0])
print('decoder_targets:')
print(x[decoder_targets][0,:])

encoder_inputs:
[3 3 7 3 0 0 0 0]
encoder_inputs_length:
4
decoder_inputs:
[1 3 3 7 3 0 0 0 0]
decoder_inputs_length:
5
decoder_targets:
[3 3 7 3 1 0 0 0 0]
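
The printout shows the convention used for the decoder: its input is the sequence with EOS prepended, its target is the sequence with EOS appended, and both are right-padded with PAD. The sketch below reproduces that in plain Python. `pad_batch` and `make_example` are hypothetical helpers that build batch-major rows directly. The real `data_helpers.batch` is not shown in the post; the `.T` calls in `next_feed` suggest it returns time-major arrays, but that is an assumption.

```python
PAD, EOS = 0, 1

def pad_batch(sequences, pad=PAD):
    """Right-pad sequences to a common length; returns (rows, lengths)."""
    lengths = [len(seq) for seq in sequences]
    width = max(lengths)
    rows = [list(seq) + [pad] * (width - len(seq)) for seq in sequences]
    return rows, lengths

def make_example(batch):
    enc, enc_len = pad_batch(batch)
    # Decoder input starts with EOS; decoder target ends with EOS.
    dec_in, dec_len = pad_batch([[EOS] + seq for seq in batch])
    dec_target, _ = pad_batch([seq + [EOS] for seq in batch])
    return enc, enc_len, dec_in, dec_len, dec_target

toy_batch = [[3, 3, 7, 3], [6, 6, 3, 9, 7, 7, 9, 4]]
enc, enc_len, dec_in, dec_len, dec_target = make_example(toy_batch)
print(enc[0], enc_len[0])        # [3, 3, 7, 3, 0, 0, 0, 0] 4
print(dec_in[0], dec_len[0])     # [1, 3, 3, 7, 3, 0, 0, 0, 0] 5
print(dec_target[0])             # [3, 3, 7, 3, 1, 0, 0, 0, 0]
```

The decoder input and target are the same length, so one length vector (`decoder_inputs_length`) serves both the TrainingHelper and the loss mask.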

loss_track = []
max_batches = 6001
batches_in_epoch = 200

try:
    # Training loop: run max_batches minibatches, reporting every batches_in_epoch batches
    for batch in range(max_batches):
        fd = next_feed()
        _, l = sess.run([train_op, loss], fd)
        loss_track.append(l)
        
        if batch % batches_in_epoch == 0:
            print('batch {}'.format(batch))
            print('  minibatch loss: {}'.format(sess.run(loss, fd)))
            predict_ = sess.run(beam_decoder_outputs.predicted_ids, fd)
            #print(predict_)
            for i, (inp, pred) in enumerate(zip(fd[encoder_inputs], predict_)):
                print('  sample {}:'.format(i + 1))
                print('    input     > {}'.format(inp))
                print('    predicted > {}'.format(pred))
                if i >= 2:
                    break
            print()
        
except KeyboardInterrupt:
    print('training interrupted')

batch 0
  minibatch loss: 2.2935664653778076
  sample 1:
    input     > [9 2 8 0 0 0 0 0]
    predicted > [[5 5 5]
 [5 5 5]
 [5 5 5]
 [5 5 8]
 [8 5 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]]
  sample 2:
    input     > [6 8 9 7 2 6 9 3]
    predicted > [[5 5 5]
 [5 5 5]
 [5 5 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 9 8]]
  sample 3:
    input     > [3 6 6 0 0 0 0 0]
    predicted > [[5 5 5]
 [5 5 5]
 [5 5 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 9 8]]

batch 200
  minibatch loss: 1.4949365854263306
  sample 1:
    input     > [4 7 8 6 7 9 0 0]
    predicted > [[ 3  3  4]
 [ 4  4  3]
 [ 5  5  5]
 [ 7  5  5]
 [ 4  9  9]
 [ 9  4  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [9 3 7 4 0 0 0 0]
    predicted > [[ 4  4  4]
 [ 9  9  9]
 [ 5  4  9]
 [ 4  4  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [2 5 9 6 0 0 0 0]
    predicted > [[ 9  6  9]
 [ 6  9  6]
 [ 6  2  6]
 [ 1  1  2]
 [-1 -1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 400
  minibatch loss: 1.2325794696807861
  sample 1:
    input     > [6 7 2 9 0 0 0 0]
    predicted > [[ 6  6  6]
 [ 6  2  2]
 [ 2  4  4]
 [ 4  6  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [8 7 3 3 3 2 9 0]
    predicted > [[8 8 8]
 [3 3 3]
 [3 3 3]
 [2 2 2]
 [3 5 5]
 [5 2 2]
 [9 5 9]
 [1 1 1]]
  sample 3:
    input     > [6 4 2 4 3 7 2 0]
    predicted > [[4 4 4]
 [2 2 2]
 [7 2 2]
 [2 7 7]
 [4 4 4]
 [6 7 7]
 [4 6 2]
 [1 1 1]]

batch 600
  minibatch loss: 0.9292899370193481
  sample 1:
    input     > [4 9 5 9 9 2 0 0]
    predicted > [[ 9  9  9]
 [ 4  4  4]
 [ 5  5  4]
 [ 9  9  9]
 [ 2  2  7]
 [ 4  5  2]
 [ 1  1  1]
 [-1 -1 -1]]
  sample 2:
    input     > [8 2 6 4 7 0 0 0]
    predicted > [[ 7  7  4]
 [ 2  2  2]
 [ 4  4  7]
 [ 6  6  6]
 [ 4  5  8]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [7 9 7 9 6 0 0 0]
    predicted > [[ 9  9  9]
 [ 7  7  7]
 [ 7  7  7]
 [ 6  9  9]
 [ 9  7  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 800
  minibatch loss: 0.7363898754119873
  sample 1:
    input     > [9 2 6 0 0 0 0 0]
    predicted > [[ 9  2  9]
 [ 2  9  6]
 [ 6  6  2]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [6 8 7 9 6 3 2 0]
    predicted > [[ 6  6  6]
 [ 5  8  5]
 [ 3  6  6]
 [ 6  5  3]
 [ 9  6  3]
 [ 7  3  9]
 [ 3  2  7]
 [ 1  1  1]
 [-1 -1 -1]]
  sample 3:
    input     > [9 2 8 4 9 6 9 3]
    predicted > [[9 9 9]
 [3 2 3]
 [9 9 9]
 [2 8 4]
 [4 4 2]
 [9 6 9]
 [7 9 7]
 [8 7 8]
 [1 1 1]]

batch 1000
  minibatch loss: 0.7347214221954346
  sample 1:
    input     > [3 3 8 0 0 0 0 0]
    predicted > [[ 3  3  3]
 [ 3  3  8]
 [ 8  3  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [3 8 4 9 6 3 5 4]
    predicted > [[ 3  3  3]
 [ 3  3  3]
 [ 4  4  4]
 [ 5  5  5]
 [ 9  6  6]
 [ 5  9  9]
 [ 6  4  4]
 [ 1  1  8]
 [-1 -1  1]]
  sample 3:
    input     > [3 4 7 0 0 0 0 0]
    predicted > [[ 3  4  7]
 [ 4  3  4]
 [ 7  7  3]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 1200
  minibatch loss: 0.43508097529411316
  sample 1:
    input     > [5 5 5 4 4 4 4 0]
    predicted > [[ 5  5  5]
 [ 5  5  4]
 [ 4  4  5]
 [ 4  5  5]
 [ 5  4  4]
 [ 5  4  5]
 [ 1  5  1]
 [-1  1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [2 7 3 0 0 0 0 0]
    predicted > [[ 2  7  2]
 [ 7  2  7]
 [ 3  3  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [2 2 8 0 0 0 0 0]
    predicted > [[ 2  2  2]
 [ 2  2  8]
 [ 8  5  2]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 1400
  minibatch loss: 0.41912826895713806
  sample 1:
    input     > [7 8 5 3 2 0 0 0]
    predicted > [[ 5  7  8]
 [ 7  8  7]
 [ 8  5  5]
 [ 3  3  3]
 [ 2  2  2]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [8 7 6 9 2 0 0 0]
    predicted > [[ 8  7  8]
 [ 7  8  6]
 [ 6  9  7]
 [ 9  6  2]
 [ 2  2  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [7 8 3 8 5 8 7 6]
    predicted > [[8 8 8]
 [7 8 7]
 [8 7 8]
 [3 3 3]
 [7 7 7]
 [8 8 5]
 [6 6 3]
 [5 5 5]
 [1 1 1]]

batch 1600
  minibatch loss: 0.3989475965499878
  sample 1:
    input     > [3 7 4 4 7 2 0 0]
    predicted > [[ 3  7  7]
 [ 7  3  4]
 [ 4  4  3]
 [ 4  4  3]
 [ 7  3  4]
 [ 2  2  2]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [5 7 6 4 2 4 9 5]
    predicted > [[5 5 5]
 [7 7 7]
 [6 6 6]
 [4 4 4]
 [2 2 2]
 [5 4 9]
 [9 9 5]
 [4 5 4]
 [1 1 1]]
  sample 3:
    input     > [7 6 4 0 0 0 0 0]
    predicted > [[ 7  6  7]
 [ 6  7  4]
 [ 4  4  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 1800
  minibatch loss: 0.31475651264190674
  sample 1:
    input     > [8 5 9 9 9 4 0 0]
    predicted > [[ 8  8  8]
 [ 5  9  9]
 [ 9  5  5]
 [ 9  4  9]
 [ 4  9  4]
 [ 9  9  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [9 6 6 0 0 0 0 0]
    predicted > [[ 9  6  6]
 [ 6  9  9]
 [ 6  6  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [7 3 6 3 3 7 0 0]
    predicted > [[ 3  7  7]
 [ 7  3  3]
 [ 6  6  3]
 [ 3  3  6]
 [ 7  3  7]
 [ 3  7  3]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 2000
  minibatch loss: 0.41449815034866333
  sample 1:
    input     > [6 6 8 0 0 0 0 0]
    predicted > [[ 6  6  6]
 [ 6  9  3]
 [ 8  7  9]
 [ 1  1  7]
 [-1 -1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [8 8 5 6 2 9 2 3]
    predicted > [[8 8 8]
 [8 8 8]
 [5 5 6]
 [6 6 5]
 [2 2 2]
 [9 9 9]
 [2 3 2]
 [3 2 3]
 [1 1 1]]
  sample 3:
    input     > [5 2 3 2 4 7 7 6]
    predicted > [[2 5 2]
 [5 2 5]
 [4 2 4]
 [3 3 3]
 [2 5 2]
 [7 7 7]
 [7 4 6]
 [6 6 5]
 [1 1 1]]

batch 2200
  minibatch loss: 0.2028750777244568
  sample 1:
    input     > [2 5 6 8 6 7 7]
    predicted > [[2 2 2]
 [5 9 6]
 [6 7 5]
 [7 5 8]
 [9 7 7]
 [7 6 6]
 [8 5 7]
 [1 1 1]]
  sample 2:
    input     > [9 3 3 0 0 0 0]
    predicted > [[ 9  9  3]
 [ 3  3  9]
 [ 3  6  9]
 [ 1  1  3]
 [-1 -1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [7 9 9 0 0 0 0]
    predicted > [[ 7  8  6]
 [ 9  6  8]
 [ 9  9  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 2400
  minibatch loss: 0.17885658144950867
  sample 1:
    input     > [3 6 8 3 2 0 0 0]
    predicted > [[ 3  6  3]
 [ 6  3  6]
 [ 8  8  8]
 [ 3  3  3]
 [ 2  3  3]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [4 5 4 6 5 5 0 0]
    predicted > [[ 4  4  4]
 [ 5  5  5]
 [ 4  4  4]
 [ 5  6  5]
 [ 6  5  6]
 [ 5  5  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [9 7 2 3 3 0 0 0]
    predicted > [[ 9  9  9]
 [ 7  7  7]
 [ 3  2  3]
 [ 2  3  2]
 [ 3  3  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 2600
  minibatch loss: 0.20247018337249756
  sample 1:
    input     > [3 7 9 0 0 0 0 0]
    predicted > [[ 3  7  3]
 [ 7  3  5]
 [ 9  9  2]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [2 5 2 5 8 0 0 0]
    predicted > [[ 2  2  2]
 [ 5  5  5]
 [ 2  2  2]
 [ 5  5  5]
 [ 8  5  3]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [5 7 3 7 0 0 0 0]
    predicted > [[ 5  5  7]
 [ 7  7  5]
 [ 3  7  5]
 [ 7  3  3]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 2800
  minibatch loss: 0.24160973727703094
  sample 1:
    input     > [8 2 2 8 4 0 0 0]
    predicted > [[ 8  2  8]
 [ 2  8  2]
 [ 2  8  2]
 [ 8  2  4]
 [ 4  4  8]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [3 3 8 7 0 0 0 0]
    predicted > [[ 3  3  3]
 [ 3  3  8]
 [ 8  7  3]
 [ 7  8  7]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [5 5 8 7 3 7 8 5]
    predicted > [[5 5 5]
 [5 5 5]
 [8 8 8]
 [7 7 7]
 [3 3 7]
 [7 7 3]
 [8 5 8]
 [5 8 5]
 [1 1 1]]

batch 3000
  minibatch loss: 0.23292377591133118
  sample 1:
    input     > [4 4 2 7 0 0 0 0]
    predicted > [[ 4  4  4]
 [ 4  4  2]
 [ 2  7  4]
 [ 7  2  7]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [9 5 3 4 8 7 6 9]
    predicted > [[9 9 9]
 [5 5 5]
 [3 3 8]
 [4 4 3]
 [8 8 4]
 [7 7 6]
 [9 6 7]
 [6 9 9]
 [1 1 1]]
  sample 3:
    input     > [5 5 2 4 2 0 0 0]
    predicted > [[ 5  5  5]
 [ 5  5  5]
 [ 2  4  2]
 [ 4  2  2]
 [ 2  2  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 3200
  minibatch loss: 0.13823337852954865
  sample 1:
    input     > [3 3 7 8 2 3 7 2]
    predicted > [[3 3 3]
 [3 3 3]
 [7 8 8]
 [8 7 7]
 [2 2 2]
 [3 3 3]
 [7 7 2]
 [2 2 7]
 [1 1 1]]
  sample 2:
    input     > [6 2 7 3 5 4 7 2]
    predicted > [[6 6 6]
 [2 2 2]
 [7 7 7]
 [3 3 5]
 [4 5 3]
 [5 4 2]
 [7 2 4]
 [2 7 7]
 [1 1 1]]
  sample 3:
    input     > [2 2 7 7 2 0 0 0]
    predicted > [[ 2  2  2]
 [ 2  7  2]
 [ 7  2  7]
 [ 7  2  2]
 [ 2  7  7]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 3400
  minibatch loss: 0.118137888610363
  sample 1:
    input     > [5 5 7 7 6 0 0 0]
    predicted > [[ 5  5  5]
 [ 5  7  7]
 [ 7  5  5]
 [ 7  5  5]
 [ 6  7  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [8 2 4 5 0 0 0 0]
    predicted > [[ 8  4  8]
 [ 2  8  2]
 [ 4  2  4]
 [ 5  5  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [3 5 2 4 0 0 0 0]
    predicted > [[ 3  3  3]
 [ 5  5  5]
 [ 2  2  2]
 [ 4  5  8]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 3600
  minibatch loss: 0.18091285228729248
  sample 1:
    input     > [9 2 3 2 7 6 6 3]
    predicted > [[9 9 9]
 [2 2 2]
 [3 2 3]
 [2 3 2]
 [6 7 7]
 [7 6 6]
 [6 6 6]
 [3 3 3]
 [1 1 1]]
  sample 2:
    input     > [9 7 6 8 5 3 0 0]
    predicted > [[ 9  9  9]
 [ 7  7  7]
 [ 6  6  9]
 [ 8  8  7]
 [ 5  5  3]
 [ 3  7  5]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [3 4 4 9 8 8 8 5]
    predicted > [[3 3 3]
 [4 4 4]
 [4 4 9]
 [9 8 8]
 [8 9 4]
 [8 5 8]
 [5 8 5]
 [8 8 5]
 [1 1 1]]

batch 3800
  minibatch loss: 0.1578817516565323
  sample 1:
    input     > [4 9 4 5 4 3 9 0]
    predicted > [[ 4  4  4]
 [ 4  9  4]
 [ 9  4  9]
 [ 5  5  5]
 [ 9  4  9]
 [ 8  3  3]
 [ 3  9  4]
 [ 1  1  1]
 [-1 -1 -1]]
  sample 2:
    input     > [8 3 3 4 0 0 0 0]
    predicted > [[ 8  3  3]
 [ 3  8  8]
 [ 3  8  8]
 [ 4  3  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [5 4 7 9 5 0 0 0]
    predicted > [[ 5  5  5]
 [ 4  4  7]
 [ 7  5  4]
 [ 9  7  9]
 [ 5  9  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 4000
  minibatch loss: 0.21402882039546967
  sample 1:
    input     > [2 4 9 4 4 3 2 0]
    predicted > [[ 4  2  4]
 [ 2  4  2]
 [ 9  9  9]
 [ 4  4  4]
 [ 2  4  2]
 [ 8  3  8]
 [ 4  2  1]
 [ 1  1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [7 8 5 0 0 0 0 0]
    predicted > [[ 7  7  7]
 [ 8  8  8]
 [ 5  8  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [6 6 2 0 0 0 0 0]
    predicted > [[ 6  6  6]
 [ 6  2  6]
 [ 2  6  4]
 [ 1  1  2]
 [-1 -1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 4200
  minibatch loss: 0.07165724784135818
  sample 1:
    input     > [5 8 2 6 0 0 0 0]
    predicted > [[ 5  8  8]
 [ 8  5  5]
 [ 2  6  2]
 [ 6  2  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [4 3 9 8 7 3 9 2]
    predicted > [[4 4 4]
 [3 3 3]
 [9 9 9]
 [8 7 8]
 [7 8 7]
 [3 3 3]
 [9 9 2]
 [2 3 9]
 [1 1 1]]
  sample 3:
    input     > [4 2 3 8 2 0 0 0]
    predicted > [[ 4  2  4]
 [ 2  4  3]
 [ 3  8  2]
 [ 8  3  2]
 [ 2  2  8]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 4400
  minibatch loss: 0.08584733307361603
  sample 1:
    input     > [5 6 4 5 2 5 5]
    predicted > [[5 5 5]
 [6 6 6]
 [4 4 4]
 [5 5 5]
 [2 6 2]
 [5 2 5]
 [5 5 6]
 [1 1 1]]
  sample 2:
    input     > [5 5 8 7 9 3 0]
    predicted > [[ 5  5  5]
 [ 5  5  5]
 [ 8  8  8]
 [ 9  7  6]
 [ 7  9  8]
 [ 3  3  7]
 [ 1  1  1]
 [-1 -1 -1]]
  sample 3:
    input     > [9 5 9 2 5 7 3]
    predicted > [[9 9 9]
 [5 5 5]
 [9 9 9]
 [2 2 2]
 [5 7 5]
 [7 5 7]
 [3 3 7]
 [1 1 1]]

batch 4600
  minibatch loss: 0.08049434423446655
  sample 1:
    input     > [3 9 4 0 0 0 0 0]
    predicted > [[ 3  3  9]
 [ 9  4  3]
 [ 4  9  4]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [7 8 9 4 3 5 2 3]
    predicted > [[7 7 8]
 [8 8 7]
 [9 9 9]
 [4 4 4]
 [3 3 3]
 [5 7 5]
 [2 9 2]
 [3 3 3]
 [1 1 1]]
  sample 3:
    input     > [9 3 6 4 4 6 5 9]
    predicted > [[9 9 9]
 [3 3 3]
 [6 4 6]
 [4 6 4]
 [4 6 4]
 [6 5 5]
 [5 4 6]
 [9 9 6]
 [1 1 1]]

batch 4800
  minibatch loss: 0.037724826484918594
  sample 1:
    input     > [5 6 8 2 5 0 0]
    predicted > [[ 5  6  6]
 [ 6  5  5]
 [ 8  8  8]
 [ 2  2  5]
 [ 5  5  2]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [7 9 5 0 0 0 0]
    predicted > [[ 7  5  5]
 [ 9  7  2]
 [ 5  9  7]
 [ 1  1  9]
 [-1 -1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [5 3 3 6 0 0 0]
    predicted > [[ 5  3  3]
 [ 3  5  5]
 [ 3  5  3]
 [ 6  3  5]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 5000
  minibatch loss: 0.12354864180088043
  sample 1:
    input     > [4 6 9 6 5 7 8 9]
    predicted > [[4 4 4]
 [6 6 6]
 [9 9 6]
 [6 6 9]
 [5 5 5]
 [7 7 8]
 [9 8 6]
 [8 9 7]
 [1 1 1]]
  sample 2:
    input     > [6 5 9 9 8 0 0 0]
    predicted > [[ 6  6  6]
 [ 5  5  9]
 [ 9  9  5]
 [ 9  8  8]
 [ 8  9  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [6 7 2 8 9 7 0 0]
    predicted > [[ 6  6  6]
 [ 7  7  7]
 [ 2  2  2]
 [ 8  8  8]
 [ 9  6  7]
 [ 7  8  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 5200
  minibatch loss: 0.05009409785270691
  sample 1:
    input     > [6 3 8 7 0 0 0]
    predicted > [[ 6  6  6]
 [ 3  8  8]
 [ 8  3  3]
 [ 7  7  6]
 [ 1  1  7]
 [-1 -1  1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [9 2 5 9 0 0 0]
    predicted > [[ 9  2  9]
 [ 2  9  5]
 [ 5  5  2]
 [ 9  9  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 3:
    input     > [3 5 2 7 7 6 0]
    predicted > [[ 3  5  3]
 [ 5  3  5]
 [ 2  2  7]
 [ 7  7  2]
 [ 7  6  2]
 [ 6  7  5]
 [ 1  1  1]
 [-1 -1 -1]]

batch 5400
  minibatch loss: 0.09247519075870514
  sample 1:
    input     > [8 6 5 3 8 7 4 2]
    predicted > [[8 8 8]
 [6 6 6]
 [5 5 5]
 [3 8 8]
 [8 3 3]
 [7 7 2]
 [4 2 7]
 [2 4 4]
 [1 1 1]]
  sample 2:
    input     > [8 6 3 9 4 7 5 0]
    predicted > [[ 8  3  3]
 [ 6  8  8]
 [ 3  9  9]
 [ 9  6  6]
 [ 4  4  7]
 [ 7  7  4]
 [ 5  7  4]
 [ 1  1  1]
 [-1 -1 -1]]
  sample 3:
    input     > [5 3 8 8 6 6 3 0]
    predicted > [[ 5  5  5]
 [ 8  3  8]
 [ 3  8  3]
 [ 6  8  6]
 [ 8  6  3]
 [ 3  6  8]
 [ 6  3  9]
 [ 1  1  1]
 [-1 -1 -1]]

batch 5600
  minibatch loss: 0.05249354988336563
  sample 1:
    input     > [5 9 5 0 0 0 0 0]
    predicted > [[ 5  5  5]
 [ 9  9  5]
 [ 5  8  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [9 6 6 7 3 6 5 6]
    predicted > [[ 9  9  9]
 [ 6  6  6]
 [ 6  6  6]
 [ 7  3  7]
 [ 3  7  3]
 [ 6  6  6]
 [ 5  5  5]
 [ 6  6  1]
 [ 1  1 -1]]
  sample 3:
    input     > [6 3 9 5 9 9 3 0]
    predicted > [[ 6  3  3]
 [ 3  6  6]
 [ 9  9  9]
 [ 5  5  5]
 [ 9  9  9]
 [ 9  9  9]
 [ 3  3  8]
 [ 1  1  1]
 [-1 -1 -1]]

batch 5800
  minibatch loss: 0.08289551734924316
  sample 1:
    input     > [8 9 3 5 2 0 0 0]
    predicted > [[ 8  8  9]
 [ 9  9  8]
 [ 3  5  3]
 [ 5  3  5]
 [ 2  2  2]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [7 4 8 3 2 3 4 9]
    predicted > [[ 7  7  7]
 [ 4  4  4]
 [ 8  8  3]
 [ 3  3  8]
 [ 2  2  2]
 [ 3  3  4]
 [ 4  9  3]
 [ 9  4  9]
 [ 1  1  6]
 [-1 -1  1]]
  sample 3:
    input     > [2 3 6 6 0 0 0 0]
    predicted > [[ 2  2  3]
 [ 3  3  2]
 [ 6  6  6]
 [ 6  9  6]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]

batch 6000
  minibatch loss: 0.03706203028559685
  sample 1:
    input     > [3 5 3 9 0 0 0 0]
    predicted > [[ 3  3  5]
 [ 5  3  3]
 [ 3  5  3]
 [ 9  9  9]
 [ 1  1  1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
  sample 2:
    input     > [8 2 9 6 2 8 2 3]
    predicted > [[8 8 8]
 [2 2 9]
 [9 9 2]
 [6 6 2]
 [2 2 6]
 [8 8 8]
 [3 2 3]
 [2 3 9]
 [1 1 1]]
  sample 3:
    input     > [6 4 8 4 9 7 4 0]
    predicted > [[ 6  6  6]
 [ 4  4  4]
 [ 8  8  4]
 [ 4  9  8]
 [ 9  4  8]
 [ 7  7  9]
 [ 4  4  6]
 [ 1  1  1]
 [-1 -1 -1]]
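
Each `predicted` grid in the log has one row per time step and one column per beam, so a hypothesis is read down a column. The sketch below pulls the three beams out of the first sample at batch 6000. `top_beam` is a hypothetical helper; treating column 0 as the best-scoring beam and reading the -1 entries as padding after EOS are assumptions based on the log.

```python
EOS = 1

def top_beam(predicted, beam_index=0, end_token=EOS):
    """Read one beam out of a [time][beam_width] grid, stopping at end_token."""
    tokens = []
    for step in predicted:
        token = step[beam_index]
        if token == end_token or token < 0:  # EOS or the -1 padding
            break
        tokens.append(token)
    return tokens

# predicted_ids for sample 1 at batch 6000, copied from the log above
predicted = [[3, 3, 5], [5, 3, 3], [3, 5, 3], [9, 9, 9], [1, 1, 1],
             [-1, -1, -1], [-1, -1, -1], [-1, -1, -1], [-1, -1, -1]]
source = [3, 5, 3, 9]

decoded = [top_beam(predicted, k) for k in range(3)]
for k, tokens in enumerate(decoded):
    print(k, tokens, tokens == source)
```

For this sample the first beam reproduces the input exactly, while the other two beams are permutations of it.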

%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(loss_track)
print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], 
                                                             len(loss_track)*batch_size, batch_size))

loss 0.0375 after 60010 examples (batch_size=10)

[Figure: plot of loss_track, the minibatch loss at each training step; image not available]
