LSTM in TensorFlow

Author: 40ac75f41fa3 | Published: 2016-04-13 16:00

    LSTM

    Parameters:

    • init_scale - the initial scale of the weights
    • learning_rate - the initial value of the learning rate
    • max_grad_norm - the maximum permissible norm of the gradient
    • num_layers - the number of LSTM layers
    • num_steps - the number of unrolled steps of LSTM, i.e. the time steps: how many words are fed in per input sequence
    • hidden_size - the number of LSTM units, i.e. how many units each LSTM layer has
    • max_epoch - the number of epochs trained with the initial learning rate
    • max_max_epoch - the total number of epochs for training
    • keep_prob - the probability of keeping each activation in the dropout layer
    • lr_decay - the decay of the learning rate for each epoch after "max_epoch"
    • batch_size - the batch size
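As a rough illustration, the parameters above can be grouped into a plain configuration object, which is how the later snippets read them (config.keep_prob, config.num_layers, and so on). The class name and every value below are illustrative assumptions, not settings taken from this article.

```python
class SmallConfig(object):
    """Hypothetical hyperparameter bundle; all values are illustrative assumptions."""
    init_scale = 0.1      # initial scale of the weights
    learning_rate = 1.0   # initial learning rate
    max_grad_norm = 5     # maximum permissible gradient norm
    num_layers = 2        # number of stacked LSTM layers
    num_steps = 20        # unrolled time steps per input sequence
    hidden_size = 200     # LSTM units per layer
    max_epoch = 4         # epochs trained with the initial learning rate
    max_max_epoch = 13    # total number of training epochs
    keep_prob = 1.0       # dropout keep probability (1.0 disables dropout)
    lr_decay = 0.5        # per-epoch learning-rate decay after max_epoch
    batch_size = 20       # sequences per mini-batch


config = SmallConfig()
print(config.num_layers, config.hidden_size)
```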

    LSTM input

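    The input word IDs are mapped to vectors with embedding_lookup. Each input to the LSTM then has shape [size_batches, seq_length, rnn_size]: the batch size, the sequence length in time steps, and the number of units in the RNN.
    The data is split this way for the following reason:
    To keep the learning process tractable, it is common practice to truncate the backpropagated gradients to a fixed number of (time-)unrolled steps, seq_length. This is easy to implement by feeding inputs of length seq_length at each iteration and running a backward pass after each iteration.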
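The truncation described above can be sketched in NumPy as an iterator that cuts one long sequence of word IDs into [batch_size, num_steps] windows, with targets shifted by one position. The function name ptb_style_iterator is hypothetical; it only approximates what an iterator such as reader.ptb_iterator might do and is not the library's implementation.

```python
import numpy as np


def ptb_style_iterator(raw_data, batch_size, num_steps):
    """Yield (x, y) windows of shape [batch_size, num_steps]; y is x shifted by one."""
    raw_data = np.asarray(raw_data, dtype=np.int32)
    batch_len = len(raw_data) // batch_size
    # Lay the sequence out as batch_size parallel streams of length batch_len.
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
        yield x, y


batches = list(ptb_style_iterator(np.arange(21), batch_size=2, num_steps=3))
first_x, first_y = batches[0]
print(len(batches), first_x.shape)
```

Each window is one unit of backpropagation: gradients flow back over at most num_steps positions, while the LSTM state can still be carried forward from one window to the next.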

    How the input shapes change:
    • x_data = [446, 50, 50], i.e. [number_batch, size_batch, seq_length]
    • embedding = [65, 128], i.e. [vocab_size, rnn_size]
    • input_data = [50, 50], i.e. [batch_size, seq_length]
    • embedding_lookup(embedding, input_data) = [50, 50, 128], i.e. [batch_size, seq_length, rnn_size]
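The shape change can be checked with a small NumPy sketch, where integer-array indexing stands in for embedding_lookup. The random initialization range and the seed are assumptions made only for illustration.

```python
import numpy as np

vocab_size, rnn_size = 65, 128
batch_size, seq_length = 50, 50

rng = np.random.default_rng(0)
# Embedding table: one rnn_size-dimensional row per vocabulary entry.
embedding = rng.uniform(-0.1, 0.1, size=(vocab_size, rnn_size))
# One mini-batch of integer IDs, shape [batch_size, seq_length].
input_data = rng.integers(0, vocab_size, size=(batch_size, seq_length))

# Indexing the table with an ID array picks one row per ID.
inputs = embedding[input_data]
print(inputs.shape)  # (50, 50, 128)
```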

    Declaring and initializing the LSTM

    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0)
    # size is the hidden_size

    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=config.keep_prob)  # apply dropout to the cell's outputs
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers)  # stack num_layers cells into a multi-layer RNN
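To make the role of keep_prob concrete, here is a minimal NumPy sketch of dropout on a cell's outputs: each value is kept with probability keep_prob and kept values are scaled by 1 / keep_prob so the expected activation is unchanged. The helper name dropout and the seed are assumptions; this illustrates the idea and is not the DropoutWrapper code.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and rescale the survivors."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
cell_output = np.ones((4, 5))
dropped = dropout(cell_output, keep_prob=0.5, rng=rng)
unchanged = dropout(cell_output, keep_prob=1.0, rng=rng)
print(dropped)
```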
    

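    LSTM input

    Each word is represented by a dense vector, as in word2vec. The embedding matrix is randomly initialized and then updated continuously as the model trains.

    # embedding_matrix is a tensor of shape [vocabulary_size, embedding_size]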
    word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)
    

    LSTM training process

    outputs = []
    state = self._initial_state
    with tf.variable_scope("RNN"):
        for time_step in range(num_steps):
            # Reuse the same LSTM weights at every time step after the first.
            if time_step > 0: tf.get_variable_scope().reuse_variables()
            (cell_output, state) = cell(inputs[:, time_step, :], state)
            outputs.append(cell_output)
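The unrolled loop can be mimicked in NumPy to see what each step computes and how the shapes line up. This is a sketch of a basic LSTM step under assumptions: the single weight matrix W, the gate order (input, candidate, forget, output), and the toy sizes are chosen for illustration and are not taken from this article.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, state, W, b, forget_bias=0.0):
    """One LSTM step: x is [batch, size], state is the pair (c, h)."""
    c, h = state
    z = np.concatenate([x, h], axis=1) @ W + b  # [batch, 4 * size]
    i, j, f, o = np.split(z, 4, axis=1)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)


batch_size, num_steps, size = 3, 5, 4
rng = np.random.default_rng(0)
inputs = rng.standard_normal((batch_size, num_steps, size))
W = rng.uniform(-0.1, 0.1, size=(2 * size, 4 * size))
b = np.zeros(4 * size)

state = (np.zeros((batch_size, size)), np.zeros((batch_size, size)))
outputs = []
for time_step in range(num_steps):
    # The same W and b are used at every step, which is what variable reuse achieves.
    cell_output, state = lstm_step(inputs[:, time_step, :], state, W, b)
    outputs.append(cell_output)

# Same layout as reshaping the concatenated outputs to [-1, size].
output = np.concatenate(outputs, axis=1).reshape(-1, size)
print(output.shape)  # (15, 4)
```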
    

    Declaring the LSTM loss

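    # Stack the per-step outputs into one [batch_size * num_steps, size] matrix.
    # tf.concat takes the axis first in the pre-1.0 TensorFlow API used here.
    output = tf.reshape(tf.concat(1, outputs), [-1, size])
    softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
    softmax_b = tf.get_variable("softmax_b", [vocab_size])
    logits = tf.matmul(output, softmax_w) + softmax_b
    # Per-position cross-entropy, with every position weighted equally.
    loss = tf.nn.seq2seq.sequence_loss_by_example(
        [logits],
        [tf.reshape(self._targets, [-1])],
        [tf.ones([batch_size * num_steps])])
    self._cost = cost = tf.reduce_sum(loss) / batch_size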
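The cost above can be reproduced in NumPy to see its scale. The sketch below computes a weighted softmax cross-entropy per position and then sums and divides by batch_size; the helper name per_example_cross_entropy and the toy sizes are assumptions, and it only approximates what sequence_loss_by_example returns for a single-element list.

```python
import numpy as np


def per_example_cross_entropy(logits, targets, weights):
    """Weighted negative log-probability of each target; logits is [N, vocab_size]."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return nll * weights


batch_size, num_steps, vocab_size = 2, 3, 4
logits = np.zeros((batch_size * num_steps, vocab_size))  # uniform predictions
targets = np.array([0, 1, 2, 3, 0, 1])
weights = np.ones(batch_size * num_steps)

loss = per_example_cross_entropy(logits, targets, weights)
cost = loss.sum() / batch_size
print(cost, np.exp(cost / num_steps))
```

Because the sum is divided by batch_size only, cost is a total over num_steps positions. That is why the iteration loop divides the accumulated cost by the accumulated num_steps before exponentiating.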
    
    

    LSTM iteration loop

    
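    import time
    import numpy as np

    def run_epoch(session, m, data, eval_op, verbose=False):
        """Runs the model on the given data and returns the perplexity."""
        # Setup reconstructed so the excerpt is complete; it follows the
        # TensorFlow PTB tutorial that this loop comes from.
        epoch_size = ((len(data) // m.batch_size) - 1) // m.num_steps
        start_time = time.time()
        costs = 0.0
        iters = 0
        state = m.initial_state.eval()
        for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
                                                          m.num_steps)):
            # Evaluate cost, final_state, and eval_op, feeding
            # input_data, targets, and initial_state.
            cost, state, _ = session.run([m.cost, m.final_state, eval_op],
                                         {m.input_data: x,
                                          m.targets: y,
                                          m.initial_state: state})
            costs += cost
            iters += m.num_steps

            if verbose and step % (epoch_size // 10) == 10:
                print("%.3f perplexity: %.3f speed: %.0f wps" %
                      (step * 1.0 / epoch_size, np.exp(costs / iters),
                       iters * m.batch_size / (time.time() - start_time)))

        return np.exp(costs / iters)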
    

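    The returned value is the exponential of the average per-word loss, where $p_{target_i}$ is the probability the model assigns to the $i$-th target word: $$Loss = -\frac{1}{N}\sum_{i=1}^{N} \ln p_{target_i}$$
    $$TotalLoss = e^{Loss}$$
    TotalLoss is the perplexity returned by np.exp(costs / iters).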
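As a small worked example of the two formulas, the sketch below computes Loss and its exponential from the probabilities a model assigned to four target words. The probabilities are made up for illustration.

```python
import numpy as np

# Probability the model assigned to each of N = 4 target words (made-up values).
p_target = np.array([0.5, 0.25, 0.125, 0.125])

loss = -np.mean(np.log(p_target))  # Loss = -(1/N) * sum(ln p_target_i)
total_loss = np.exp(loss)          # TotalLoss = e^Loss, i.e. the perplexity
print(loss, total_loss)
```

The result can be read as the effective number of equally likely words the model is choosing between at each position. A model that assigned probability 1/4 to every target would score exactly 4.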

    for i in range(config.max_max_epoch):
        # Keep the initial learning rate at first, then decay it every epoch.
        lr_decay = config.lr_decay ** max(i - config.max_epoch, 0.0)
        m.assign_lr(session, config.learning_rate * lr_decay)

        print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
        train_perplexity = run_epoch(session, m, train_data, m.train_op,
                                     verbose=True)
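The learning-rate schedule in this loop can be tabulated without TensorFlow. The helper name learning_rate_schedule and the argument values are assumptions chosen to keep the output short.

```python
def learning_rate_schedule(learning_rate, lr_decay, max_epoch, max_max_epoch):
    """Return the learning rate used in each epoch, using the same formula as the loop."""
    rates = []
    for i in range(max_max_epoch):
        decay = lr_decay ** max(i - max_epoch, 0.0)
        rates.append(learning_rate * decay)
    return rates


rates = learning_rate_schedule(learning_rate=1.0, lr_decay=0.5,
                               max_epoch=4, max_max_epoch=8)
print(rates)  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.25, 0.125]
```

With this formula the exponent is still zero at i == max_epoch, so the first max_epoch + 1 epochs run at the initial rate before the decay starts.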
    

    First initialize the model, then optimize the loss by running train_op, using SGD or a similar optimizer.


    RNN-LSTM parameter settings

    import argparse
    parser = argparse.ArgumentParser()
    # Add each argument with its name, type, default value, and help text.
    parser.add_argument('--batch_size', type=int, default=50, help='mini batch size')
    parser.add_argument('--learn_rate', type=float, default=0.01, help='learn rate.')
    # parser.add_argument('--...')  # further arguments are added the same way
    args = parser.parse_args()
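For reference, here is a small runnable sketch of how such a parser behaves when given an explicit argument list instead of the real command line. The build_parser wrapper is a hypothetical helper; the two arguments are the ones defined in this section.

```python
import argparse


def build_parser():
    """Build a parser with the two arguments defined in this section."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch_size', type=int, default=50, help='mini batch size')
    parser.add_argument('--learn_rate', type=float, default=0.01, help='learn rate.')
    return parser


# Override one value explicitly; the other falls back to its default.
args = build_parser().parse_args(['--batch_size', '32'])
print(args.batch_size, args.learn_rate)  # 32 0.01
```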
    

    save and restore

    model = Model(saved_args, True)
    with tf.Session() as sess:
        # tf.initialize_all_variables().run()

        sess.run(tf.initialize_all_variables())
        saver = tf.train.Saver(tf.all_variables())  # saver covering all variables
        ckpt = tf.train.get_checkpoint_state(args.save_dir)
        if ckpt and ckpt.model_checkpoint_path:
            # Restore the session from ckpt.model_checkpoint_path.
            saver.restore(sess, ckpt.model_checkpoint_path)
            print(model.sample(sess, chars, vocab, args.n, args.prime))
    
    

    tf.assign()

      Article link: https://www.haomeiwen.com/subject/haawlttx.html