RNN Inputs and Outputs

Author: 美环花子若野 | Published 2018-06-06 10:57

    Let’s say you have a batch of two examples, one of length 13 and the other of length 20, where each time step is a vector of 128 numbers. The length-13 example is 0-padded to length 20. Then your RNN input tensor is of shape [2, 20, 128]. The dynamic_rnn function returns a tuple of (outputs, state), where outputs is a tensor of size [2, 20, ...] with the last dimension being the RNN output at each time step. state is the last state for each example, and it’s a tensor of size [2, ...] where the last dimension also depends on what kind of RNN cell you’re using.

    http://www.wildml.com/2016/08/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/

    So, here’s the problem: once you reach time step 13, your first example in the batch is already “done” and you don’t want to perform any additional calculation on it. The second example isn’t and must go through the RNN until step 20. By passing sequence_length=[13, 20] you tell Tensorflow to stop calculations for example 1 at step 13 and simply copy the state from time step 13 to the end. The output will be set to 0 for all time steps past 13. You’ve just saved some computational cost. But more importantly, if you didn’t pass sequence_length you would get incorrect results! Without it, Tensorflow would continue calculating the state until T=20 instead of simply copying the state from T=13. This means you would calculate the state using the padded elements, which is not what you want.

    import numpy as np
    import tensorflow as tf

    # Create input data: batch of 2 examples, 10 time steps, 8 features
    X = np.random.randn(2, 10, 8)

    # The second example is of length 6, so zero out its padded time steps
    X[1, 6:] = 0
    X_lengths = [10, 6]

    cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)

    outputs, last_states = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        sequence_length=X_lengths,
        inputs=X)

    # Evaluate the graph once (tf.contrib.learn.run_n is a 0.x-era helper)
    result = tf.contrib.learn.run_n(
        {"outputs": outputs, "last_states": last_states},
        n=1,
        feed_dict=None)

    assert result[0]["outputs"].shape == (2, 10, 64)

    # Outputs for the second example past length 6 should be 0
    assert (result[0]["outputs"][1, 7, :] == np.zeros(cell.output_size)).all()
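
    Note that because outputs past the sequence length are zeroed, taking outputs[:, -1, :] would give an all-zero vector for the shorter example. The final state returned by dynamic_rnn, however, already corresponds to each example's last valid time step. A minimal sketch of reading it out, assuming the LSTMCell with state_is_tuple=True from above (this line is not part of the original example):

    # last_states is an LSTMStateTuple (c, h); h is the hidden state at each
    # example's *actual* last step (step 10 for the first, step 6 for the second)
    last_hidden = last_states.h  # shape [2, 64]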

    Bidirectional RNNs

    When using a standard RNN to make predictions we are only taking the “past” into account. For certain tasks this makes sense (e.g. predicting the next word), but for some tasks it would be useful to take both the past and the future into account. Think of a tagging task, like part-of-speech tagging, where we want to assign a tag to each word in a sentence. Here we already know the full sequence of words, and for each word we want to take not only the words to the left (past) but also the words to the right (future) into account when making a prediction. Bidirectional RNNs do exactly that. A bidirectional RNN is a combination of two RNNs – one runs forward from “left to right” and one runs backward from “right to left”. These are commonly used for tagging tasks, or when we want to embed a sequence into a fixed-length vector (beyond the scope of this post).

    Just like for standard RNNs, Tensorflow has static and dynamic versions of the bidirectional RNN. As of the time of this writing, the bidirectional_dynamic_rnn is still undocumented, but it’s preferred over the static bidirectional_rnn.

    The key differences of the bidirectional RNN functions are that they take a separate cell argument for both the forward and backward RNN, and that they return separate outputs and states for both the forward and backward RNN:

    X = np.random.randn(2, 10, 8)

    # The second example is of length 6, so zero out its padded time steps
    X[1, 6:] = 0
    X_lengths = [10, 6]

    cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)

    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell,
        cell_bw=cell,
        dtype=tf.float64,
        sequence_length=X_lengths,
        inputs=X)

    output_fw, output_bw = outputs
    states_fw, states_bw = states
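
    For a tagging task you usually want a single output per time step, so a common follow-up (not shown in the snippet above) is to concatenate the forward and backward outputs along the feature dimension. A minimal sketch, using the 0.x-era tf.concat(concat_dim, values) signature that matches the rest of this post's code (newer versions take tf.concat(values, axis)):

    # Concatenate forward and backward outputs: two [2, 10, 64] tensors -> [2, 10, 128]
    outputs_concat = tf.concat(2, [output_fw, output_bw])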

    RNN Cells, Wrappers and Multi-Layer RNNs

    Check out the Jupyter Notebook on RNN Cells here!

    All Tensorflow RNN functions take a cell argument. LSTMs and GRUs are the most commonly used cells, but there are many others, and not all of them are documented. Currently, the best way to get a sense of what cells are available is to look at rnn_cell.py and contrib/rnn_cell.

    As of the time of this writing, the basic RNN cells and wrappers are:

    BasicRNNCell – A vanilla RNN cell.

    GRUCell – A Gated Recurrent Unit cell.

    BasicLSTMCell – An LSTM cell based on Recurrent Neural Network Regularization. No peephole connection or cell clipping.

    LSTMCell – A more complex LSTM cell that allows for optional peephole connections and cell clipping.

    MultiRNNCell – A wrapper to combine multiple cells into a multi-layer cell.

    DropoutWrapper – A wrapper to add dropout to input and/or output connections of a cell.

    and the contributed RNN cells and wrappers:

    CoupledInputForgetGateLSTMCell – An extended LSTMCell that has coupled input and forget gates based on LSTM: A Search Space Odyssey.

    TimeFreqLSTMCell – Time-Frequency LSTM cell based on Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks.

    GridLSTMCell – The cell from Grid Long Short-Term Memory.

    AttentionCellWrapper – Adds attention to an existing RNN cell, based on Long Short-Term Memory-Networks for Machine Reading.

    LSTMBlockCell – A faster version of the basic LSTM cell (Note: this one is in lstm_ops.py)

    Using these wrappers and cells is simple, e.g.

    cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
    cell = tf.nn.rnn_cell.DropoutWrapper(cell=cell, output_keep_prob=0.5)
    cell = tf.nn.rnn_cell.MultiRNNCell(cells=[cell] * 4, state_is_tuple=True)
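
    With state_is_tuple=True, the state that dynamic_rnn returns for this stacked cell is a tuple with one entry per layer, each entry itself an LSTMStateTuple. A minimal sketch of inspecting it, reusing X and X_lengths from the first example; the explicit scope and the variable names are illustrative additions, not part of the original post:

    outputs, state = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        sequence_length=X_lengths,
        inputs=X,
        scope="multilayer_rnn")  # separate variable scope so it doesn't collide with the earlier RNN
    # state is a 4-tuple (one LSTMStateTuple per layer);
    # state[-1].h is the top layer's final hidden state, of shape [2, 64]
    top_layer_h = state[-1].h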

    Calculating sequence loss on padded examples

    Check out the Jupyter Notebook on Calculating Loss here!

    For sequence prediction tasks we often want to make a prediction at each time step. For example, in Language Modeling we try to predict the next word for each word in a sentence. If all of your sequences are of the same length you can use Tensorflow’s sequence_loss and sequence_loss_by_example functions (undocumented) to calculate the standard cross-entropy loss.

    However, as of the time of this writing sequence_loss does not support variable-length sequences (like the ones you get from a dynamic_rnn). Naively calculating the loss at each time step doesn’t work because that would take into account the padded positions. The solution is to create a weight matrix that “masks out” the losses at padded positions.

    Here you can see why 0-padding can be a problem when you also have a “0-class”. If that’s the case you cannot use tf.sign(tf.to_float(y)) to create a mask because that would mask out the “0-class” as well. You can still create a mask using the sequence length information, it just becomes a bit more complicated.
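
    If your TensorFlow version has tf.sequence_mask, one way to do this is to build the mask from the lengths directly, so that a real class 0 at a valid position is not masked out. A minimal sketch; lengths and max_steps are illustrative stand-ins for the example_len and T defined in the code below:

    # Build the mask from sequence lengths instead of from the labels
    lengths = tf.constant([1, 2, 3, 8])
    max_steps = 8
    mask = tf.sequence_mask(lengths, maxlen=max_steps, dtype=tf.float32)  # shape [B, T]
    # Flattened, this could replace tf.sign(tf.to_float(y_flat)) in the code below
    mask_flat = tf.reshape(mask, [-1])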

    # Batch size
    B = 4
    # (Maximum) number of time steps in this batch
    T = 8
    RNN_DIM = 128
    NUM_CLASSES = 10

    # The *actual* lengths of the examples
    example_len = [1, 2, 3, 8]

    # The classes of the examples at each step (between 1 and 9, 0 means padding)
    y = np.random.randint(1, 10, [B, T])
    for i, length in enumerate(example_len):
        y[i, length:] = 0

    # The RNN outputs
    rnn_outputs = tf.convert_to_tensor(np.random.randn(B, T, RNN_DIM), dtype=tf.float32)

    # Output layer weights
    W = tf.get_variable(
        name="W",
        initializer=tf.random_normal_initializer(),
        shape=[RNN_DIM, NUM_CLASSES])

    # Calculate logits and probs
    # Reshape so we can calculate them all at once
    rnn_outputs_flat = tf.reshape(rnn_outputs, [-1, RNN_DIM])
    logits_flat = tf.matmul(rnn_outputs_flat, W)
    probs_flat = tf.nn.softmax(logits_flat)

    # Calculate the losses
    y_flat = tf.reshape(y, [-1])
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits_flat, y_flat)

    # Mask the losses
    mask = tf.sign(tf.to_float(y_flat))
    masked_losses = mask * losses

    # Bring back to [B, T] shape
    masked_losses = tf.reshape(masked_losses, tf.shape(y))

    # Calculate mean loss
    mean_loss_by_example = tf.reduce_sum(masked_losses, reduction_indices=1) / example_len
    mean_loss = tf.reduce_mean(mean_loss_by_example)
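
    None of the code above actually runs anything; it only builds the graph. A minimal sketch of evaluating the loss with the 0.x-era API used throughout this post (newer versions use tf.global_variables_initializer() instead):

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        print(sess.run(mean_loss))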

