Deep Learning | 5 Sequence Model

Author: shawn233 | Published 2018-03-28 21:04

1 Recurrent Neural Networks

1.1 Notations


Notation  | Description
x^(i)<t>  | The t-th element of the input sequence of the i-th training example
y^(i)<t>  | The t-th element of the output sequence of the i-th training example
Tx^(i)    | The length of the input sequence of the i-th training example
Ty^(i)    | The length of the output sequence of the i-th training example

NLP: Natural Language Processing

One-hot representation: a column vector that is all zeros except for a single 1 at the position corresponding to the word's index in the vocabulary. Such a vector is called a one-hot vector.

UNK: Unknown word, representing words that are not in your vocabulary.
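
To make the one-hot and UNK conventions concrete, here is a minimal Python/NumPy sketch; the five-word vocabulary is made up for illustration (real vocabularies are typically 10,000 words or more).

```python
import numpy as np

# Toy vocabulary (illustrative only); "<UNK>" stands in for unknown words.
vocab = ["a", "aaron", "cat", "zulu", "<UNK>"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Column vector of zeros with a single 1 at the word's vocabulary index."""
    idx = word_to_index.get(word, word_to_index["<UNK>"])  # out-of-vocabulary -> <UNK>
    v = np.zeros((len(vocab), 1))
    v[idx] = 1.0
    return v

print(one_hot("cat").ravel())  # [0. 0. 1. 0. 0.]
print(one_hot("dog").ravel())  # unknown word maps to <UNK>: [0. 0. 0. 0. 1.]
```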

1.2 Recurrent Neural Network


Limitation of the (unidirectional) RNN: the prediction at a given time step uses information from earlier in the sequence, but not information from later in the sequence.

Forward Propagation

The forward-propagation equations are:

a<t> = g(Waa a<t-1> + Wax x<t> + ba)
yhat<t> = g(Wya a<t> + by)

We can compress Waa and Wax into a single matrix Wa = [Waa | Wax], and stack a<t-1> on top of x<t>. Then we can simplify the expression.

So the simpler version of forward propagation is:

a<t> = g(Wa [a<t-1>; x<t>] + ba)
yhat<t> = g(Wy a<t> + by)
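
The compression step can be checked numerically. Below is a minimal NumPy sketch (the sizes and the tanh activation are arbitrary choices for illustration) verifying that the compressed form computes the same activation as the original form:

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x = 5, 3                          # hidden and input sizes (arbitrary)

Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))
ba  = rng.standard_normal((n_a, 1))
a_prev = rng.standard_normal((n_a, 1))   # a<t-1>
x_t    = rng.standard_normal((n_x, 1))   # x<t>

# Original form: a<t> = g(Waa a<t-1> + Wax x<t> + ba), with g = tanh
a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)

# Compressed form: Wa = [Waa | Wax], input stacked as [a<t-1>; x<t>]
Wa = np.hstack([Waa, Wax])
a_t_compressed = np.tanh(Wa @ np.vstack([a_prev, x_t]) + ba)

assert np.allclose(a_t, a_t_compressed)  # the two forms agree
```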

Backward Propagation

Use cross-entropy to define the loss element-wise at each time step,

L<t>(yhat<t>, y<t>) = -y<t> log yhat<t> - (1 - y<t>) log(1 - yhat<t>),

and the cost function is then the sum over all time steps of the losses calculated from each yhat<t> and y<t>.
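
A minimal sketch of this loss (using the binary cross-entropy form above; the predictions and labels are made-up numbers):

```python
import numpy as np

def timestep_loss(y_hat_t, y_t, eps=1e-12):
    """Binary cross-entropy at one time step; labels y_t are 0 or 1."""
    return -(y_t * np.log(y_hat_t + eps) + (1 - y_t) * np.log(1 - y_hat_t + eps))

def sequence_cost(y_hats, ys):
    """Cost: the sum of the per-time-step losses over the whole sequence."""
    return sum(timestep_loss(yh, y) for yh, y in zip(y_hats, ys))

# Made-up predictions and labels for a 3-step sequence:
print(sequence_cost([0.9, 0.2, 0.7], [1, 0, 1]))  # ~0.69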

1.3 Different Architectures for RNN

  • Many-to-many (Tx = Ty): e.g. named-entity recognition
  • Many-to-one: read through the sequence and output a single value, e.g. sentiment classification (see the sketch after this list)
  • One-to-many: read a single input and keep running with just the activations as inputs, e.g. music generation
  • Many-to-many (Tx != Ty): after reading through the input sequence, start outputting with only the activations as inputs, e.g. machine translation
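
To make the looping structure concrete, here is a schematic NumPy sketch of the many-to-one and one-to-many cases. It reuses the simplified cell from section 1.2; the function names are my own, and for one-to-many I feed zero vectors after the first step to match the "activations only" description above.

```python
import numpy as np

def rnn_step(a_prev, x_t, Wa, ba):
    """One step of the simplified RNN cell from section 1.2."""
    return np.tanh(Wa @ np.vstack([a_prev, x_t]) + ba)

def many_to_one(xs, a0, Wa, ba, Wy, by):
    """Read the whole input sequence, output one value at the end."""
    a = a0
    for x_t in xs:
        a = rnn_step(a, x_t, Wa, ba)
    return Wy @ a + by

def one_to_many(x0, a0, Wa, ba, Wy, by, Ty):
    """Read a single input, then keep running on activations alone."""
    a, x_t, ys = a0, x0, []
    for _ in range(Ty):
        a = rnn_step(a, x_t, Wa, ba)
        ys.append(Wy @ a + by)
        x_t = np.zeros_like(x0)  # no new external input after the first step
    return ys
```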
