Handcrafted Features → Learned Features
Neural networks (NNs) usually require more data and more computation
NN architectures to model data structures
Multilayer perceptrons
Convolutional neural networks
Recurrent neural networks
Attention mechanism
Design NN to incorporate prior knowledge about the data
Linear Methods → Multilayer Perceptron (MLP)
- A dense (fully connected, or linear) layer has parameters $W \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$; it computes output $y = Wx + b \in \mathbb{R}^m$
- Linear regression: dense layer with 1 output
- Softmax regression: dense layer with one output per class + softmax
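The two linear methods above can be sketched with one shared dense-layer function. This is a minimal illustration in plain Python lists, not a reference implementation; the helper names `dense` and `softmax` and all weight values are made up for the example.

```python
import math

def dense(x, W, b):
    """Dense layer: y = W x + b, with W given as m rows of length n."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def softmax(z):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 2.0]  # n = 2 input features (illustrative values)

# Linear regression: a dense layer with a single output (m = 1).
y_hat = dense(x, W=[[0.5, -1.0]], b=[0.25])

# Softmax regression: a dense layer with one output per class (m = 3),
# followed by softmax to obtain class probabilities.
scores = dense(x, W=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], b=[0.0, 0.0, 0.0])
probs = softmax(scores)
```

Here `y_hat` holds one real-valued prediction, while `probs` holds three non-negative values that sum to 1.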
Multilayer Perceptron (MLP)
An activation is an element-wise non-linear function
It leads to non-linear models
Stack multiple hidden layers (dense + activation) to get deeper models
Hyperparameters: number of hidden layers, number of outputs of each hidden layer
Universal approximation theorem
- MLP with 1 hidden layer
- Hyperparameter: num_hiddens
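A rough sketch of an MLP with one hidden layer, using only the Python standard library. The choice of ReLU as the activation, the Gaussian initialization, and the helper names (`relu`, `dense`, `init_dense`, `mlp_forward`) are assumptions made for this example; only `num_hiddens` comes from the slide.

```python
import random

def relu(z):
    """Element-wise activation: max(v, 0) for every entry."""
    return [max(v, 0.0) for v in z]

def dense(x, W, b):
    """Dense layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def init_dense(num_inputs, num_outputs, rng):
    """Small random weights and zero biases (an illustrative initialization)."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(num_inputs)]
         for _ in range(num_outputs)]
    b = [0.0] * num_outputs
    return W, b

def mlp_forward(x, params):
    (W1, b1), (W2, b2) = params
    h = relu(dense(x, W1, b1))  # hidden layer = dense + activation
    return dense(h, W2, b2)     # output layer is a plain dense layer

rng = random.Random(0)
num_inputs, num_hiddens, num_outputs = 4, 8, 3
params = [init_dense(num_inputs, num_hiddens, rng),
          init_dense(num_hiddens, num_outputs, rng)]
out = mlp_forward([1.0, -2.0, 0.5, 3.0], params)
```

Changing `num_hiddens` changes the width of the hidden layer; adding more `(W, b)` pairs with an activation between them would give a deeper model.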
Dense layer → Convolution layer
Learn ImageNet (300x300 images with 1K classes) with an MLP that has a single hidden layer with 10K outputs
This leads to about 1 billion learnable parameters, which is too many!
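The parameter count can be checked with a few lines of arithmetic. This sketch assumes one input value per pixel (color channels are ignored) and counts biases as well as weights; the variable names are illustrative.

```python
height, width = 300, 300
num_hiddens = 10_000
num_classes = 1_000

# Assumption: one input value per pixel, i.e. color channels are ignored.
num_inputs = height * width

# Each dense layer has (inputs x outputs) weights plus one bias per output.
hidden_params = num_inputs * num_hiddens + num_hiddens
output_params = num_hiddens * num_classes + num_classes
total_params = hidden_params + output_params
```

Almost all of the parameters sit in the hidden layer, because every hidden unit is connected to every pixel.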
Fully connected: an output is a weighted sum over all inputs
Recognize objects in images
Translation invariance: similar output no matter where the object is
Locality: pixels are more related to near neighbors
Build the prior knowledge into the model structure
Achieve the same model capacity with fewer parameters
Convolution layer
- Locality: an output is computed from k × k input windows
- Translation invariant: outputs use the same k × k weights (kernel)
- The number of parameters of a convolution layer does not depend on the input/output sizes
- A kernel may learn to identify a pattern
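The properties listed above can be seen in a small sketch of a single-channel convolution (computed as cross-correlation, with no padding and stride 1). The function name `conv2d`, the hand-picked edge kernel, and the toy images are assumptions for illustration; in a real layer the kernel weights are learned.

```python
def conv2d(image, kernel):
    """2-D cross-correlation with no padding and stride 1."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    # Locality: each output only looks at a k_h x k_w window of the input.
    # Shared weights: the same kernel is reused at every window position.
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k_h) for b in range(k_w))
             for j in range(out_w)]
            for i in range(out_h)]

# A hand-picked 1 x 2 kernel that fires where a bright pixel is followed
# by a dark one, i.e. on a vertical edge.
edge_kernel = [[1.0, -1.0]]
num_params = sum(len(row) for row in edge_kernel)

small = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0]]
large = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0] for _ in range(3)]

response_small = conv2d(small, edge_kernel)
response_large = conv2d(large, edge_kernel)
```

The same two weights process both the 2 × 4 and the 3 × 6 image, so `num_params` stays fixed while the output size follows the input size. The nonzero entries of each response mark where the edge pattern was found.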
Pooling Layer
- Convolution is sensitive to location
- A translation/rotation of a pattern in the input results in a similar change of the pattern in the output
- A pooling layer computes mean/max in windows of size k × k
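A small sketch of max and mean pooling over k × k windows. The stride of 1, the function name `pool2d`, and the toy feature maps are assumptions for this example; the two maps stand for the output of a convolution on an input and on a copy shifted by one column.

```python
def pool2d(feature_map, k, mode="max"):
    """Slide a k x k window with stride 1; take the max or mean of each window."""
    out_h = len(feature_map) - k + 1
    out_w = len(feature_map[0]) - k + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [feature_map[i + a][j + b]
                      for a in range(k) for b in range(k)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out

# The same detected pattern (the 1.0 entries), one column apart.
detected = [[0.0, 1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0]]
detected_shifted = [[0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]]

pooled = pool2d(detected, k=2, mode="max")
pooled_shifted = pool2d(detected_shifted, k=2, mode="max")
averaged = pool2d(detected, k=2, mode="mean")
```

Before pooling, the two maps share no active position. After 2 × 2 max pooling, the middle output is active in both cases, so a one-pixel shift is partly absorbed rather than fully propagated.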
Convolutional Neural Networks (CNN)
Stacking convolution layers to extract features
Activation is applied after each convolution layer
Using pooling to reduce location sensitivity
Modern CNNs are deep neural networks with various hyperparameters and layer connections (AlexNet, VGG, Inception, ResNet, MobileNet)
RNN and Gated RNN
Simple RNN: $h_t = \sigma(W_{hh} h_{t-1} + W_{hx} x_t + b_h)$
Gated RNN (LSTM, GRU): finer control of information flow
Forget input: suppress the input $x_t$ when computing the hidden state $h_t$
Forget past: suppress the previous hidden state $h_{t-1}$ when computing $h_t$
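The difference between a simple and a gated recurrence can be sketched with scalar inputs and states. This is an illustrative GRU-style step, not the exact formulation used in any particular library; tanh as the activation, the parameter names in the dictionary `p`, and the extreme bias values used to force the gates open or closed are all assumptions for the example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def rnn_step(x, h_prev, w_hx, w_hh, b_h):
    """Simple RNN step: the new state mixes the input and the previous state."""
    return math.tanh(w_hh * h_prev + w_hx * x + b_h)

def gru_step(x, h_prev, p):
    """GRU-style step with scalar input and state."""
    reset = sigmoid(p["w_rx"] * x + p["w_rh"] * h_prev + p["b_r"])
    update = sigmoid(p["w_zx"] * x + p["w_zh"] * h_prev + p["b_z"])
    # reset near 0 suppresses h_prev in the candidate state (forget past).
    candidate = math.tanh(p["w_hx"] * x + p["w_hh"] * (reset * h_prev) + p["b_h"])
    # update near 1 keeps h_prev and ignores the candidate (forget input).
    return update * h_prev + (1.0 - update) * candidate

# Simple RNN over a short sequence: the state is carried from step to step.
h = 0.0
for x_t in [1.0, 0.5, -1.0]:
    h = rnn_step(x_t, h, w_hx=1.0, w_hh=0.5, b_h=0.0)

base = {"w_rx": 0.0, "w_rh": 0.0, "b_r": 0.0,
        "w_zx": 0.0, "w_zh": 0.0, "b_z": 0.0,
        "w_hx": 1.0, "w_hh": 1.0, "b_h": 0.0}
x, h_prev = 0.8, 0.3

# Large update-gate bias: the new state is (almost) the old state.
h_forget_input = gru_step(x, h_prev, {**base, "b_z": 50.0})

# Very negative reset and update biases: the new state depends on x only.
h_forget_past = gru_step(x, h_prev, {**base, "b_r": -50.0, "b_z": -50.0})
```

In a trained model the gates take intermediate values that depend on the current input and state, so the network can decide at each step how much of each to keep.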
- MLP: stack dense layers with non-linear activations
- CNN: stack convolution, activation, and pooling layers to efficiently extract spatial information
- RNN: stack recurrent layers to pass temporal information through the hidden state