@[toc]
## Handcrafted Features → Learned Features
- NN usually requires more data and more computation
- NN architectures to model data structures
  - Multilayer perceptrons
  - Convolutional neural networks
  - Recurrent neural networks
  - Attention mechanism
- Design NN to incorporate prior knowledge about the data
## Linear Methods → Multilayer Perceptron (MLP)
- A dense (fully connected, or linear) layer has parameters $\mathbf{W} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^{m}$; it computes the output $\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b}$
- Linear regression: dense layer with 1 output
- Softmax regression: dense layer with $m$ outputs (one per class) + softmax
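A minimal PyTorch sketch of these three views of the dense layer; the sizes ($n = 4$ inputs, $m = 3$ outputs) are arbitrary assumptions for illustration.

```python
import torch
from torch import nn

# Dense (fully connected) layer: y = W x + b with W in R^{m x n}, b in R^m.
dense = nn.Linear(in_features=4, out_features=3)

x = torch.randn(2, 4)          # a batch of 2 inputs with n = 4 features
y = dense(x)                   # shape (2, 3): m = 3 outputs per example

# Linear regression: a dense layer with a single output.
linear_regression = nn.Linear(in_features=4, out_features=1)

# Softmax regression: a dense layer with one output per class, then softmax.
softmax_regression = nn.Sequential(nn.Linear(4, 3), nn.Softmax(dim=-1))
probs = softmax_regression(x)  # each row sums to 1
```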
## Multilayer Perceptron (MLP)
- Activation is an element-wise non-linear function, e.g. $\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$ or $\mathrm{ReLU}(x) = \max(x, 0)$
  - It leads to non-linear models
- Stack multiple hidden layers (dense + activation) to get deeper models
- Hyper-parameters: # hidden layers, # outputs of each hidden layer
- Universal approximation theorem: an MLP with a single, sufficiently wide hidden layer can approximate any continuous function arbitrarily well
## Code

- MLP with 1 hidden layer (see the sketch below)
- Hyper-parameter: `num_hiddens`
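A minimal PyTorch sketch of such an MLP; `num_inputs` and `num_outputs` are illustrative assumptions (e.g., flattened 28×28 images and 10 classes), with `num_hiddens` as the hyper-parameter.

```python
import torch
from torch import nn

# MLP with one hidden layer; num_hiddens controls the hidden-layer width.
def mlp(num_inputs, num_outputs, num_hiddens):
    return nn.Sequential(
        nn.Linear(num_inputs, num_hiddens),   # dense (hidden) layer
        nn.ReLU(),                            # element-wise non-linear activation
        nn.Linear(num_hiddens, num_outputs))  # output layer

net = mlp(num_inputs=784, num_outputs=10, num_hiddens=256)
x = torch.randn(32, 784)   # a batch of 32 flattened 28x28 images
logits = net(x)            # shape (32, 10)
```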
## Dense layer → Convolution layer
- Learning ImageNet (300x300 images with 1K classes) with an MLP with a single hidden layer of 10K outputs
  - leads to ~1 billion learnable parameters (300 × 300 inputs × 10K hidden units ≈ 0.9 billion weights); that's too big!
- Fully connected: an output is a weighted sum over all inputs
- Recognizing objects in images:
  - Translation invariance: similar output no matter where the object is
  - Locality: pixels are more related to their near neighbors
- Build this prior knowledge into the model structure
  - Achieve the same model capacity with fewer parameters
## Convolution layer
- Locality: an output is computed from a $k \times k$ window of the input
- Translation invariant: all outputs use the same $k \times k$ weights (kernel)
- The number of model parameters of a conv layer does not depend on the input/output sizes
- A kernel may learn to identify a pattern (e.g., an edge or a texture)
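Below is a minimal sketch of the 2D cross-correlation a convolution layer computes, assuming a single input/output channel and no padding or stride; `corr2d` and the edge-like kernel are made-up names/values for illustration.

```python
import torch

def corr2d(X, K):
    """2D cross-correlation: slide the k x k kernel K over X and take a
    weighted sum in each window (single channel, no padding/stride)."""
    h, w = K.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.arange(16.0).reshape(4, 4)
K = torch.tensor([[1.0, -1.0], [1.0, -1.0]])  # a kernel that responds to vertical edges
print(corr2d(X, K))  # 3 x 3 output; the parameters (K) don't grow with the input size
```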
## Pooling Layer
- Convolution is sensitive to location
  - A translation/rotation of a pattern in the input results in similar changes of the pattern in the output
- A pooling layer computes the mean/max in windows of size $k \times k$
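A minimal sketch of such a pooling operation, again assuming a single channel and no padding or stride; `pool2d` is a made-up helper name for this example.

```python
import torch

def pool2d(X, k, mode='max'):
    """Compute the max or mean over each k x k window (no padding/stride),
    which reduces sensitivity to the exact location of a pattern."""
    Y = torch.zeros(X.shape[0] - k + 1, X.shape[1] - k + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i:i + k, j:j + k]
            Y[i, j] = window.max() if mode == 'max' else window.mean()
    return Y

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
print(pool2d(X, 2))          # max pooling
print(pool2d(X, 2, 'mean'))  # mean (average) pooling
```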
## Convolutional Neural Networks (CNN)
- Stack convolution layers to extract features
  - Activation is applied after each convolution layer
  - Use pooling to reduce location sensitivity
- Modern CNNs are deep neural networks with various hyper-parameters and layer connections (AlexNet, VGG, Inception, ResNet, MobileNet)
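A minimal LeNet-style sketch of such a stack, assuming 28×28 single-channel inputs and 10 classes; all layer sizes here are illustrative choices, not taken from the original slides.

```python
import torch
from torch import nn

# Convolution + activation + pooling blocks, followed by dense layers.
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),  # conv + activation
    nn.MaxPool2d(kernel_size=2),                            # pooling reduces location sensitivity
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 10))

x = torch.randn(1, 1, 28, 28)
print(net(x).shape)  # torch.Size([1, 10])
```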
## RNN and Gated RNN
- Simple RNN: $\mathbf{h}_t = \phi(\mathbf{W}_{hh}\mathbf{h}_{t-1} + \mathbf{W}_{hx}\mathbf{x}_t + \mathbf{b}_h)$
- Gated RNN (LSTM, GRU): finer control of the information flow
  - Forget input: suppress $\mathbf{x}_t$ when computing $\mathbf{h}_t$
  - Forget past: suppress $\mathbf{h}_{t-1}$ when computing $\mathbf{h}_t$
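A minimal sketch of one simple-RNN step under the formula above, with $\phi = \tanh$; the parameter names (`W_hh`, `W_hx`, `b_h`) and sizes are illustrative assumptions.

```python
import torch

num_inputs, num_hiddens = 8, 16
W_hx = torch.randn(num_hiddens, num_inputs) * 0.01   # input-to-hidden weights
W_hh = torch.randn(num_hiddens, num_hiddens) * 0.01  # hidden-to-hidden weights
b_h = torch.zeros(num_hiddens)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
    return torch.tanh(W_hh @ h_prev + W_hx @ x_t + b_h)

# Unroll over a short sequence; the hidden state carries temporal information forward.
h = torch.zeros(num_hiddens)
for x_t in torch.randn(5, num_inputs):   # a sequence of 5 time steps
    h = rnn_step(x_t, h)
print(h.shape)  # torch.Size([16])
```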
## Summary
- MLP: stack dense layers with non-linear activations
- CNN: stack convolution, activation, and pooling layers to efficiently extract spatial information
- RNN: stack recurrent layers to pass temporal information through the hidden state