
Machine Learning: 3.4 Neural Network

Author: Cache_wood | Published 2022-04-14 21:00


Handcrafted Features → Learned Features

  • NN usually requires more data and more computation

  • NN architectures to model data structures

    • Multilayer perceptrons

    • Convolutional neural networks

    • Recurrent neural networks

    • Attention mechanism

  • Design NN to incorporate prior knowledge about the data

Linear Methods → Multilayer Perceptron (MLP)

  • A dense (fully connected, or linear) layer has parameters W \in R^{m\times n}, b \in R^m; it computes the output y = Wx+b \in R^m (see the sketch after this list)
  • Linear regression: dense layer with 1 output
  • Softmax regression: dense layer with one output per class + softmax
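
As a concrete illustration of y = Wx+b, here is a minimal sketch of a dense layer, assuming PyTorch; the sizes n = 5 and m = 3 are arbitrary choices for this example.

```python
import torch
from torch import nn

# A dense (fully connected) layer with n = 5 inputs and m = 3 outputs:
# y = Wx + b, where W has shape (3, 5) and b has shape (3,).
dense = nn.Linear(in_features=5, out_features=3)

x = torch.randn(5)   # one input vector x in R^5
y = dense(x)         # y = Wx + b in R^3
print(dense.weight.shape, dense.bias.shape, y.shape)
# torch.Size([3, 5]) torch.Size([3]) torch.Size([3])
```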

Multilayer Perceptron (MLP)

  • Activation is an element-wise non-linear function

    • sigmoid(x) = \frac{1}{1+e^{-x}}, \quad ReLU(x) = \max(x, 0)

    • It leads to non-linear models

  • Stack multiple hidden layers (dense + activation) to get deeper models

  • Hyper-parameters: # hidden layers, # outputs of each hidden layer

  • Universal approximation theorem: an MLP with a single, sufficiently wide hidden layer can approximate any continuous function on a compact domain arbitrarily well

Code

  • MLP with 1 hidden layer
  • Hyperparameter: num_hiddens
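
A minimal sketch of this model, assuming PyTorch; num_hiddens is the hyperparameter named above, while the input size (784 flattened pixels) and the 10-class output are illustrative assumptions.

```python
import torch
from torch import nn

num_hiddens = 256  # hyperparameter: width of the single hidden layer

net = nn.Sequential(
    nn.Flatten(),                 # flatten images into vectors
    nn.Linear(784, num_hiddens),  # dense hidden layer
    nn.ReLU(),                    # element-wise non-linear activation
    nn.Linear(num_hiddens, 10),   # dense output layer, one output per class
)

X = torch.randn(2, 1, 28, 28)     # a dummy batch of two 28x28 images
print(net(X).shape)               # torch.Size([2, 10])
```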

Dense layer → Convolution layer

  • Learning ImageNet (300×300 images with 1K classes) with an MLP that has a single hidden layer of 10K outputs

    • It leads to about 1 billion learnable parameters, which is far too big!

    • Fully connected: an output is a weighted sum over all inputs

  • Recognize objects in images

    • Translation invariance: similar output no matter where the object is

    • Locality: pixels are more related to their nearby neighbors

  • Build the prior knowledge into the model structure

  • Achieve the same model capacity with fewer parameters

Convolution layer

  • Locality: an output is computed from k\times k input windows
  • Translation invariance: outputs reuse the same k\times k weights (kernel) at every position
  • The model parameters of a conv layer therefore do not depend on the input/output sizes: n\times m \rightarrow k\times k (see the sketch after this list)
  • A kernel may learn to identify a pattern
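
The sketch below illustrates these two ideas, assuming PyTorch: each output is a weighted sum over a k × k window only (locality), and the same kernel weights are reused at every position (translation invariance). The toy input and kernel are arbitrary.

```python
import torch

def corr2d(X, K):
    """2-D cross-correlation: each output element is computed from a
    k x k window of the input, reusing the same kernel K everywhere."""
    h, w = K.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.arange(9.0).reshape(3, 3)   # a toy 3x3 input
K = torch.tensor([[1.0, -1.0]])       # a 1x2 kernel detecting horizontal differences
print(corr2d(X, K))                   # a 3x2 output
```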

Pooling Layer

  • Convolution is sensitive to location
    • A translation/rotation of a pattern in the input results in similar changes of the pattern in the output
  • A pooling layer computes mean/max in windows of size k × k
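
A minimal pooling sketch, assuming PyTorch; the 4×4 input is arbitrary.

```python
import torch
from torch import nn

# Max pooling over 2x2 windows: the exact position of a value inside
# each window no longer matters, which reduces location sensitivity.
pool = nn.MaxPool2d(kernel_size=2, stride=2)

X = torch.arange(16.0).reshape(1, 1, 4, 4)  # batch=1, channel=1, 4x4 input
print(pool(X))   # 2x2 output holding the max of each 2x2 window
```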

Convolutional Neural Networks (CNN)

  • Stacking convolution layers to extract features

    • Activation is applied after each convolution layer

    • Using pooling to reduce location sensitivity

  • Modern CNNs are deep neural networks with various hyper-parameters and layer connections (AlexNet, VGG, Inception, ResNet, MobileNet)
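
A minimal sketch of such a stack, assuming PyTorch; the channel counts, kernel sizes, and 10-class output are illustrative (roughly LeNet-style), not a specific architecture from the list above.

```python
import torch
from torch import nn

# Convolution + activation, pooling to reduce location sensitivity,
# then dense layers on top of the extracted features.
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 10),
)

X = torch.randn(1, 1, 28, 28)   # one 28x28 grayscale image
print(net(X).shape)             # torch.Size([1, 10])
```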

RNN and Gated RNN

  • Simple RNN: h_t = \phi(W_{hh}h_{t-1}+W_{hx}x_t + b_h)

  • Gated RNN (LSTM, GRU): finer control of information flow

    • Forget input: suppress x_t when computing h_t

    • Forget past: suppress h_{t-1} when computing h_t
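
A minimal sketch of the simple-RNN recurrence above, written directly from the formula; the sizes (4 inputs, 8 hidden units, a length-5 sequence) are arbitrary, and tanh plays the role of \phi.

```python
import torch

num_inputs, num_hiddens = 4, 8
W_hx = torch.randn(num_hiddens, num_inputs) * 0.01   # input-to-hidden weights
W_hh = torch.randn(num_hiddens, num_hiddens) * 0.01  # hidden-to-hidden weights
b_h = torch.zeros(num_hiddens)

def rnn_step(x_t, h_prev):
    # h_t = phi(W_hh h_{t-1} + W_hx x_t + b_h), with phi = tanh
    return torch.tanh(W_hh @ h_prev + W_hx @ x_t + b_h)

h = torch.zeros(num_hiddens)             # initial hidden state h_0
for x_t in torch.randn(5, num_inputs):   # a toy sequence of 5 input vectors
    h = rnn_step(x_t, h)                 # temporal information flows through h
print(h.shape)                           # torch.Size([8])
```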

Summary

  • MLP: stack dense layers with non-linear activations
  • CNN: stack convolution, activation, and pooling layers to efficiently extract spatial information
  • RNN: stack recurrent layers to pass temporal information through the hidden state
