9. Neural Networks: Learning

Author: 玄语梨落 | Published 2020-08-19 21:55

Neural Networks: Learning

Cost Function

L = total number of layers in the network
s_l = no. of units (not counting the bias unit) in layer l; K = s_L = no. of output units

  • Binary classification: 1 output unit (K = 1)
  • Multi-class classification: K output units, one per class

J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^m\sum_{k=1}^K y_k^{(i)}\log(h_\Theta(x^{(i)}))_k+(1-y_k^{(i)})\log(1-(h_\Theta(x^{(i)}))_k)\right]+\\ \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}(\Theta_{ji}^{(l)})^2 \newline h_\Theta(x)\in R^K \qquad (h_\Theta(x))_k = k^{th} \text{ output}
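As a rough Octave sketch of this cost (not the course's own code): assume the output activations have already been computed by forward propagation, with H a K×m matrix whose columns are h_\Theta(x^{(i)}), Y the K×m matrix of one-hot labels, and a network with two weight matrices Theta1 and Theta2 (add one regularization term per additional \Theta matrix). All names here are illustrative.

```matlab
% H: K x m matrix of network outputs; Y: K x m one-hot labels
% Theta1, Theta2: weight matrices (first column holds the bias weights)
m = size(Y, 2);

% Unregularized cross-entropy cost, summed over all K outputs and m examples
J = -(1/m) * sum(sum(Y .* log(H) + (1 - Y) .* log(1 - H)));

% Regularization: squared non-bias weights only (skip column 1 of each Theta)
reg = sum(sum(Theta1(:, 2:end).^2)) + sum(sum(Theta2(:, 2:end).^2));
J = J + (lambda / (2*m)) * reg;
```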

Backpropagation algorithm

\delta_j^{(l)} = "error" of node j in layer l.

For each output unit (layer L = 4)

  1. \delta_j^{(4)} = a_j^{(4)} - y_j
  2. \delta^{(4)} = a^{(4)}-y
  3. \delta^{(3)} = (\Theta^{(3)})^T\delta^{(4)}.*g'(z^{(3)}), \ where \ g'(z^{(3)}) = a^{(3)}.*(1-a^{(3)})
  4. \delta^{(2)} = (\Theta^{(2)})^T\delta^{(3)}.*g'(z^{(2)})
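A minimal Octave sketch of these \delta computations for one training example, assuming a2 and a3 (with their bias units prepended), a4, and the label vector y come from forward propagation; the variable names are illustrative:

```matlab
% One training example; a2, a3 include their bias units, a4 is the output
delta4 = a4 - y;                                    % output-layer error
delta3 = (Theta3' * delta4) .* (a3 .* (1 - a3));    % g'(z^(3)) = a3 .* (1 - a3)
delta3 = delta3(2:end);                             % drop the bias-unit component
delta2 = (Theta2' * delta3) .* (a2 .* (1 - a2));    % g'(z^(2)) = a2 .* (1 - a2)
delta2 = delta2(2:end);                             % no delta term for the input layer
```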

Backpropagation algorithm

\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)}+a_j^{(l)}\delta_i^{(l+1)}

Vectorized implementation:

\Delta^{(l)} := \Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^T

D_{ij}^{(l)} = \frac{1}{m}\left(\Delta_{ij}^{(l)}+\lambda\Theta_{ij}^{(l)}\right) \ (j\ne 0) \newline D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} \ (j = 0)
\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)}
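Combining the pieces, a hedged sketch of the accumulation and the regularized gradient for the 4-layer example; the per-example forward pass and \delta computation are assumed to be the code sketched above:

```matlab
% Accumulators, one per weight matrix of the 4-layer example
Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));
Delta3 = zeros(size(Theta3));

for i = 1:m
  % ... forward propagation and the delta computation sketched above for example i,
  %     giving a1 = [1; x^(i)], a2, a3 (with bias) and delta2, delta3, delta4 ...
  Delta1 = Delta1 + delta2 * a1';   % vectorized Delta_ij += a_j^(l) * delta_i^(l+1)
  Delta2 = Delta2 + delta3 * a2';
  Delta3 = Delta3 + delta4 * a3';
end

% Regularized gradients; the bias column (j = 0) is not regularized
D1 = (1/m) * Delta1;  D1(:, 2:end) = D1(:, 2:end) + (lambda/m) * Theta1(:, 2:end);
D2 = (1/m) * Delta2;  D2(:, 2:end) = D2(:, 2:end) + (lambda/m) * Theta2(:, 2:end);
D3 = (1/m) * Delta3;  D3(:, 2:end) = D3(:, 2:end) + (lambda/m) * Theta3(:, 2:end);
```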

Backpropagation intuition

Forward Propagation

Understanding what backpropagation does: forward propagation computes the activations a_j^{(l)} from left to right, and backpropagation propagates the \delta_j^{(l)} terms from right to left in much the same way. Formally, \delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}}cost(i), i.e. how much the cost of example i would change if the weighted input z_j^{(l)} were perturbed.

Implementation note: Unrolling parameters

function [jVal, gradient] = costFunction(theta)

The parameters 'theta' and 'gradient' must be vectors. In a neural network, however, the parameters \Theta^{(1)}, \Theta^{(2)}, \ldots and gradients D^{(1)}, D^{(2)}, \ldots are matrices, so we must unroll them into vectors.

s_1=10,\ s_2=10,\ s_3=10,\ s_4=1 \newline \Theta^{(1)}\in R^{10\times11},\ \Theta^{(2)}\in R^{10\times11},\ \Theta^{(3)}\in R^{1\times11} \newline D^{(1)}\in R^{10\times11},\ D^{(2)}\in R^{10\times11},\ D^{(3)}\in R^{1\times11}

```matlab
% Unroll the weight matrices (and gradients) into single column vectors
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
DVec = [D1(:); D2(:); D3(:)];

% Reshape the unrolled vector back into the original matrices
Theta1 = reshape(thetaVec(1:110), 10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);
```
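Inside the cost function handed to an optimizer, the flow is reversed: reshape the vector back into matrices, compute the cost and the D matrices, then unroll the gradient again. A sketch with the body of the function elided; names follow the snippet above:

```matlab
function [jVal, gradientVec] = costFunction(thetaVec)
  % Recover the weight matrices from the unrolled parameter vector
  Theta1 = reshape(thetaVec(1:110), 10, 11);
  Theta2 = reshape(thetaVec(111:220), 10, 11);
  Theta3 = reshape(thetaVec(221:231), 1, 11);

  % ... forward propagation / backpropagation to get jVal and D1, D2, D3 ...

  % Unroll the gradient matrices so the optimizer sees a single vector
  gradientVec = [D1(:); D2(:); D3(:)];
end
```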

Gradient checking

Gradient checking is used to make sure that the forward propagation and backpropagation code is computing the gradients correctly; a backprop implementation can contain subtle bugs and still appear to learn.

  • one-sided difference: \frac{J(\theta+\epsilon)-J(\theta)}{\epsilon}
  • two-sided difference (more accurate): \frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}

Implement: gradApprox = (J(theta+EPSILON)-J(theta-EPSILON))/(2*EPSILON)

Parameter vector \theta

\frac{\partial}{\partial\theta_1}J(\theta)\approx\frac{J(\theta_1+\epsilon,\theta_2,\theta_3,...,\theta_n)-J(\theta_1-\epsilon,\theta_2,\theta_3,...,\theta_n)}{2\epsilon} \newline ...
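In Octave this check can be written as a loop over the components of the unrolled parameter vector; a sketch, assuming J is a function handle that computes the cost and using a value of EPSILON around 10^{-4} as suggested in the course:

```matlab
EPSILON = 1e-4;                 % epsilon around 10^-4 works well numerically
n = length(theta);              % theta is the unrolled parameter vector
gradApprox = zeros(n, 1);
for i = 1:n
  thetaPlus = theta;   thetaPlus(i)  = thetaPlus(i)  + EPSILON;
  thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end
```

If gradApprox and the DVec produced by backpropagation agree to several decimal places, the backprop implementation is very likely correct.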

Implementation Note:

  • Implement backprop to compute DVec.
  • Implement numerical gradient check to compute gradApprox.
  • Make sure they give similar values.
  • Turn off gradient checking. Use the backprop code for learning.
  • Be sure to disable your gradient checking code before training your classifier.

Random initialization

If we use zero initialization, then after each update the parameters corresponding to the inputs going into each of the two hidden units remain identical, so both hidden units compute the same function of the input and the symmetry is never broken.

Random initialization: Symmetry breaking

Initialize each \Theta_{ij}^{(l)} to a random value in [-\epsilon, \epsilon]
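For example, in Octave (INIT_EPSILON is a small constant; 0.12 below is only an illustrative choice, and the matrix sizes match the earlier unrolling example):

```matlab
% rand(r, c) is uniform on [0, 1], so this maps into [-INIT_EPSILON, INIT_EPSILON]
INIT_EPSILON = 0.12;
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1, 11)  * (2 * INIT_EPSILON) - INIT_EPSILON;
```

Breaking the symmetry this way ensures the hidden units start out computing different functions.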

Putting it together

Training a neural network:

Pick a network architecture (connectivity pattern between neurons):

  • No. of input units: dimension of features x^{(i)}
  • No. of output units: number of classes
  • Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better)
  1. Randomly initialize the weights
  2. Implement forward propagation to get h_\Theta(x^{(i)}) for any x^{(i)}
  3. Implement code to compute the cost function J(\Theta)
  4. Implement backprop to compute the partial derivatives \frac{\partial}{\partial\Theta_{jk}^{(l)}}J(\Theta)
  5. Use gradient checking to compare \frac{\partial}{\partial\Theta_{jk}^{(l)}}J(\Theta) computed using backpropagation vs. the numerical estimate of the gradient of J(\Theta). Then disable the gradient checking code.
  6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize J(\Theta) as a function of the parameters \Theta (a rough outline follows below)
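A rough Octave outline of the outer training script, reusing the randomly initialized Theta matrices and the costFunction sketched earlier (the matrix sizes match the unrolling example above; the iteration count is arbitrary):

```matlab
% Step 1: random initialization (as above), then unroll into one vector
initialTheta = [Theta1(:); Theta2(:); Theta3(:)];

% Steps 2-4 live inside costFunction (forward prop, cost, backprop gradient);
% gradient checking (step 5) is done once beforehand and then disabled
options = optimset('GradObj', 'on', 'MaxIter', 100);
[optTheta, finalCost] = fminunc(@costFunction, initialTheta, options);

% Recover the learned weight matrices from the optimized vector
Theta1 = reshape(optTheta(1:110), 10, 11);
Theta2 = reshape(optTheta(111:220), 10, 11);
Theta3 = reshape(optTheta(221:231), 1, 11);
```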

Autonomous driving example
