accuracy (counting) is not differentiable! cross-entropy error is just a differentiable approximation (a surrogate) of accuracy
sometimes, minimizing the cross-entropy does not maximize the accuracy (see the sketch below)
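A minimal sketch (assumed toy data, not from the notes) of that mismatch: model B below has lower cross-entropy than model A, yet lower accuracy, because extreme confidence on three points outweighs one misclassification in the loss but not in the count.

```python
import numpy as np

def cross_entropy(p, y):
    # mean binary cross-entropy; p = predicted P(y=1), y in {0, 1}
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def accuracy(p, y):
    return np.mean((p > 0.5) == (y == 1))

y   = np.array([1, 1, 0, 0])
p_a = np.array([0.60, 0.60, 0.40, 0.40])  # mildly confident, all 4 correct
p_b = np.array([0.99, 0.45, 0.01, 0.01])  # very confident on 3, wrong on 1

print(accuracy(p_a, y), cross_entropy(p_a, y))  # 1.00, ~0.51
print(accuracy(p_b, y), cross_entropy(p_b, y))  # 0.75, ~0.21 (lower loss!)
```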
both the perceptron and a sigmoid NN can find the decision boundary successfully
Now one more point: the perceptron reaches 100% accuracy, while the sigmoid NN cannot (assuming the NN's weights are bounded, e.g. the weight vector has length 1, so the sigmoid outputs never saturate to exactly 0 or 1; see the sketch below)
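A small sketch (assumed setup) of the bounded-weight claim: with ||w|| = 1 and bounded inputs, sigmoid(w·x) is bounded away from 0 and 1, so the outputs never reach the hard 0/1 targets even when every point sits on the correct side of the boundary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 2))    # inputs with norm <= sqrt(2)
w = np.array([1.0, 1.0]) / np.sqrt(2)  # unit-norm weight vector
p = sigmoid(X @ w)
print(p)  # all strictly inside (sigmoid(-sqrt(2)), sigmoid(sqrt(2))) ~ (0.20, 0.80)
```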
high dim -> no one knows what the loss landscape looks like -> only hypotheses
saddle point -> some eigenvalues of the Hessian matrix are positive, and some are negative
R => how fast it converges: the factor by which the error shrinks per step (see the sketch after this list)
R > 1 => getting worse
R = 1 => no better, no worse
R < 1 => better
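A minimal sketch (assumed 1-D quadratic, hypothetical curvature value) of the ratio R for gradient descent on f(x) = (lam/2)·x²: each step multiplies the error by R = |1 − eta·lam|, so R < 1 converges, R = 1 stalls, R > 1 diverges.

```python
lam = 2.0  # curvature (the 1-D "Hessian"); hypothetical value for illustration

for eta in [0.25, 0.5, 1.0, 1.25]:  # step sizes giving R = 0.5, 0, 1, 1.5
    x = 1.0
    for _ in range(5):
        x -= eta * lam * x           # gradient step: x <- x - eta * f'(x)
    print(f"eta={eta}: R={abs(1 - eta * lam):.2f}, x after 5 steps = {x:.4f}")
```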
First consider the quadratic case
Newton's method: see https://zhuanlan.zhihu.com/p/83320557, section 4.1
Note the difference: in section 4.1 Newton's method finds a root of the function itself, whereas here we want a root of its derivative, so adding one more derivative to each term makes the forms match (see the sketch below). The optimal step for gradient descent is the inverse of the second-order derivative (the Hessian matrix)
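A minimal sketch (assumed example function) of that correspondence: root-finding updates x ← x − f(x)/f'(x); minimization runs the same iteration on the derivative, x ← x − f'(x)/f''(x), where 1/f'' is the 1-D inverse Hessian.

```python
import math

def fp(x):  return math.cos(x) + 0.5 * x  # f'(x)  for f(x) = sin(x) + x**2/4
def fpp(x): return -math.sin(x) + 0.5     # f''(x)

x = 0.0
for _ in range(6):
    x -= fp(x) / fpp(x)                   # Newton step on the derivative
print(x, fp(x))                           # fp(x) ~ 0 at the minimizer x ~ -1.03
```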
different dimensions may have different optimal step sizes -> a single shared step may converge in one direction but diverge in another -> we have to take the minimum of all the per-direction optimal steps (see the sketch below)
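A minimal sketch (assumed diagonal quadratic) of why the stiffest direction dictates the step: on f(x) = 0.5·Σ lam_i·x_i², coordinate i contracts by |1 − eta·lam_i|, so the per-direction optimum is eta = 1/lam_i, but a shared eta must satisfy eta < 2/max(lam) or the stiffest direction diverges.

```python
import numpy as np

lam = np.array([1.0, 100.0])  # very different curvatures per dimension
x = np.array([1.0, 1.0])

eta = 1.0 / lam[0]            # optimal for dim 0, far too big for dim 1
for _ in range(5):
    x = x - eta * lam * x     # gradient step, coordinate-wise
print(x)                      # dim 0 converges instantly, dim 1 blows up
```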
coupled solution -> normalization of the data
the quadratic term is approximated by the Hessian matrix; if eta = 1, this equals Newton's method (see the sketch below)
curse of dimensionality -- but we don't need to capture the whole Hessian matrix, right?
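A minimal sketch (assumed 2-D quadratic) of the eta = 1 remark: scaling the gradient by the full inverse Hessian with eta = 1 is exactly Newton's method and solves a quadratic in one step; using only diag(H) is a cheap stand-in that avoids storing the full d×d matrix.

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # Hessian of f(x) = 0.5 * x @ H @ x
x = np.array([4.0, -3.0])
g = H @ x                             # gradient of the quadratic at x

x_newton = x - np.linalg.solve(H, g)  # eta = 1 with full H^-1: lands exactly at 0
x_diag   = x - g / np.diag(H)         # diagonal approximation: only gets closer to 0
print(x_newton, x_diag)
```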
the Hessian matrix and the quadratic approximation may not point in the right direction
there are a number of methods to approximate the Hessian, but all these 2nd-order methods fail in high dimensions
do BFGS and LM solve the stability issue? (see the sketch below)
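A hedged sketch of what BFGS does, using SciPy's implementation on the Rosenbrock function: a quasi-Newton method that builds a Hessian approximation from successive gradients instead of forming the true Hessian. This illustrates the mechanism the question refers to rather than answering it.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])
res = minimize(rosen, x0, jac=rosen_der, method="BFGS")
print(res.x, res.nit)  # converges to (1, 1) without ever forming the true Hessian
```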
why not use multi-step information??
inverse of the Hessian -> inverse of the matrix of second partial derivatives