Lecture 5 | Convergence in Neural Networks

Author: Ysgc | Published on 2019-10-20 03:34

Accuracy (counting) is not differentiable! Cross-entropy error is only a differentiable approximation (a surrogate) of accuracy.
Sometimes minimizing the cross-entropy does not maximize the accuracy.
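A minimal numeric sketch of this point (my own example, not from the lecture): cross-entropy can prefer a model with lower accuracy, because accuracy only counts decisions while cross-entropy also rewards confidence on examples that are already correct.

```python
import numpy as np

y   = np.array([1, 1, 1, 0])                # true labels
p_a = np.array([0.51, 0.51, 0.51, 0.49])    # model A: all 4 correct, but barely
p_b = np.array([0.99, 0.99, 0.99, 0.60])    # model B: very confident on 3, wrong on the last

def accuracy(p, y):
    return np.mean((p > 0.5).astype(int) == y)          # counting -> not differentiable

def cross_entropy(p, y):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(accuracy(p_a, y), cross_entropy(p_a, y))   # 1.00, ~0.67
print(accuracy(p_b, y), cross_entropy(p_b, y))   # 0.75, ~0.24  <- lower loss, lower accuracy
```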

Both the perceptron and a sigmoid NN can find the decision boundary successfully.

Now one more point: the perceptron -> 100% accuracy, while the sigmoid NN cannot reach 100% accuracy (assuming the NN's weights are bounded => the weight vector has length 1).

high dim -> no one knows -> only hypotheses

saddle point -> some eigenvalues of the Hessian matrix are positive, and some are negative
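A standard textbook illustration of this (not from the lecture): for f(x, y) = x^2 - y^2 the origin has zero gradient, but the Hessian eigenvalues have mixed signs, so it is a saddle point.

```python
import numpy as np

H = np.array([[ 2.0,  0.0],     # d2f/dx2 = 2
              [ 0.0, -2.0]])    # d2f/dy2 = -2
print(np.linalg.eigvalsh(H))    # [-2.  2.] -> mixed signs => saddle point
```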

R => how fast it converges

R > 1 => getting worse (diverging)
R = 1 => no better, no worse
R < 1 => getting better (converging)
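One standard way to make R precise (a sketch; the lecture's exact definition may differ) is the ratio of successive distances of the loss from its optimum:

```latex
% Convergence rate of an iterative minimizer x^{(k+1)} = x^{(k)} - \eta \nabla f(x^{(k)}):
R \;=\; \frac{\bigl|\,f(x^{(k+1)}) - f(x^{*})\,\bigr|}{\bigl|\,f(x^{(k)}) - f(x^{*})\,\bigr|}
% R < 1: the error shrinks every step; R = 1: no progress; R > 1: the error grows.
```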


First, consider the quadratic case.

Newton's method: see https://zhuanlan.zhihu.com/p/83320557, chapter 4.1.
Note the difference: in 4.1 the method finds a root of the function itself, whereas here we need a root of its derivative, so taking one more derivative makes the forms match. The optimal step for gradient descent is the inverse of the second-order derivative (the Hessian matrix).
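Sketch of that argument (my restatement of the standard derivation): minimize the local quadratic approximation around the current point w_k, and the resulting step is the gradient scaled by the inverse Hessian.

```latex
% Second-order Taylor approximation of the loss around w_k:
f(w) \;\approx\; f(w_k) + \nabla f(w_k)^{\top}(w - w_k)
      + \tfrac{1}{2}\,(w - w_k)^{\top} H_k\,(w - w_k)
% Setting the gradient of the right-hand side to zero gives Newton's step:
w_{k+1} \;=\; w_k - H_k^{-1}\,\nabla f(w_k)
% i.e. gradient descent with the scalar \eta replaced by H_k^{-1};
% in 1-D the optimal step is \eta^{\mathrm{opt}} = 1 / f''(w_k).
```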

Different dimensions may have different optimal \eta -> the iteration may converge in one direction but diverge in another -> we have to take the minimum of all the optimal \eta (see the sketch below).
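A small sketch of this (assumed example, not from the lecture): gradient descent on a decoupled quadratic f(w) = 0.5*(a1*w1^2 + a2*w2^2). The optimal per-dimension step is 1/a_i, but a single shared eta only converges if eta < 2/a_i for every i, so the most curved direction (largest a_i) limits the step for all of them.

```python
import numpy as np

a = np.array([1.0, 100.0])        # curvatures in the two directions
w = np.array([1.0, 1.0])

eta = 1.0 / a[0]                  # optimal for dim 0, but 1/a1 = 1.0 > 2/a2 = 0.02
for _ in range(20):
    w = w - eta * (a * w)         # gradient of 0.5*a_i*w_i^2 is a_i*w_i
print(w)                          # dim 0 reaches 0 immediately; dim 1 blows up
```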

coupled solution -> normalization of the data
the quadratic term is approximated by the Hessian matrix; if eta = 1 ~> this equals Newton's method
curse of dimensionality

But we don't need to capture the whole Hessian matrix, right?

The Hessian matrix / quadratic approximation may not point in the right direction.
There are a number of methods to approximate the Hessian (see the sketch below).
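As an illustration of such methods (a generic example, not the lecture's own code): quasi-Newton methods like BFGS build up an approximation to the inverse Hessian from successive gradients instead of computing it exactly. SciPy's off-the-shelf BFGS on the Rosenbrock function:

```python
from scipy.optimize import minimize, rosen, rosen_der

x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
res = minimize(rosen, x0, method='BFGS', jac=rosen_der)
print(res.x)        # should be close to the minimizer [1, 1, 1, 1, 1]
print(res.nit)      # number of iterations taken
```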

All these 2nd-order methods fail in high dim.


Do BFGS and LM solve the stability issue?

Why not use multi-step information??

inverse of the Hessian -> inverse of the second partial derivatives
