![](https://img.haomeiwen.com/i11683600/04c54e653e9e4d49.png)
![](https://img.haomeiwen.com/i11683600/fc2ed1f077135f5f.png)
![](https://img.haomeiwen.com/i11683600/b9a1360db012a6e2.png)
![](https://img.haomeiwen.com/i11683600/9171cd0e8fbf0f0a.png)
![](https://img.haomeiwen.com/i11683600/fbc4eacfb1f10e5c.png)
![](https://img.haomeiwen.com/i11683600/2bd93c976a690e46.png)
Accuracy (a count of correct predictions) is not differentiable, so the cross-entropy error is used as a differentiable approximation of it.
Sometimes, minimizing the cross entropy does not maximize the accuracy.
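
A small sketch of that mismatch, using made-up predictions for four positive examples (the numbers and the names `model_a` / `model_b` are illustrative assumptions, not taken from the slides). Model B gets a lower cross entropy than model A even though it classifies fewer points correctly:

```python
import math

def cross_entropy(probs, labels):
    """Mean binary cross entropy; probs are the predicted P(y = 1)."""
    return -sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probs, labels)
    ) / len(labels)

def accuracy(probs, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return sum((p > threshold) == (y == 1) for p, y in zip(probs, labels)) / len(labels)

labels = [1, 1, 1, 1]

# Hypothetical model A: every example is barely on the correct side.
model_a = [0.51, 0.51, 0.51, 0.51]
# Hypothetical model B: three confident correct predictions, one wrong.
model_b = [0.99, 0.99, 0.99, 0.45]

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, round(cross_entropy(probs, labels), 4), accuracy(probs, labels))
```

Model A has perfect accuracy but the higher loss, so a training procedure that only looks at cross entropy would prefer model B.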
![](https://img.haomeiwen.com/i11683600/7f96bf72e085f5d2.png)
Both the perceptron and the sigmoid NN can find the decision boundary successfully.
![](https://img.haomeiwen.com/i11683600/13a5fe0b60c53330.png)
Now add one more point. The perceptron still reaches 100% accuracy, while the sigmoid NN cannot (assuming the NN's weights are bounded, e.g. the weight vector has length 1).
![](https://img.haomeiwen.com/i11683600/9ca20a5e73056a4f.png)
![](https://img.haomeiwen.com/i11683600/b9ebddbe8c17ffb4.png)
![](https://img.haomeiwen.com/i11683600/82bfb97147f2a5ca.png)
![](https://img.haomeiwen.com/i11683600/098d3247d6dcd032.png)
![](https://img.haomeiwen.com/i11683600/091582f1fb3e3a98.png)
![](https://img.haomeiwen.com/i11683600/2a4903cdac8f5c06.png)
![](https://img.haomeiwen.com/i11683600/2430535889f9584e.png)
In high dimensions, no one really knows what the loss surface looks like; there are only hypotheses.
Saddle point: some eigenvalues of the Hessian matrix are positive and some are negative.
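
The eigenvalue test can be sketched with NumPy as follows. The helper name `classify_critical_point` and the tolerance are assumptions made for illustration; the example function f(x, y) = x² − y² is a standard textbook saddle, not one taken from the slides.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-9):
    """Classify a point with zero gradient from the signs of the Hessian's eigenvalues."""
    eigvals = np.linalg.eigvalsh(hessian)  # Hessian is symmetric, so eigvalsh applies
    if np.all(eigvals > tol):
        return "local minimum"
    if np.all(eigvals < -tol):
        return "local maximum"
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point"
    return "inconclusive"  # some eigenvalues are (numerically) zero

# f(x, y) = x^2 - y^2 has zero gradient at the origin and Hessian diag(2, -2).
saddle_hessian = np.array([[2.0, 0.0], [0.0, -2.0]])
# f(x, y) = x^2 + y^2 has Hessian diag(2, 2) at the origin.
bowl_hessian = np.array([[2.0, 0.0], [0.0, 2.0]])

print(classify_critical_point(saddle_hessian))
print(classify_critical_point(bowl_hessian))
```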
![](https://img.haomeiwen.com/i11683600/83d4f8b91dc79373.png)
![](https://img.haomeiwen.com/i11683600/9a4caeb9c4c285f3.png)
![](https://img.haomeiwen.com/i11683600/34cd33740814fe99.png)
![](https://img.haomeiwen.com/i11683600/b3ff8112e1534dcb.png)
![](https://img.haomeiwen.com/i11683600/19ebd21d7f52ca84.png)
![](https://img.haomeiwen.com/i11683600/6e48ae88cd2f34c3.png)
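R measures how fast the iteration converges (the ratio of the error after a step to the error before it):
R > 1 => the error is getting worse (diverging)
R = 1 => no better, no worse
R < 1 => the error is getting better (converging)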
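
A minimal sketch of the three regimes. It assumes R is defined as |f(x_{k+1}) − f(x*)| / |f(x_k) − f(x*)|, and it uses gradient descent on the toy quadratic f(x) = ½·a·x², whose minimum value is 0 at x* = 0. The function name `convergence_ratio` and the chosen step sizes are illustrative.

```python
def convergence_ratio(a, eta, x0=1.0):
    """R for one gradient-descent step on f(x) = 0.5 * a * x**2 (minimum f* = 0)."""
    f = lambda x: 0.5 * a * x * x
    x1 = x0 - eta * a * x0          # gradient of f at x0 is a * x0
    return abs(f(x1)) / abs(f(x0))  # f* = 0, so no subtraction is needed

a = 2.0
for eta in (0.1, 1.0, 1.2):
    print(eta, convergence_ratio(a, eta))
```

For this quadratic each step multiplies x by (1 − η·a), so R = (1 − η·a)²: a small step gives R < 1, η = 2/a gives R = 1, and anything larger gives R > 1.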
First, consider the quadratic case.
![](https://img.haomeiwen.com/i11683600/86bb327c259542d4.png)
![](https://img.haomeiwen.com/i11683600/1373f686c63f5c10.png)
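For Newton's method, see https://zhuanlan.zhihu.com/p/83320557 chapter 4.1.
Note the difference: chapter 4.1 finds a root of the function itself, whereas here we want a root of its derivative, so the formula matches once one more derivative is taken. The optimal step size for the gradient is the inverse of the second derivative (the inverse of the Hessian matrix in the multivariate case).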
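
A minimal one-dimensional sketch of Newton's method applied to the derivative, i.e. the update x ← x − f'(x) / f''(x). The quadratic and the name `newton_minimize_step` are assumptions chosen for illustration. On a quadratic, a single step lands exactly on the minimum, which is the same as a gradient step with step size 1 / f''(x):

```python
def newton_minimize_step(x, grad, hess):
    """One Newton step on f'(x) = 0: a gradient step scaled by 1 / f''(x)."""
    return x - grad(x) / hess(x)

# Illustrative quadratic f(x) = 0.5 * a * x**2 + b * x, minimized at x = -b / a.
a, b = 4.0, -8.0
grad = lambda x: a * x + b   # f'(x)
hess = lambda x: a           # f''(x), constant for a quadratic

x0 = 10.0
x1 = newton_minimize_step(x0, grad, hess)
print(x1, grad(x1))
```

For non-quadratic functions the step is only exact for the local quadratic approximation, so several iterations are usually needed.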
![](https://img.haomeiwen.com/i11683600/9afbceda397cc335.png)
![](https://img.haomeiwen.com/i11683600/3245334997ab8ae3.png)
![](https://img.haomeiwen.com/i11683600/7c015d1d85e43e54.png)
![](https://img.haomeiwen.com/i11683600/74e14ff929cba9e9.png)
![](https://img.haomeiwen.com/i11683600/7c5d53d5af85e323.png)
![](https://img.haomeiwen.com/i11683600/bbb8948d08ce5c10.png)
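Different dimensions may have different optimal step sizes -> a single step size may converge in one direction but diverge in another -> the step size has to be chosen from the minimum of all the per-dimension optima (to converge everywhere it must stay below twice that minimum).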
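
A small sketch of that effect on a decoupled quadratic f(w) = ½·Σ aᵢ·wᵢ², where the optimal step size in dimension i is 1/aᵢ. The curvatures, step sizes, and the name `run_gd` are illustrative assumptions.

```python
import numpy as np

def run_gd(curvatures, eta, steps=50):
    """Gradient descent on f(w) = 0.5 * sum(a_i * w_i**2); returns |w| after `steps`."""
    a = np.asarray(curvatures, dtype=float)
    w = np.ones_like(a)
    for _ in range(steps):
        w = w - eta * a * w  # the gradient in dimension i is a_i * w_i
    return np.abs(w)

curvatures = [1.0, 10.0]  # per-dimension optimal step sizes: 1.0 and 0.1

# Below 2 * min(1 / a_i) = 0.2: both dimensions shrink toward 0.
print(run_gd(curvatures, eta=0.19))
# Fine for the flat dimension, but above 0.2: the steep dimension blows up.
print(run_gd(curvatures, eta=0.5))
```

With a shared step size, the steepest dimension sets the limit, and the flatter dimensions then converge more slowly than they could on their own.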
![](https://img.haomeiwen.com/i11683600/937207f8c4d24ff6.png)
![](https://img.haomeiwen.com/i11683600/5cd1921253a947fd.png)
![](https://img.haomeiwen.com/i11683600/fc289208cc88ca37.png)
![](https://img.haomeiwen.com/i11683600/a85e34579d6dfa52.png)
![](https://img.haomeiwen.com/i11683600/c0a1586d8c673dbd.png)
![](https://img.haomeiwen.com/i11683600/33b824a03dbbfb3c.png)
![](https://img.haomeiwen.com/i11683600/c4977e32a8727149.png)
![](https://img.haomeiwen.com/i11683600/c119e1b82048b41a.png)
![](https://img.haomeiwen.com/i11683600/74d2ee77a1a134e1.png)
![](https://img.haomeiwen.com/i11683600/a8327d573fbb5477.png)
![](https://img.haomeiwen.com/i11683600/d0bbe42b15ca5d27.png)
But we don't need to capture the whole Hessian matrix, right?
![](https://img.haomeiwen.com/i11683600/d4caeb99b50043cf.png)
![](https://img.haomeiwen.com/i11683600/7e72d18623749fb9.png)
All of these second-order methods fail in high dimensions.
![](https://img.haomeiwen.com/i11683600/a301fd507b54f6e8.png)
![](https://img.haomeiwen.com/i11683600/54a4bd8b3ed65de5.png)
![](https://img.haomeiwen.com/i11683600/afd6c877a02e40a7.png)
![](https://img.haomeiwen.com/i11683600/315b2765c4c54786.png)
![](https://img.haomeiwen.com/i11683600/8dcc949a40d58a01.png)
![](https://img.haomeiwen.com/i11683600/13f9e418c4879e62.png)
![](https://img.haomeiwen.com/i11683600/c8f89f9467240749.png)
![](https://img.haomeiwen.com/i11683600/c6df000fc8792b35.png)
![](https://img.haomeiwen.com/i11683600/07dcf7002b5cbf2b.png)
![](https://img.haomeiwen.com/i11683600/70ca0e1b3193b20d.png)
![](https://img.haomeiwen.com/i11683600/f79f1aa7308fdde2.png)
Why not use information from multiple steps?
![](https://img.haomeiwen.com/i11683600/c55ebed54ebabfe1.png)
![](https://img.haomeiwen.com/i11683600/9f79b85eb6d8d198.png)
![](https://img.haomeiwen.com/i11683600/11105eec125d9b2e.png)
![](https://img.haomeiwen.com/i11683600/1d7be72d4bd94b2c.png)
![](https://img.haomeiwen.com/i11683600/a43e22773912a7fd.png)
![](https://img.haomeiwen.com/i11683600/f5f1a773b1593bb4.png)