Covers the three basic variants of gradient descent (see the sketch after this list):
- batch gradient descent
- stochastic gradient descent
- mini-batch gradient descent
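The three variants differ only in how much data is used to compute each gradient step. A minimal sketch on a least-squares problem; the dataset, learning rate, and batch size below are illustrative assumptions, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # 200 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

def grad(w, Xb, yb):
    """Gradient of the mean squared error 0.5 * mean((Xb @ w - yb) ** 2)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

lr, epochs, batch_size = 0.1, 50, 32
w_batch = np.zeros(3)
w_sgd = np.zeros(3)
w_mini = np.zeros(3)

for _ in range(epochs):
    # Batch gradient descent: one update per epoch, using the full dataset.
    w_batch -= lr * grad(w_batch, X, y)

    # Stochastic gradient descent: one update per (shuffled) training example.
    for i in rng.permutation(len(y)):
        w_sgd -= lr * grad(w_sgd, X[i:i + 1], y[i:i + 1])

    # Mini-batch gradient descent: one update per small batch of examples.
    for start in range(0, len(y), batch_size):
        batch = slice(start, start + batch_size)
        w_mini -= lr * grad(w_mini, X[batch], y[batch])

print(w_batch, w_sgd, w_mini)   # all three estimates should end up close to true_w
```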
Challenges
Gradient descent optimization algorithms (a sketch of two of these update rules follows the list)
- Momentum
- Nesterov accelerated gradient
- Adagrad
- Adadelta
- RMSprop
- Adam
- AdaMax
- Nadam
- AMSGrad
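The listed optimizers all modify the basic SGD update with per-parameter state (a velocity term, accumulated squared gradients, or moment estimates). A minimal sketch of two of them, Momentum and Adam, on a toy quadratic; the function names, hyperparameter values, and test problem are assumptions for illustration, not code from the article:

```python
import numpy as np

def momentum_step(w, g, state, lr=0.01, gamma=0.9):
    """Momentum: v_t = gamma * v_{t-1} + lr * g_t;  w_t = w_{t-1} - v_t."""
    v = gamma * state.get("v", np.zeros_like(w)) + lr * g
    state["v"] = v
    return w - v

def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first/second moment estimates of the gradient."""
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", np.zeros_like(w)) + (1 - beta1) * g
    v = beta2 * state.get("v", np.zeros_like(w)) + (1 - beta2) * g ** 2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1 ** t)        # correct the bias toward zero
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: minimize the quadratic f(w) = ||w - target||^2 with each rule.
target = np.array([0.3, -0.1])
for step_fn in (momentum_step, adam_step):
    w, state = np.zeros(2), {}
    for _ in range(1000):
        g = 2.0 * (w - target)          # gradient of the quadratic
        w = step_fn(w, g, state)
    print(step_fn.__name__, w)          # both end up near `target`
```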
Parallelizing and distributing SGD
Additional strategies for optimizing SGD
https://ruder.io/optimizing-gradient-descent/index.html#otherrecentoptimizers (original article)
http://blog.csdn.net/google19890102/article/details/69942970 (Chinese translation)
Reader comments