Gradient Descent in Practice II - Learning Rate
Debugging gradient descent. Make a plot with number of iterations on the x-axis. Now plot the cost function, J(θ) over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.
Automatic convergence test. Declare convergence if J(θ) decreases by less than e in one iteration, where e is some small value such as 10^−3. However in practice it's difficult to choose this threshold value.
It has been proven that if learning rate α is sufficiently small, then J(θ) will decrease on every iteration.
To summarize:
If α is too small: slow convergence.
If α is too large: may not decrease on every iteration and thus may not converge.
来源:coursera 斯坦福 吴恩达 机器学习
网友评论