[Machine Learning] Week 2, 3. Gradient Descent in Practice 1: Feature Scaling

Author: Kitty_风花 | Published 2019-11-30 10:49

Gradient Descent in Practice I - Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.

The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally:

-1 ≤ x_i ≤ 1 or -0.5 ≤ x_i ≤ 0.5

These aren't exact requirements; we are only trying to speed things up. The goal is to get all input variables into roughly one of these ranges, give or take a few.
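The speed-up can be seen in a small experiment. The sketch below is illustrative only: the data is made up (house size and number of bedrooms), the helper `run_gradient_descent` is a hypothetical name, and the learning rates and tolerance are assumptions chosen for this toy problem. It runs batch gradient descent for linear regression on the raw features and on the same features after subtracting the mean and dividing by the range, then reports how many iterations each run needed.

```python
import numpy as np

def run_gradient_descent(X, y, alpha, tol=1e-6, max_iters=20_000):
    """Batch gradient descent for linear regression (illustrative helper).

    Returns (iterations_used, converged), where converged means the
    gradient norm dropped below tol before max_iters was reached.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])  # prepend the intercept column x_0 = 1
    theta = np.zeros(Xb.shape[1])
    for it in range(1, max_iters + 1):
        grad = Xb.T @ (Xb @ theta - y) / m
        if np.linalg.norm(grad) < tol:
            return it, True
        theta -= alpha * grad
    return max_iters, False

# Made-up data: house size (100 to 2000) and number of bedrooms (1 to 5).
rng = np.random.default_rng(0)
size = rng.uniform(100, 2000, 50)
bedrooms = rng.integers(1, 6, 50).astype(float)
X_raw = np.column_stack([size, bedrooms])
y = 0.1 * size + 10.0 * bedrooms + 50.0

# Subtract each column's mean and divide by its range (max - min).
X_scaled = (X_raw - X_raw.mean(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))

# The raw features need a tiny learning rate to avoid diverging, because the
# size column is so large; the scaled features tolerate a much larger one.
raw_iters, raw_converged = run_gradient_descent(X_raw, y, alpha=1e-7)
scaled_iters, scaled_converged = run_gradient_descent(X_scaled, y, alpha=0.5)

print("raw features:   ", raw_iters, "iterations, converged =", raw_converged)
print("scaled features:", scaled_iters, "iterations, converged =", scaled_converged)
```

On this toy data the raw run should exhaust its iteration budget without converging, while the scaled run should finish well within it. The exact counts depend on the data, the learning rates, and the tolerance.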

Two techniques to help with this are feature scaling and mean normalization. Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. Mean normalization involves subtracting the average value of an input variable from each of its values, resulting in a new average of zero for that variable. To implement both of these techniques, adjust your input values as shown in this formula:

x_i := (x_i - μ_i) / s_i

Where μ_i is the average of all the values for feature i, and s_i is either the range of values (max - min) or the standard deviation.

Note that dividing by the range and dividing by the standard deviation give different results. The quizzes in this course use the range; the programming exercises use the standard deviation.
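Both variants of x_i := (x_i - μ_i) / s_i can be sketched in a few lines of NumPy. This is a minimal sketch, not the course's reference code: the function name `scale_features` and the `use_std` flag are hypothetical, and the sample values are made up so that the first column has a mean of 1000 and spans 100 to 2000.

```python
import numpy as np

def scale_features(X, use_std=False):
    """Mean-normalize each column of X, then divide by its range or std.

    Returns (X_scaled, mu, s) so the same mu and s can be reused on new data.
    Assumes no column is constant (otherwise s would be 0).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    if use_std:
        # np.std defaults to ddof=0 (population standard deviation);
        # Octave's std divides by N - 1, so results can differ slightly.
        s = X.std(axis=0)
    else:
        s = X.max(axis=0) - X.min(axis=0)  # range: max - min
    return (X - mu) / s, mu, s

# Made-up data: column 0 is a price, column 1 a bedroom count.
X = np.array([[100.0, 1.0],
              [2000.0, 5.0],
              [900.0, 3.0]])

X_by_range, mu, s_range = scale_features(X)
X_by_std, _, s_std = scale_features(X, use_std=True)

print("mean:", mu)
print("range:", s_range)
print("scaled by range:\n", X_by_range)
print("scaled by std:\n", X_by_std)
```

Returning `mu` and `s` alongside the scaled matrix matters in practice: any new input has to be transformed with the same values that were computed from the training set.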

For example, if x_i represents housing prices with a range of 100 to 2000 and a mean value of 1000, then x_i := (price - 1000) / 1900.