Deep Learning Lecture Notes (25)

Author: 山岳之心 | Published: 2021-03-06 15:46

5.3 Stochastic Gradient Descent

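We first look at the order of operations in stochastic gradient descent. The learning logic of this algorithm was covered earlier: it loops over the whole data set several times, and each computation starts from the best weights obtained from the previous data record, then applies gradient descent to get new best weights. For the cost and selling-price problem considered above, the program can therefore be written in the following parts. You could even create several .py files, one per function, and tie them together with a main function at the end, which keeps the code cleaner.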

The first step is data preparation. As emphasized before, it is best to normalize the data. The code is as follows:

import numpy as np
cost_of_material = np.array([38.3, 35, 31.2, 43.2, 44.2, 41.2])
cost_of_sales = np.array([12.3, 11.2, 10.1, 9.2, 9.1, 9.6])
cost_of_human = np.array([22,21.3,23.7,23.2,20.5,24.2])
cost_of_product = np.array([7.2, 7.8, 8.3, 8.6, 8.8, 9.3])
sell_price = np.array([100.1, 102, 99.2, 101.2, 103.8, 100.5])

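# Normalize the data. Every feature is a monetary amount, so one common scale works; here we divide by 100.
# Normalization is optional for this data set; it is done here as a reminder to always consider feature normalization first.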
cost_of_material /= 100
cost_of_sales /= 100
cost_of_human /= 100
cost_of_product /= 100
sell_price /= 100

# Learning rate (one value per weight)
learning_rate = np.array([0.5,0.5,0.5,0.5])
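
To see why the scaling matters for this particular setup, here is a minimal sketch. It assumes the per-record update used later in this section, w <- w - lr * (x.dot(w) - y) * x, for which the residual (prediction minus target) is multiplied by 1 - lr * x.dot(x) after each step. The helper name residual_factor is hypothetical and only serves the illustration.

```python
import numpy as np

# First record of the data set: material, sales, labor and product costs.
x_raw = np.array([38.3, 12.3, 22.0, 7.2])
x_scaled = x_raw / 100
learning_rate = 0.5

def residual_factor(x, lr):
    """Factor applied to (prediction - target) by one update
    w <- w - lr * (x.dot(w) - y) * x on a single record."""
    return 1.0 - lr * x.dot(x)

factor_raw = residual_factor(x_raw, learning_rate)
factor_scaled = residual_factor(x_scaled, learning_rate)

# A magnitude above 1 means the residual grows at every step (divergence);
# a value between 0 and 1 means it shrinks steadily.
print("unscaled data:", factor_raw)
print("scaled data:  ", factor_scaled)
```

With a learning rate of 0.5, the unscaled record would need a much smaller learning rate to converge, while the scaled record does not.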

The second step is to implement the whole stochastic gradient descent algorithm as a class, rand_grad. It can be written as follows:

class rand_grad:
    def __init__(self,inputs,outputs,weights):
        self.inputs = inputs
        self.weights = weights
        self.outputs = outputs

    # Prediction function
    def neural_network(self):
        prediction = self.inputs.dot(self.weights)
        return prediction

    # Cost function: squared error of the prediction
    def error_function(self):
        prediction = self.neural_network()
        return pow(prediction - self.outputs,2)

    # Gradient descent multiplier: (prediction - target) * inputs
    def grad_descent_multiplier(self):
        multiplier = (self.neural_network() - \
            self.outputs)*self.inputs
        return multiplier

    # Stopping control: keep iterating while the error is above 1e-10
    # or fewer than 50 iterations have been run
    def stop_condition(self,times):
        if (self.error_function()>1e-10 or times<50):
            return True
        else:
            return False
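
As a sanity check on the multiplier, the sketch below compares the analytic gradient of the squared error with a finite-difference estimate. It is an illustration under stated assumptions, not part of the original program: the function names squared_error, analytic_gradient and numeric_gradient are hypothetical, and the record used is the first normalized record of the data set. The exact gradient of (x.dot(w) - y)**2 carries a factor of 2 that the multiplier above omits; that constant can be thought of as absorbed into the learning rate.

```python
import numpy as np

def squared_error(weights, inputs, target):
    return (inputs.dot(weights) - target) ** 2

def analytic_gradient(weights, inputs, target):
    # d/dw (x.w - y)^2 = 2 * (x.w - y) * x
    return 2.0 * (inputs.dot(weights) - target) * inputs

def numeric_gradient(weights, inputs, target, eps=1e-6):
    # Central differences, one weight at a time.
    grad = np.zeros_like(weights)
    for i in range(len(weights)):
        step = np.zeros_like(weights)
        step[i] = eps
        grad[i] = (squared_error(weights + step, inputs, target)
                   - squared_error(weights - step, inputs, target)) / (2 * eps)
    return grad

inputs = np.array([0.383, 0.123, 0.22, 0.072])
target = 1.001
weights = np.array([0.6, 0.2, 0.8, 0.9])

# Same form as grad_descent_multiplier in the class above.
multiplier = (inputs.dot(weights) - target) * inputs

grad_exact = analytic_gradient(weights, inputs, target)
grad_numeric = numeric_gradient(weights, inputs, target)
print("analytic gradient:", grad_exact)
print("numeric gradient: ", grad_numeric)
print("2 * multiplier:   ", 2 * multiplier)
```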

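The third step is to preprocess the weights and the data into matrix form, which makes matrix slicing and matrix multiplication convenient. The class above is also written in terms of matrix operations, so passing it matrix-style data is straightforward.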

# Initial weights
weights = np.array([0.6,0.2,0.8,0.9])

# Convert to matrix form: one row per data record, one column per feature
raw_data = np.vstack((cost_of_material,cost_of_sales,cost_of_human,cost_of_product))
raw_data = raw_data.T
print(raw_data[0])
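
The following sketch shows what the stacking and transposing do to the shapes. It repeats the normalized feature arrays so that it runs on its own. The variable names stacked and alt are introduced only for this illustration; np.column_stack is shown as one possible equivalent, not as the approach the text uses.

```python
import numpy as np

cost_of_material = np.array([38.3, 35, 31.2, 43.2, 44.2, 41.2]) / 100
cost_of_sales = np.array([12.3, 11.2, 10.1, 9.2, 9.1, 9.6]) / 100
cost_of_human = np.array([22, 21.3, 23.7, 23.2, 20.5, 24.2]) / 100
cost_of_product = np.array([7.2, 7.8, 8.3, 8.6, 8.8, 9.3]) / 100

# vstack puts each feature in its own row: 4 features x 6 records.
stacked = np.vstack((cost_of_material, cost_of_sales, cost_of_human, cost_of_product))
# Transposing gives one row per record: 6 records x 4 features.
raw_data = stacked.T

print(stacked.shape)   # (4, 6)
print(raw_data.shape)  # (6, 4)
print(raw_data[0])     # the four features of the first record

# column_stack builds the same 6 x 4 layout in a single call.
alt = np.column_stack((cost_of_material, cost_of_sales, cost_of_human, cost_of_product))
```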

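The fourth step is to implement stochastic gradient descent with nested loops. This algorithm gains almost nothing from the convenience of matrix computation, so nested loops are a very characteristic feature of stochastic gradient descent. Batch gradient descent rewrites the nested loops as matrix operations, which makes the computation far more efficient.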

# Each record's descent loop stops once its error falls to 10 to the power -10 (1e-10) or below.
for ergo in range(30): # Pass over the data set 30 times
    error_list = []
    for item_index in range(len(raw_data)):
        # Loop over the records, starting from the first
        inputs = raw_data[item_index]
        outputs = sell_price[item_index]
        engine = rand_grad(inputs,outputs,weights)

The fifth step is to nest the gradient descent loop for each data record inside it, namely:

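        times = 0
        while engine.stop_condition(times):
            # Gradient descent step
            times += 1
            gdm = np.array(engine.grad_descent_multiplier())
            # If the descent factor is too large relative to the weights, shrink it; otherwise the error can easily diverge.
            if min(np.abs(gdm)) > max(np.abs(weights)):
                gdm /= 10*min(np.abs(gdm))/max(np.abs(weights))
            factor = np.diag(gdm)
            # In-place update: engine.weights refers to the same array, so it sees the new values
            weights -= np.matmul(learning_rate,factor)
        # Record the error and append it to the error list
        error_list.append(engine.error_function())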
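
The update line multiplies the learning-rate vector by a diagonal matrix built from the multiplier. The short sketch below suggests that this amounts to an element-wise product, so each weight is moved by its own learning rate times its own multiplier component. The values in gdm are made up for illustration and do not come from a real run.

```python
import numpy as np

learning_rate = np.array([0.5, 0.5, 0.5, 0.5])
gdm = np.array([-0.12, -0.04, -0.07, -0.02])  # illustrative values only

# Form used in the loop above: vector times diagonal matrix.
via_matmul = np.matmul(learning_rate, np.diag(gdm))
# Element-wise product of the two vectors.
via_product = learning_rate * gdm

print(via_matmul)
print(via_product)
```

Keeping the learning rate as a vector leaves room for a different rate per weight, even though all four are equal here.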

We can follow each pass over the data set and watch how the final weights and the total error change. The code is as follows:

    print("Weight progression:", weights)
    print("Total error progression:", sum(error_list))

Putting all the steps together, the complete code is:

import numpy as np
cost_of_material = np.array([38.3, 35, 31.2, 43.2, 44.2, 41.2])
cost_of_sales = np.array([12.3, 11.2, 10.1, 9.2, 9.1, 9.6])
cost_of_human = np.array([22,21.3,23.7,23.2,20.5,24.2])
cost_of_product = np.array([7.2, 7.8, 8.3, 8.6, 8.8, 9.3])
sell_price = np.array([100.1, 102, 99.2, 101.2, 103.8, 100.5])

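# Normalize the data. Every feature is a monetary amount, so one common scale works; here we divide by 100.
# Normalization is optional for this data set; it is done here as a reminder to always consider feature normalization first.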
cost_of_material /= 100
cost_of_sales /= 100
cost_of_human /= 100
cost_of_product /= 100
sell_price /= 100

# Learning rate (one value per weight)
learning_rate = np.array([0.5,0.5,0.5,0.5])

class rand_grad:
    def __init__(self,inputs,outputs,weights):
        self.inputs = inputs
        self.weights = weights
        self.outputs = outputs

    # Prediction function
    def neural_network(self):
        prediction = self.inputs.dot(self.weights)
        return prediction

    # Cost function: squared error of the prediction
    def error_function(self):
        prediction = self.neural_network()
        return pow(prediction - self.outputs,2)

    # Gradient descent multiplier: (prediction - target) * inputs
    def grad_descent_multiplier(self):
        multiplier = (self.neural_network() - \
            self.outputs)*self.inputs
        return multiplier

    # Stopping control: keep iterating while the error is above 1e-10
    # or fewer than 50 iterations have been run
    def stop_condition(self,times):
        if (self.error_function()>1e-10 or times<50):
            return True
        else:
            return False


# Initial weights
weights = np.array([0.6,0.2,0.8,0.9])

# Convert to matrix form: one row per data record, one column per feature
raw_data = np.vstack((cost_of_material,cost_of_sales,cost_of_human,cost_of_product))
raw_data = raw_data.T
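print(raw_data[0])

# Each record's descent loop stops once its error falls to 10 to the power -10 (1e-10) or below.
for ergo in range(30): # Pass over the data set 30 times
    error_list = []
    for item_index in range(len(raw_data)):
        # Loop over the records, starting from the first
        inputs = raw_data[item_index]
        outputs = sell_price[item_index]
        engine = rand_grad(inputs,outputs,weights)
        times = 0
        while engine.stop_condition(times):
            # Gradient descent step
            times += 1
            gdm = np.array(engine.grad_descent_multiplier())
            # If the descent factor is too large relative to the weights, shrink it; otherwise the error can easily diverge.
            if min(np.abs(gdm)) > max(np.abs(weights)):
                gdm /= 10*min(np.abs(gdm))/max(np.abs(weights))
            factor = np.diag(gdm)
            # In-place update: engine.weights refers to the same array, so it sees the new values
            weights -= np.matmul(learning_rate,factor)
        # Record the error and append it to the error list
        error_list.append(engine.error_function())

    print("Weight progression:", weights)
    print("Total error progression:", sum(error_list))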
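
Step four remarks that batch gradient descent replaces the nested loops with matrix operations. As a rough sketch of that idea, the code below accumulates the per-record multipliers in a loop and then obtains the same sum with a single matrix expression. The names X, y, w, loop_grad and matrix_grad are chosen for this illustration, and this is the common textbook form of the batch gradient, assumed here rather than taken from the source.

```python
import numpy as np

# Normalized data: 6 records x 4 features, and the 6 selling prices.
X = np.array([
    [38.3, 12.3, 22.0, 7.2],
    [35.0, 11.2, 21.3, 7.8],
    [31.2, 10.1, 23.7, 8.3],
    [43.2,  9.2, 23.2, 8.6],
    [44.2,  9.1, 20.5, 8.8],
    [41.2,  9.6, 24.2, 9.3],
]) / 100
y = np.array([100.1, 102, 99.2, 101.2, 103.8, 100.5]) / 100
w = np.array([0.6, 0.2, 0.8, 0.9])

# Loop version: add up (prediction - target) * inputs over all records.
loop_grad = np.zeros_like(w)
for x_i, y_i in zip(X, y):
    loop_grad += (x_i.dot(w) - y_i) * x_i

# Matrix version: the same sum written as one expression.
matrix_grad = X.T.dot(X.dot(w) - y)

print(loop_grad)
print(matrix_grad)
```

Unlike the stochastic version, where the weights change after every record, both forms here evaluate every record against the same fixed weights before any update.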


Article link: https://www.haomeiwen.com/subject/aafsqltx.html
