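Purpose

To better understand how neural networks work, this article builds a simple neural network from scratch without relying on any third-party machine learning framework. Doing so gives a deeper understanding of what a neural network really is.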

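What is a neural network

A full introduction to neural networks would have to start from neuroscience, which is not the focus of this article. For ease of understanding, we treat a neural network as a mathematical function that maps a given input to a desired output.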

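A neural network consists of the following components:

* An input layer x
* Any number of hidden layers
* An output layer ŷ
* A set of weights w and biases b between each pair of layers
* An activation function for each layer; in this article we use the sigmoid as the activation function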
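
The code in the rest of this article calls `sigmoid` and `sigmoid_derivative` without defining them. A minimal sketch of both follows, assuming the common convention that the derivative helper receives the sigmoid's output rather than its raw input; the names simply match the calls used later.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(s):
    """Derivative of the sigmoid, written in terms of its output s = sigmoid(z)."""
    return s * (1.0 - s)

print(sigmoid(0.0))                      # 0.5
print(sigmoid_derivative(sigmoid(0.0)))  # 0.25
```

Expressing the derivative through the output is convenient here because the forward pass already stores the activated values.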

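The figure below shows the 2-layer neural network built in this article. Note that the input layer is usually excluded when counting the number of layers.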

Building the neural network in Python:

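import numpy as np

class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x                                      # inputs, shape (n_samples, n_features)
        self.weights1   = np.random.rand(self.input.shape[1],4)  # input layer -> hidden layer (4 units)
        self.weights2   = np.random.rand(4,1)                    # hidden layer -> output layer (1 unit)
        self.y          = y                                      # target values
        self.output     = np.zeros(y.shape)                      # predictions, filled in by the forward pass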

Training:

The output ŷ of this 2-layer neural network is:

$$\hat{y} = \sigma\left(W_2\,\sigma(W_1 x + b_1) + b_2\right)$$
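As the formula shows, only the weights w and the biases b affect the output ŷ.
The right values for the weights and biases determine how strong the network's predictions are. The process of fine-tuning the weights and biases from the input data is called training the neural network.
Each iteration of training consists of two steps:
* Feedforward: compute the predicted output ŷ
* Backpropagation: update the weights w and the biases b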

[Figure: training flow chart of the network]

Feedforward:

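As the training flow chart shows, feedforward is simply the computation of the network output. We add a feedforward function to the class above to perform it. To keep the calculation simple, we set the biases b to 0 (no biases).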

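import numpy as np

def sigmoid(z):
    # logistic activation: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x
        self.weights1   = np.random.rand(self.input.shape[1],4)
        self.weights2   = np.random.rand(4,1)
        self.y          = y
        self.output     = np.zeros(self.y.shape)

    def feedforward(self):
        # hidden layer activations, then the network output (biases are taken to be 0)
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))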
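
To make the shapes in the forward pass concrete, here is a small standalone sketch. The input matrix and the fixed random seed are hypothetical choices made only for illustration; the layer sizes (4 hidden units, 1 output) follow the class above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable

# Hypothetical data: 4 samples with 3 features each
x = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)

weights1 = rng.random((x.shape[1], 4))   # input (3 features) -> hidden (4 units)
weights2 = rng.random((4, 1))            # hidden (4 units)   -> output (1 unit)

layer1 = sigmoid(x @ weights1)           # one row of 4 hidden activations per sample
output = sigmoid(layer1 @ weights2)      # one prediction per sample

print(layer1.shape, output.shape)        # (4, 4) (4, 1)
```

Because the sigmoid is applied last, every prediction lies strictly between 0 and 1.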

But how do we know how accurate the model's predictions are, that is, how good the model is? The loss function answers this question.

Loss Function:

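Many loss functions are available, and the nature of the problem we are solving should dictate which one we choose. In this article we use a simple sum-of-squares error as our loss function.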

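$$\mathrm{Loss}(y, \hat{y}) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
That is, the sum-of-squares error is simply the sum of the squared differences between each predicted value and the actual value. The difference is squared so that positive and negative errors do not cancel and only the magnitude of the difference counts.
The goal of training is to find the weights and biases that minimize the loss function.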
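
As a quick illustration, the loss can be computed in one line of NumPy. This is a minimal sketch: the function name `sum_of_squares_loss` and the sample values are hypothetical, not part of the original code.

```python
import numpy as np

def sum_of_squares_loss(y, y_hat):
    """Sum of squared differences between targets y and predictions y_hat."""
    return float(np.sum((y - y_hat) ** 2))

y     = np.array([[0.0], [1.0], [1.0], [0.0]])   # hypothetical targets
y_hat = np.array([[0.5], [0.5], [0.5], [0.5]])   # hypothetical predictions

loss = sum_of_squares_loss(y, y_hat)
print(loss)   # 1.0, since each of the four errors contributes 0.5 ** 2 = 0.25
```

A perfect prediction gives a loss of 0, and the loss grows as the predictions move away from the targets.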

Backpropagation:

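Now that we have measured the prediction error of the model, we need a way to propagate that error back and update our weights and biases.
To know by how much to adjust the weights and biases, we need the derivative of the loss function with respect to the weights and biases.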

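Recall from calculus that the derivative of a function is simply its slope. If we have the derivative, we can update the weights and biases by increasing or decreasing them in the direction that lowers the loss (see the figure above). This is called gradient descent.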
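
The idea can be tried on a hypothetical one-parameter loss, (w - 3)^2, whose minimum is known to be at w = 3. The loss, the starting point, and the learning rate of 0.1 are all assumptions chosen for illustration; the network code in this article effectively uses a step size of 1.

```python
def loss(w):
    return (w - 3.0) ** 2

def loss_derivative(w):
    return 2.0 * (w - 3.0)      # slope of the loss at w

w = 0.0                         # arbitrary starting point
learning_rate = 0.1             # assumed step size

for _ in range(100):
    # step against the slope: a positive slope decreases w, a negative slope increases it
    w -= learning_rate * loss_derivative(w)

print(round(w, 4))              # 3.0
```

Each step shrinks the distance to the minimum by a constant factor, so w converges to 3.
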
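However, we cannot directly compute the derivative of the loss function with respect to the weights and biases, because the equation of the loss function does not contain them explicitly. We therefore need the chain rule to compute it.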

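[Figure: the chain rule for computing the derivative of the loss function with respect to the weights. Note that, for simplicity, the partial derivative is shown for a 1-layer neural network only.]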
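
Since the figure itself is not reproduced here, the derivation can be sketched as follows for the 1-layer case, assuming the sum-of-squares loss and a sigmoid output. The intermediate symbol z is introduced here for illustration.

```latex
\begin{aligned}
\mathrm{Loss}(y, \hat{y}) &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad \hat{y} = \sigma(z), \qquad z = Wx + b \\[4pt]
\frac{\partial\, \mathrm{Loss}}{\partial W}
  &= \frac{\partial\, \mathrm{Loss}}{\partial \hat{y}}
     \cdot \frac{\partial \hat{y}}{\partial z}
     \cdot \frac{\partial z}{\partial W} \\[4pt]
  &= -2\,(y - \hat{y}) \cdot \sigma'(z) \cdot x,
\qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
\end{aligned}
```

Note the minus sign: the quantity 2(y - ŷ)·σ'(z)·x that the Python code computes is the negative of this derivative, which is why the code adds it to the weights instead of subtracting it.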

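Although this looks complicated, it gives us what we need: the derivative (slope) of the loss function with respect to the weights, so that we can adjust the weights accordingly.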

Now that we have it, let's add the backpropagation function to our Python code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(s):
    # derivative of the sigmoid, expressed in terms of its output s = sigmoid(z)
    return s * (1.0 - s)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x
        self.weights1   = np.random.rand(self.input.shape[1],4)
        self.weights2   = np.random.rand(4,1)
        self.y          = y
        self.output     = np.zeros(self.y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # chain rule: because of the (y - output) term, d_weights2 and d_weights1 are the
        # NEGATIVE derivatives of the sum-of-squares loss with respect to weights2 and weights1
        delta_output = 2*(self.y - self.output) * sigmoid_derivative(self.output)
        d_weights2 = np.dot(self.layer1.T, delta_output)
        d_weights1 = np.dot(self.input.T,  (np.dot(delta_output, self.weights2.T) * sigmoid_derivative(self.layer1)))

        # adding the negative derivative is a gradient-descent step with a step size of 1
        self.weights1 += d_weights1
        self.weights2 += d_weights2
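
One way to gain confidence in the backpropagation formulas is to compare them with a numerical derivative. The sketch below is not part of the original code: it assumes the sum-of-squares loss, uses random hypothetical data, and introduces the helper `numeric_gradient` (central finite differences) purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(s):
    return s * (1.0 - s)

def loss(x, y, w1, w2):
    """Sum-of-squares loss of the 2-layer network (no biases)."""
    output = sigmoid(sigmoid(x @ w1) @ w2)
    return np.sum((y - output) ** 2)

def numeric_gradient(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw, one entry at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        step = np.zeros_like(w)
        step[idx] = eps
        grad[idx] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x, y = rng.random((4, 3)), rng.random((4, 1))     # hypothetical inputs and targets
w1, w2 = rng.random((3, 4)), rng.random((4, 1))

# The same expressions as in backprop()
layer1 = sigmoid(x @ w1)
output = sigmoid(layer1 @ w2)
delta_output = 2 * (y - output) * sigmoid_derivative(output)
d_weights2 = layer1.T @ delta_output
d_weights1 = x.T @ ((delta_output @ w2.T) * sigmoid_derivative(layer1))

# backprop's terms are negative gradients, so adding the numeric gradient should give ~0
gap2 = np.max(np.abs(d_weights2 + numeric_gradient(lambda w: loss(x, y, w1, w), w2)))
gap1 = np.max(np.abs(d_weights1 + numeric_gradient(lambda w: loss(x, y, w, w2), w1)))
print(gap1 < 1e-6, gap2 < 1e-6)
```

If both comparisons hold, the update in `backprop` moves the weights against the gradient of the loss, as gradient descent requires.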

We have now completed the Python code for a simple 2-layer neural network. Next, let's see how this network model works in practice.