Introduction to Neural Network Algorithms

Author: 爱吃鱼的夏侯莲子 | Published: 2020-01-25 16:10

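Logistic regression works for classification problems, but it is a poor choice when the classifier has to be complex and nonlinear.
To use logistic regression in that case, you first have to build a hypothesis containing many nonlinear terms, which is then passed through the sigmoid function $g$. With enough high-order polynomial terms the decision boundary becomes highly contorted, which can lead to overfitting.
A second problem is that a complex nonlinear classifier involves a very large number of feature terms. Including all of them is difficult, and the computational cost becomes too high.

For example, to decide whether an image shows a car, every pixel of the image has to be examined as a feature, which is a very large amount of computation. Neural networks are a better choice for this kind of problem.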
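
To get a feel for how quickly the number of feature terms grows, here is a small sketch. It assumes (this is not stated in the text) that the nonlinear terms are all pairwise products $x_i x_j$ with $i \le j$, and it uses a hypothetical 50x50 grayscale image as the input.

```python
def count_quadratic_terms(n):
    """Number of distinct products x_i * x_j with i <= j among n features."""
    return n * (n + 1) // 2

# Hypothetical input: a 50x50 grayscale image, one feature per pixel.
pixels = 50 * 50
print(pixels, count_quadratic_terms(pixels))  # 2500 3126250
```

Even with only second-order terms, a small image already yields millions of features under this assumption.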

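Neural network model

Figure: a logistic unit

The figure shows the simplest neural network model: three input features on the left and a single output on the right. It is a neural network model for a binary classification problem.

As in logistic regression, a default feature $x_0$ is usually added when working with neural networks. In a neural network it is called the bias unit.
$x=\left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix} \right], \theta=\left[ \begin{matrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \end{matrix} \right]$
The hypothesis function $h_\theta(x)$ is:
$h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}$
In a neural network this is called the activation function. The name is neural network terminology, but the function is the same one used in logistic regression. In this setting the parameters $\theta$ of the activation function are called weights.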
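
A single logistic unit can be sketched in a few lines of Python. The function names and the weight values below are hypothetical and chosen only for illustration; the formula is the $h_\theta(x)$ given above.

```python
import math

def sigmoid(z):
    """Activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_unit(theta, features):
    """Compute h_theta(x) = g(theta^T x), prepending the bias unit x_0 = 1."""
    x = [1.0] + list(features)
    if len(theta) != len(x):
        raise ValueError("theta needs one weight per feature plus one for the bias")
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical weights [theta_0, theta_1, theta_2, theta_3] and inputs [x_1, x_2, x_3].
theta = [-1.0, 0.5, 0.5, 0.5]
print(logistic_unit(theta, [1.0, 1.0, 0.0]))  # z = 0, so this prints 0.5
```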

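Now consider a slightly more complex model:

Figure: a multi-layer neural network

In the figure, the first layer is the input layer. Its values feed into the second layer, and the third layer, the output layer, produces the hypothesis.
The layers of nodes between the input layer and the output layer are called hidden layers. The network in the figure has one hidden layer.
In a hidden layer, $a_i^{(j)}$ denotes unit $i$ of layer $j$. As with the input layer, a bias unit $a_0^{(j)}=1$ is added during computation:
$a^{(j)}=\left[ \begin{matrix} a_0^{(j)} \\ a_1^{(j)} \\ a_2^{(j)} \\ a_3^{(j)} \end{matrix} \right]$

$\theta^{(j)}$ is the matrix of weights that maps layer $j$ to layer $j+1$.

Computation

The activations of the hidden layer and the output of the output layer are computed as follows: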
$a_0^{(2)} = 1$
$a_1^{(2)} = g(\theta_{10}^{(1)}x_0+\theta_{11}^{(1)}x_1+\theta_{12}^{(1)}x_2+\theta_{13}^{(1)}x_3)$
$a_2^{(2)} = g(\theta_{20}^{(1)}x_0+\theta_{21}^{(1)}x_1+\theta_{22}^{(1)}x_2+\theta_{23}^{(1)}x_3)$
$a_3^{(2)} = g(\theta_{30}^{(1)}x_0+\theta_{31}^{(1)}x_1+\theta_{32}^{(1)}x_2+\theta_{33}^{(1)}x_3)$
$h_\theta(x) = a_1^{(3)} = g(\theta_{10}^{(2)}a_0^{(2)}+\theta_{11}^{(2)}a_1^{(2)}+\theta_{12}^{(2)}a_2^{(2)}+\theta_{13}^{(2)}a_3^{(2)})$

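$\theta^{(j)}$ is a matrix. Its dimensions are determined as follows:
If layer $j$ has $s_j$ units and layer $j+1$ has $s_{j+1}$ units, then $\theta^{(j)}$ is an $s_{j+1}\times (s_j + 1)$ matrix. The extra column holds the weights applied to the bias unit.

For example, take a three-layer network whose first layer has 2 input features, whose second layer has 3 units, and whose third layer outputs 1 prediction:
$\theta^{(1)}$ is a $3\times 3$ matrix
$\theta^{(2)}$ is a $1\times 4$ matrix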
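
The dimension rule can be checked mechanically. The helper below is a hypothetical illustration (the name `theta_shapes` is not from the text); it takes the number of units in each layer, excluding bias units, and returns the shape of each weight matrix.

```python
def theta_shapes(layer_sizes):
    """Return the (rows, cols) shape of each weight matrix, in layer order.

    layer_sizes lists the unit count of each layer, excluding bias units.
    The matrix between a layer with s units and the next layer with t units
    has shape t x (s + 1).
    """
    return [(t, s + 1) for s, t in zip(layer_sizes, layer_sizes[1:])]

# The three-layer example: 2 inputs, 3 hidden units, 1 output.
print(theta_shapes([2, 3, 1]))  # [(3, 3), (1, 4)]
```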

For convenience, denote the argument of the activation function by the variable $z$:
$a_0^{(2)} = 1$
$a_1^{(2)} = g(z_1^{(2)})$
$a_2^{(2)} = g(z_2^{(2)})$
$a_3^{(2)} = g(z_3^{(2)})$

where $z$ is:
$z_k^{(2)}= \theta_{k,0}^{(1)}x_0 + \theta_{k,1}^{(1)}x_1 + \cdots + \theta_{k,n}^{(1)}x_n$

Vector form

Written in vector form:

$x=\left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix} \right], z^{(2)}= \left[ \begin{matrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{matrix} \right]$

$z^{(2)} = \theta^{(1)}x$
$a^{(2)} = g(z^{(2)})$

For the third layer, with the bias unit $a_0^{(2)}=1$ added to $a^{(2)}$:

$z^{(3)} = \theta^{(2)}a^{(2)}$

Generalizing to any layer:
$z^{(j)} = \theta^{(j-1)}a^{(j-1)}$
$a^{(j)} = g(z^{(j)})$
The final step, where layer $j+1$ is the output layer:
$h_\theta(x) = a^{(j+1)} = g(z^{(j+1)})$
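
The recurrence above amounts to a short loop. The following is a minimal sketch in plain Python, assuming each weight matrix is stored as a list of rows; the function name `forward` and the all-zero weights are hypothetical and used only to make the result easy to verify by hand.

```python
import math

def sigmoid(z):
    """Activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, features):
    """Forward propagation; returns the activations of the output layer.

    thetas is a list of weight matrices in layer order. Each matrix is a list
    of rows, and each row holds the bias weight followed by one weight per
    unit of the previous layer.
    """
    a = list(features)
    for theta in thetas:
        a_with_bias = [1.0] + a  # add the bias unit a_0 = 1
        z = [sum(w * v for w, v in zip(row, a_with_bias)) for row in theta]
        a = [sigmoid(zk) for zk in z]  # a^(j) = g(z^(j))
    return a

# Hypothetical all-zero weights for a network with 3 inputs, 3 hidden units, 1 output.
theta1 = [[0.0] * 4 for _ in range(3)]  # 3 x 4
theta2 = [[0.0] * 4]                    # 1 x 4
print(forward([theta1, theta2], [1.0, 2.0, 3.0]))  # [0.5], because every z is 0
```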

Examples

1. Predicting AND

$\left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \end{matrix} \right] \rightarrow \left[ \begin{matrix} g(z^{(2)}) \end{matrix} \right] \rightarrow h_\theta(x)$

where $x_0 = 1$

We want to compute AND, where $x_1, x_2 \in \{0,1\}$ and $y = x_1 \land x_2$.
Set $\theta^{(1)} = \left[ \begin{matrix} -30 & 20 & 20 \end{matrix} \right]$

Applying the formulas above:
$h_\theta(x) = g(\theta^{(1)}x) = g(-30+20x_1+20x_2)$

$x_1=0,x_2=0$: $h_\theta(x)=g(-30)\approx 0$
$x_1=0,x_2=1$: $h_\theta(x)=g(-10)\approx 0$
$x_1=1,x_2=0$: $h_\theta(x)=g(-10)\approx 0$
$x_1=1,x_2=1$: $h_\theta(x)=g(10)\approx 1$

This matches the logic of AND.
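
The AND truth table can be reproduced numerically. This sketch uses the weights $-30, 20, 20$ from the example; the helper names are hypothetical. It prints the raw sigmoid output next to its rounded value so the "approximately 0" and "approximately 1" claims can be inspected.

```python
import math

def sigmoid(z):
    """Activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

THETA_AND = [-30.0, 20.0, 20.0]  # weights from the example: bias, x_1, x_2

def unit(theta, x1, x2):
    """Single sigmoid unit g(theta_0 + theta_1 * x1 + theta_2 * x2)."""
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        h = unit(THETA_AND, x1, x2)
        print(x1, x2, f"{h:.5f}", round(h))
```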

2. Predicting OR

The network is the same as the one used for AND; only the weights change: $\theta^{(1)} = \left[ \begin{matrix} -10 & 20 & 20 \end{matrix} \right]$

$h_\theta(x) = g(\theta^{(1)}x) = g(-10+20x_1+20x_2)$

$x_1=0,x_2=0$: $h_\theta(x)=g(-10)\approx 0$
$x_1=0,x_2=1$: $h_\theta(x)=g(10)\approx 1$
$x_1=1,x_2=0$: $h_\theta(x)=g(10)\approx 1$
$x_1=1,x_2=1$: $h_\theta(x)=g(30)\approx 1$

This matches the logic of OR.
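
As a quick check, the sketch below turns the unit's output into a 0/1 prediction and collects the full truth table. The 0.5 decision threshold is an assumption borrowed from the usual logistic regression convention, not something the text specifies, and the names are hypothetical.

```python
import math

def sigmoid(z):
    """Activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

THETA_OR = [-10.0, 20.0, 20.0]  # weights from the example: bias, x_1, x_2

def predict(theta, x1, x2):
    """Threshold the unit's output at 0.5 (assumed) to get a 0/1 prediction."""
    return int(sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2) >= 0.5)

truth_table = {(x1, x2): predict(THETA_OR, x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(truth_table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
```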

3. Predicting XOR

$\left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \end{matrix} \right] \rightarrow \left[ \begin{matrix} a_0^{(2)} \\ a_1^{(2)} \\ a_2^{(2)} \end{matrix} \right] \rightarrow \left[ \begin{matrix} g(z^{(3)}) \end{matrix} \right] \rightarrow h_\theta(x)$

where $x_0=1$ and $a_0^{(2)}=1$

$\theta^{(1)} = \left[ \begin{matrix} -10 & 20 & 20 \\ 40 & -30 & -30 \end{matrix} \right], \theta^{(2)} = \left[ \begin{matrix} -30 & 20 & 20 \end{matrix} \right]$

$a^{(2)} = g(\theta^{(1)}x)$ with the bias unit $a_0^{(2)}=1$ prepended, and $h_\theta(x)=a^{(3)}=g(\theta^{(2)}a^{(2)})$

$x_1=0,x_2=0$: $a^{(2)}=\left[ \begin{matrix} 1 \\ g(-10) \\ g(40) \end{matrix} \right]\approx\left[ \begin{matrix} 1 \\ 0 \\ 1 \end{matrix} \right]$, $h_\theta(x)=g(-10)\approx 0$

$x_1=0,x_2=1$: $a^{(2)}=\left[ \begin{matrix} 1 \\ g(10) \\ g(10) \end{matrix} \right]\approx\left[ \begin{matrix} 1 \\ 1 \\ 1 \end{matrix} \right]$, $h_\theta(x)=g(10)\approx 1$

$x_1=1,x_2=0$: $a^{(2)}=\left[ \begin{matrix} 1 \\ g(10) \\ g(10) \end{matrix} \right]\approx\left[ \begin{matrix} 1 \\ 1 \\ 1 \end{matrix} \right]$, $h_\theta(x)=g(10)\approx 1$

$x_1=1,x_2=1$: $a^{(2)}=\left[ \begin{matrix} 1 \\ g(30) \\ g(-20) \end{matrix} \right]\approx\left[ \begin{matrix} 1 \\ 1 \\ 0 \end{matrix} \right]$, $h_\theta(x)=g(-10)\approx 0$

This matches the logic of XOR.
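
The two-layer XOR computation can be verified with a short sketch that uses the weight matrices given in this example. The helper names are hypothetical. The comments describing the hidden units as OR and NAND are an observation read off the activation values listed above, not terminology from the text.

```python
import math

def sigmoid(z):
    """Activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

THETA1 = [[-10.0, 20.0, 20.0],   # hidden unit a_1: behaves like OR
          [40.0, -30.0, -30.0]]  # hidden unit a_2: behaves like NAND
THETA2 = [[-30.0, 20.0, 20.0]]   # output unit: AND of a_1 and a_2

def layer(theta, inputs):
    """Apply one layer: prepend the bias unit, then compute g(theta * a)."""
    a = [1.0] + list(inputs)
    return [sigmoid(sum(w * v for w, v in zip(row, a))) for row in theta]

def xor_net(x1, x2):
    """Forward pass through the hidden layer and the output layer."""
    hidden = layer(THETA1, [x1, x2])
    return layer(THETA2, hidden)[0]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xor_net(x1, x2)))
```

A single sigmoid unit was enough for AND and OR, while this example needs the hidden layer to combine two intermediate results.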

Reposted from:
https://codeeper.com/2020/01/16/tech/machine_learning/neural_network_intro.html
