
How Convolutional and Pooling Layers Participate in Gradient Backpropagation

Author: 努力学习的CC | Published 2020-03-25 15:55


Backpropagation Through the Convolutional Layer

Let's start the derivation from the simplest case.

Ordinary backpropagation

Here z_i = w_i a_{i-1} + b_i and a_i = \sigma(z_i). To update a weight we use w_i' = w_i - \eta \frac{\partial C}{\partial w_i}, and by the chain rule \frac{\partial C}{\partial w_i} = \frac{\partial C}{\partial a_i} \frac{\partial a_i}{\partial z_i} \frac{\partial z_i}{\partial w_i}. This lets us write the gradients for w_1 and w_2:

\frac{\partial C}{\partial w_2} = \frac{\partial C}{\partial a_2} \frac{\partial a_2}{\partial z_2} \frac{\partial z_2}{\partial w_2} = \frac{\partial C}{\partial a_2} \sigma'(z_2) a_1

\frac{\partial C}{\partial w_1} = \frac{\partial C}{\partial a_2} \frac{\partial a_2}{\partial z_2} \frac{\partial z_2}{\partial a_1} \frac{\partial a_1}{\partial z_1} \frac{\partial z_1}{\partial w_1} = \frac{\partial C}{\partial a_2} \sigma'(z_2) w_2 \sigma'(z_1) a_0

Note: if we use the sigmoid activation, its derivative is at most 0.25, so with poorly initialized weights the product of these intermediate factors shrinks with every layer and the gradient vanishes. This is why ReLU is usually preferred as the activation function, and why the weight initialization needs to be chosen carefully.
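As a sanity check, here is a minimal numerical sketch of the two-weight chain above, assuming sigmoid activations and a squared-error cost C = \frac{1}{2}(a_2 - y)^2; all values and variable names are illustrative, not from the original article:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Illustrative (made-up) input, weights, biases, target and learning rate.
a0, w1, b1, w2, b2, y, eta = 0.5, 0.8, 0.1, -0.4, 0.2, 1.0, 0.1

# Forward pass: z_i = w_i * a_{i-1} + b_i, a_i = sigma(z_i).
z1 = w1 * a0 + b1
a1 = sigmoid(z1)
z2 = w2 * a1 + b2
a2 = sigmoid(z2)
C = 0.5 * (a2 - y) ** 2          # assumed cost, so dC/da2 = a2 - y

# Backward pass: exactly the chain-rule factors written out above.
dC_da2 = a2 - y
dC_dw2 = dC_da2 * sigmoid_prime(z2) * a1
dC_dw1 = dC_da2 * sigmoid_prime(z2) * w2 * sigmoid_prime(z1) * a0

# Gradient-descent updates w' = w - eta * dC/dw.
w2 -= eta * dC_dw2
w1 -= eta * dC_dw1
print(dC_dw1, dC_dw2)
```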

The case above is the simplest one. Now suppose the hidden layer of the network contains more than one neuron:

Following the same idea, we first compute the residual (error) terms of the network:

\frac{\partial C}{\partial z_{21}} = \frac{\partial C}{\partial \sigma_{21}} \frac{\partial \sigma_{21}}{\partial z_{21}} = \frac{\partial C}{\partial \sigma_{21}} \sigma'(z_{21})

\frac{\partial C}{\partial z_{22}} = \frac{\partial C}{\partial \sigma_{22}} \frac{\partial \sigma_{22}}{\partial z_{22}} = \frac{\partial C}{\partial \sigma_{22}} \sigma'(z_{22})

We can then write out the gradient for each weight; take w_{11} as an example:

\frac {\partial C}{\partial w_{11}}=\frac {\partial C}{\partial \sigma_{21}} \frac {\partial  \sigma_{21}}{\partial z_{21}} \frac {\partial  z_{21}}{\partial  \sigma_{11}} \frac {\partial \sigma_{11}} {\partial z_{11}} \frac {\partial z_{11}} {\partial w_{11}}  + \frac {\partial C}{\partial \sigma_{22}} \frac {\partial  \sigma_{22}}{\partial z_{22}} \frac {\partial  z_{22}}{\partial  \sigma_{11}} \frac {\partial \sigma_{11}} {\partial z_{11}} \frac {\partial z_{11}} {\partial w_{11}}

= \frac{\partial C}{\partial \sigma_{21}} \sigma'(z_{21}) w_{21} \sigma'(z_{11}) a_{1}^0 + \frac{\partial C}{\partial \sigma_{22}} \sigma'(z_{22}) w_{22} \sigma'(z_{11}) a_{1}^0 = \frac{\partial C}{\partial z_{21}} w_{21} \sigma'(z_{11}) a_{1}^0 + \frac{\partial C}{\partial z_{22}} w_{22} \sigma'(z_{11}) a_{1}^0
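Below is a small sketch of this two-path case, assuming one hidden neuron (index 11) feeding two output neurons (indices 21 and 22) with a squared-error cost; the numbers and helper names are made up. It also checks the analytic gradient against a finite-difference estimate:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Made-up scalars: one hidden neuron feeding two output neurons;
# y21, y22 are made-up targets.
a0, w11, b11 = 0.5, 0.7, 0.1
w21, b21 = 0.3, 0.0
w22, b22 = -0.6, 0.2
y21, y22 = 1.0, 0.0

def forward(w11):
    z11 = w11 * a0 + b11
    s11 = sigmoid(z11)
    z21, z22 = w21 * s11 + b21, w22 * s11 + b22
    s21, s22 = sigmoid(z21), sigmoid(z22)
    C = 0.5 * ((s21 - y21) ** 2 + (s22 - y22) ** 2)
    return C, z11, z21, z22, s21, s22

C, z11, z21, z22, s21, s22 = forward(w11)

# Residuals of the output neurons, then the two paths through z21 and z22
# are summed to get dC/dw11, as in the formula above.
dC_dz21 = (s21 - y21) * sigmoid_prime(z21)
dC_dz22 = (s22 - y22) * sigmoid_prime(z22)
dC_dw11 = (dC_dz21 * w21 + dC_dz22 * w22) * sigmoid_prime(z11) * a0

# Finite-difference check of the same derivative.
eps = 1e-6
numeric = (forward(w11 + eps)[0] - forward(w11 - eps)[0]) / (2 * eps)
print(dC_dw11, numeric)   # the two numbers should agree closely
```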

Backpropagation through the convolution itself

The convolution operation can be written out in terms of two matrices. Suppose the matrix on the left is our feature map (or input image), the one in the middle is the convolution kernel, and the matrix on the right is the resulting output:

\begin{bmatrix}   a_{11} & a_{12} & a_{13} \\   a_{21} & a_{22} & a_{23} \\   a_{31} & a_{32} & a_{33}  \end{bmatrix} *\begin{bmatrix}   w_{11} & w_{12} \\   w_{21} & w_{22} \\  \end{bmatrix} =\begin{bmatrix}   z_{11} & z_{12} \\   z_{21} & z_{22} \\  \end{bmatrix}

where:

z_{11}=a_{11}w_{11}+a_{12}w_{12}+a_{21}w_{21}+a_{22}w_{22} \\z_{12}=a_{12}w_{11}+a_{13}w_{12}+a_{22}w_{21}+a_{23}w_{22} \\z_{21}=a_{21}w_{11}+a_{22}w_{12}+a_{31}w_{21}+a_{32}w_{22}\\z_{22}=a_{22}w_{11}+a_{23}w_{12}+a_{32}w_{21}+a_{33}w_{22}
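This forward pass is a "valid" sliding-window correlation (which is what deep-learning frameworks usually implement as "convolution"). A short numpy sketch, with arbitrary stand-in values for a_{11}..a_{33} and w_{11}..w_{22}:

```python
import numpy as np

a = np.arange(1.0, 10.0).reshape(3, 3)      # stand-in for a11..a33
w = np.array([[1.0, 2.0],
              [3.0, 4.0]])                  # stand-in for w11..w22

# "Valid" sliding-window correlation: each output z[i, j] is the sum of an
# elementwise product between the kernel and the 2x2 patch it covers.
z = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        z[i, j] = np.sum(a[i:i+2, j:j+2] * w)

print(z)
# z[0, 0] == a11*w11 + a12*w12 + a21*w21 + a22*w22, and so on.
```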

The expressions above can also be drawn as a diagram:

In the diagram, the dark-green dots on the left are a_{11} through a_{33} of the original image or feature map, and the orange dots in the middle are the kernel weights w_{11} through w_{22}. From this we can write out the backpropagation formulas. The residual of the convolutional layer is \frac{\partial C}{\partial z^{l-1}} = \frac{\partial C}{\partial z^{l}} \frac{\partial z^{l}}{\partial a^{l-1}} \frac{\partial a^{l-1}}{\partial z^{l-1}}, and the difficult part is the factor \frac{\partial z^{l}}{\partial a^{l-1}}. Let us write \nabla a for the gradient of the cost with respect to the previous layer's activations, \frac{\partial C}{\partial a^{l-1}}, and expand it term by term:

\nabla a_{11} = \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{11}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{11}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{11}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{11}} = \frac {\partial C}{\partial z_{11}} w_{11}

\nabla a_{12} =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{12}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{12}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{12}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{12}} = \frac {\partial C}{\partial z_{11}} w_{12} +\frac {\partial C}{\partial z_{12}} w_{11}

\nabla a_{13} =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{13}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{13}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{13}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{13}} = \frac {\partial C}{\partial z_{12}} w_{12}

\nabla a_{21} =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{21}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{21}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{21}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{21}} = \frac {\partial C}{\partial z_{11}} w_{21}  + \frac {\partial C}{\partial z_{21}} w_{11}

\nabla a_{22 } =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{22}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{22}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{22}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{22}} = \frac {\partial C}{\partial z_{11}} w_{22}  + \frac {\partial C}{\partial z_{12}} w_{21}  + \frac {\partial C}{\partial z_{21}} w_{12}  + \frac {\partial C}{\partial z_{22}} w_{11}

\nabla a_{23 } =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{23}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{23}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{23}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{23}} =  \frac {\partial C}{\partial z_{12}} w_{22}  + \frac {\partial C}{\partial z_{22}} w_{12}

\nabla a_{31 } =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{31}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{31}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{31}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{31}} =  \frac {\partial C}{\partial z_{21}} w_{21}

\nabla a_{32 } =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{32}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{32}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{32}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{32}} =  \frac {\partial C}{\partial z_{21}} w_{22}  + \frac {\partial C}{\partial z_{22}} w_{21}

\nabla a_{33 } =  \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial a_{33}}  + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial a_{33}}  + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial a_{33}}  + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22}}{\partial a_{33}} =   \frac {\partial C}{\partial z_{22}} w_{22}

From these terms a pattern emerges: zero-padding the output-gradient map and convolving it with the kernel rotated by 180° reproduces every \nabla a entry:

\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & \frac {\partial C}{\partial z_{11}} & \frac {\partial C}{\partial z_{12}} & 0 \\ 0 & \frac {\partial C}{\partial z_{21}} & \frac {\partial C}{\partial z_{22}} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} * \begin{bmatrix} w_{22} & w_{21} \\ w_{12} & w_{11} \end{bmatrix} = \begin{bmatrix} \nabla a_{11} & \nabla a_{12} & \nabla a_{13} \\ \nabla a_{21} & \nabla a_{22} & \nabla a_{23} \\ \nabla a_{31} & \nabla a_{32} & \nabla a_{33} \end{bmatrix}
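This pattern can be checked numerically: zero-pad the output-gradient map and correlate it with the 180°-rotated kernel. The sketch below uses made-up values for w and \partial C/\partial z and verifies one of the expanded terms:

```python
import numpy as np

w = np.array([[1.0, 2.0],
              [3.0, 4.0]])                       # made-up kernel w11..w22
dC_dz = np.array([[0.1, 0.2],
                  [0.3, 0.4]])                   # made-up gradients dC/dz11..dC/dz22

# Zero-pad dC/dz to 4x4 and correlate it with the 180-degree-rotated kernel.
padded = np.pad(dC_dz, 1, mode='constant')
w_rot = np.rot90(w, 2)                           # [[w22, w21], [w12, w11]]

dC_da = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        dC_da[i, j] = np.sum(padded[i:i+2, j:j+2] * w_rot)

# Spot check against the expanded term: dC/da12 = dC/dz11*w12 + dC/dz12*w11.
assert np.isclose(dC_da[0, 1], dC_dz[0, 0] * w[0, 1] + dC_dz[0, 1] * w[0, 0])
print(dC_da)
```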

In other words, the residual of the convolutional layer can be written as:

\frac{\partial C}{\partial z^{l-1}} = \frac{\partial C}{\partial z^{l}} \frac{\partial z^{l}}{\partial a^{l-1}} \frac{\partial a^{l-1}}{\partial z^{l-1}} = \frac{\partial C}{\partial z^l} * rot_{180}(w^l) \odot \sigma'(z^{l-1})

where * denotes the zero-padded (full) convolution illustrated above and \odot is element-wise multiplication.

For updating the kernel weights we again use w' = w - \eta \frac{\partial C}{\partial w}, with \frac{\partial C}{\partial w} = \frac{\partial C}{\partial z} \frac{\partial z}{\partial w}. Based on the diagram above, we can write out the gradient of every kernel weight:

\nabla  w_{11} = \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial w_{11}} + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial w_{11}} + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial w_{11}} + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22 }}{\partial w_{11}} = \frac {\partial C}{\partial z_{11}} a_{11} + \frac {\partial C}{\partial z_{12}} a_{12 } + \frac {\partial C}{\partial z_{21}}a_{21} + \frac {\partial C}{\partial z_{22}}  a_{22}

\nabla w_{12} = \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial w_{12}} + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial w_{12}} + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial w_{12}} + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22 }}{\partial w_{12}} = \frac {\partial C}{\partial z_{11}} a_{12} + \frac {\partial C}{\partial z_{12}} a_{13 } + \frac {\partial C}{\partial z_{21}}a_{22} + \frac {\partial C}{\partial z_{22}} a_{23}

\nabla w_{21} = \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial w_{21}} + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial w_{21}} + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial w_{21}} + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22 }}{\partial w_{21}} = \frac {\partial C}{\partial z_{11}} a_{21} + \frac {\partial C}{\partial z_{12}} a_{22 } + \frac {\partial C}{\partial z_{21}}a_{31} + \frac {\partial C}{\partial z_{22}} a_{32}

\nabla w_{22} = \frac {\partial C}{\partial z_{11}} \frac {\partial z_{11}}{\partial w_{22}} + \frac {\partial C}{\partial z_{12}} \frac {\partial z_{12}}{\partial w_{22}} + \frac {\partial C}{\partial z_{21}} \frac {\partial z_{21}}{\partial w_{22}} + \frac {\partial C}{\partial z_{22}} \frac {\partial z_{22 }}{\partial w_{22}} = \frac {\partial C}{\partial z_{11}} a_{22} + \frac {\partial C}{\partial z_{12}} a_{23 } + \frac {\partial C}{\partial z_{21}}a_{32} + \frac {\partial C}{\partial z_{22}} a_{33}

Likewise, this can be written in matrix form:

\begin{bmatrix}   a_{11} & a_{12} & a_{13} \\   a_{21} & a_{22} & a_{23} \\   a_{31} & a_{32} & a_{33}  \end{bmatrix} * \begin{bmatrix}  \frac {\partial C}{\partial z_{11}} & \frac {\partial C}{\partial z_{12}} \\    \frac {\partial C}{\partial z_{21}} & \frac {\partial C}{\partial z_{22}}  \\     \end{bmatrix}  = \begin{bmatrix}\nabla w_{11} & \nabla w_{12} \\\nabla w_{21} & \nabla w_{22} \\\end{bmatrix}

\frac{\partial C}{\partial w^l} = a^{l-1} * \frac{\partial C}{\partial z^l}, where \frac{\partial C}{\partial z^l} is computed with the residual recursion derived above.
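A quick numpy sketch of this weight-gradient formula, again with made-up values, correlating the input with the output-gradient map and checking it against the expanded expression for \nabla w_{11}:

```python
import numpy as np

a = np.arange(1.0, 10.0).reshape(3, 3)           # previous-layer activations a11..a33
dC_dz = np.array([[0.1, 0.2],
                  [0.3, 0.4]])                   # made-up gradients at the conv output

# Correlate the input with the output-gradient map to get dC/dw.
dC_dw = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        dC_dw[i, j] = np.sum(a[i:i+2, j:j+2] * dC_dz)

# Spot check against the expanded term for nabla w_{11}:
expected = 0.1 * a[0, 0] + 0.2 * a[0, 1] + 0.3 * a[1, 0] + 0.4 * a[1, 1]
assert np.isclose(dC_dw[0, 0], expected)
print(dC_dw)
```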

Backpropagation Through the Pooling Layer

In backpropagation, a pooling layer simply relays the gradient: the gradient of a single output pixel is passed on to several input pixels, while the total amount of gradient is preserved. How it is passed on depends on the type of pooling.

Average Pooling:

In the forward pass, average pooling outputs the mean of each patch; in the backward pass, the gradient of each output pixel is therefore split evenly among the pixels of that patch.
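A minimal sketch of this backward rule for 2×2 average pooling, assuming non-overlapping windows; the function name and values are illustrative:

```python
import numpy as np

def avg_pool_backward(grad_out, pool=2):
    """Spread each output-pixel gradient evenly over its pool x pool patch,
    so the total gradient of every patch stays unchanged."""
    h, w = grad_out.shape
    grad_in = np.zeros((h * pool, w * pool))
    for i in range(h):
        for j in range(w):
            grad_in[i*pool:(i+1)*pool, j*pool:(j+1)*pool] = grad_out[i, j] / (pool * pool)
    return grad_in

grad_out = np.array([[4.0, 8.0],
                     [12.0, 16.0]])
print(avg_pool_backward(grad_out))
# Each 2x2 patch of the result sums to the corresponding grad_out entry.
```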

Max pooling:

In the forward pass, max pooling outputs the maximum of each patch; in the backward pass we need to remember which position held that maximum: the gradient is routed entirely to that position, and all other positions in the patch receive 0.
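And a corresponding sketch for 2×2 max pooling, again assuming non-overlapping windows; in practice the argmax from the forward pass would be cached, but here it is simply recomputed from the input x:

```python
import numpy as np

def max_pool_backward(x, grad_out, pool=2):
    """Route each output-pixel gradient to the position that held the maximum
    in the forward pass; every other position in the patch gets 0."""
    h, w = grad_out.shape
    grad_in = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            patch = x[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
            r, c = np.unravel_index(np.argmax(patch), patch.shape)
            grad_in[i*pool + r, j*pool + c] = grad_out[i, j]
    return grad_in

x = np.array([[1.0, 2.0, 5.0, 6.0],
              [3.0, 4.0, 7.0, 8.0],
              [9.0, 1.0, 2.0, 3.0],
              [0.0, 5.0, 4.0, 6.0]])
grad_out = np.array([[0.1, 0.2],
                     [0.3, 0.4]])
print(max_pool_backward(x, grad_out))
```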
