2.2 Multilayer Perceptrons

Author: 纵春水东流 | Published 2021-04-26 14:21

    1. Figure
    Given a single hidden layer with enough hidden units, an MLP can approximate any continuous function to arbitrary accuracy (the universal approximation theorem).


    [Figure] An MLP with a hidden layer of 5 hidden units

    2. From Linear to Nonlinear
    Linear
    \begin{split}\begin{aligned} {H} & = {X} {W}^{(1)} + {b}^{(1)}, \\ {O} & = {H}{W}^{(2)} +{b}^{(2)}. \end{aligned}\end{split}

    Nonlinear: without the activation function \sigma, the two affine layers above would collapse into a single affine map, so stacking would add nothing.
    \begin{split}\begin{aligned} {H} & = \sigma({X} {W}^{(1)} + {b}^{(1)}), \\ {O} & = {H}{W}^{(2)} + {b}^{(2)}.\\ \end{aligned}\end{split}

    Multiple nonlinear layers
    {H}^{(1)} = \sigma_1({X} {W}^{(1)} + {b}^{(1)})
    {H}^{(2)} = \sigma_2({H}^{(1)} {W}^{(2)} + {b}^{(2)})
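
    A minimal NumPy sketch of this forward pass, with hypothetical layer sizes (5 hidden units as in the figure) and ReLU standing in for \sigma:

        import numpy as np

        def relu(x):
            return np.maximum(x, 0)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2, 4))                      # minibatch of 2 examples, 4 features
        W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)    # 5 hidden units
        W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)    # 3 outputs

        H = relu(X @ W1 + b1)                            # H = sigma(X W1 + b1)
        O = H @ W2 + b2                                  # O = H W2 + b2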

    3. Regularization
    (1) L2 regularization
    L({w}, b) + \frac{\lambda}{2} \|{w}\|^2,
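
    A minimal sketch of adding this penalty to a data loss (NumPy; the data-loss value is a placeholder):

        import numpy as np

        def l2_penalty(w, lam):
            # lambda / 2 * ||w||^2
            return lam / 2 * np.sum(w ** 2)

        w = np.array([1.0, -2.0, 3.0])
        data_loss = 0.5                                  # stands in for L(w, b)
        total_loss = data_loss + l2_penalty(w, lam=0.01)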

    (2) Dropout


    Each unit is zeroed with probability p, and the surviving units are scaled up by 1/(1-p); this rescaling keeps the expected output of the remaining units unchanged.
    \begin{split}\begin{aligned} h' = \begin{cases} 0 & \text{ with probability } p \\ \frac{h}{1-p} & \text{ otherwise} \end{cases} \end{aligned}\end{split}
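
    A minimal NumPy sketch of this rule (inverted dropout, applied only during training):

        import numpy as np

        def dropout(h, p, rng):
            # Zero each unit with probability p and scale survivors by 1/(1-p),
            # so that the expected value of the output equals h.
            if p == 0:
                return h
            mask = rng.random(h.shape) > p               # keep with probability 1 - p
            return mask * h / (1 - p)

        rng = np.random.default_rng(0)
        h_dropped = dropout(np.ones((2, 5)), p=0.5, rng=rng)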

    4. Forward and Backward Propagation
    (1) Forward propagation
    {z}= {W}^{(1)} {x},
    {h}= \phi ({z}).
    {o}= {W}^{(2)} {h}.
    L = l({o}, y). # loss function
    s = \frac{\lambda}{2} \left(\|{W}^{(1)}\|_F^2 + \|{W}^{(2)}\|_F^2\right), # L2 regularization term
    J = L + s. # regularized objective
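
    A minimal sketch of this forward pass (NumPy; tanh is assumed for \phi and squared error for l, neither fixed by the text):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.normal(size=3)                           # input vector
        W1 = rng.normal(size=(4, 3))                     # hidden-layer weights
        W2 = rng.normal(size=(2, 4))                     # output-layer weights
        y = np.array([1.0, 0.0])                         # target
        lam = 0.01

        z = W1 @ x                                       # z = W1 x
        h = np.tanh(z)                                   # h = phi(z)
        o = W2 @ h                                       # o = W2 h
        L = 0.5 * np.sum((o - y) ** 2)                   # loss l(o, y)
        s = lam / 2 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # L2 penalty
        J = L + s                                        # regularized objective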

    (2) Backpropagation: apply the chain rule from the output backwards; \text{prod} denotes the appropriate product (matrix-matrix, matrix-vector, or elementwise) after any necessary transposition.
    \frac{\partial \mathsf{Z}}{\partial \mathsf{X}} = \text{prod}\left(\frac{\partial \mathsf{Z}}{\partial \mathsf{Y}}, \frac{\partial \mathsf{Y}}{\partial \mathsf{X}}\right).

    \frac{\partial J}{\partial L} = 1 \; \text{and} \; \frac{\partial J}{\partial s} = 1.

    \frac{\partial J}{\partial {o}} = \text{prod}\left(\frac{\partial J}{\partial L}, \frac{\partial L}{\partial {o}}\right) = \frac{\partial L}{\partial {o}} \in \mathbb{R}^q.

    \frac{\partial s}{\partial {W}^{(1)}} = \lambda {W}^{(1)} \; \text{and} \; \frac{\partial s}{\partial {W}^{(2)}} = \lambda {W}^{(2)}.

    \frac{\partial J}{\partial {W}^{(2)}}= \text{prod}\left(\frac{\partial J}{\partial {o}}, \frac{\partial {o}}{\partial {W}^{(2)}}\right) + \text{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial {W}^{(2)}}\right)= \frac{\partial J}{\partial {o}} {h}^\top + \lambda {W}^{(2)}.

    \frac{\partial J}{\partial {h}} = \text{prod}\left(\frac{\partial J}{\partial {o}}, \frac{\partial {o}}{\partial {h}}\right) = {{W}^{(2)}}^\top \frac{\partial J}{\partial {o}}.

    \frac{\partial J}{\partial {z}} = \text{prod}\left(\frac{\partial J}{\partial {h}}, \frac{\partial {h}}{\partial {z}}\right) = \frac{\partial J}{\partial {h}} \odot \phi'\left({z}\right).

    \frac{\partial J}{\partial {W}^{(1)}} = \text{prod}\left(\frac{\partial J}{\partial {z}}, \frac{\partial {z}}{\partial {W}^{(1)}}\right) + \text{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial {W}^{(1)}}\right) = \frac{\partial J}{\partial {z}} {x}^\top + \lambda {W}^{(1)}.
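
    Continuing the forward-pass sketch above, the gradients follow these equations line by line (for squared error \partial L/\partial o = o - y, and for tanh \phi'(z) = 1 - \tanh^2(z)):

        dJ_do = o - y                                    # dJ/do = dL/do
        dJ_dW2 = np.outer(dJ_do, h) + lam * W2           # dJ/do h^T + lambda W2
        dJ_dh = W2.T @ dJ_do                             # W2^T dJ/do
        dJ_dz = dJ_dh * (1 - np.tanh(z) ** 2)            # dJ/dh ⊙ phi'(z)
        dJ_dW1 = np.outer(dJ_dz, x) + lam * W1           # dJ/dz x^T + lambda W1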

    5. Vanishing and Exploding Gradients
    {h}^{(l)} = f_l ({h}^{(l-1)}) \text{ and thus } {o} = f_L \circ \ldots \circ f_1({x}).

    \partial_{{W}^{(l)}} {o} = \underbrace{\partial_{{h}^{(L-1)}} {h}^{(L)}}_{ {M}^{(L)} \stackrel{\mathrm{def}}{=}} \cdot \ldots \cdot \underbrace{\partial_{{h}^{(l)}} {h}^{(l+1)}}_{ {M}^{(l+1)} \stackrel{\mathrm{def}}{=}} \underbrace{\partial_{{W}^{(l)}} {h}^{(l)}}_{ {v}^{(l)} \stackrel{\mathrm{def}}{=}}.
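
    The gradient is thus a product of many Jacobians; when their typical scale sits below 1 the product shrinks towards zero (vanishing), and above 1 it blows up (exploding). A small illustration with random matrices (sizes, depth, and scales chosen arbitrarily):

        import numpy as np

        rng = np.random.default_rng(0)
        for scale in (0.5, 2.0):
            M = np.eye(4)
            for _ in range(50):                          # 50 "layers"
                M = M @ (scale * rng.normal(size=(4, 4)) / np.sqrt(4))
            print(scale, np.linalg.norm(M))              # tiny for 0.5, huge for 2.0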

    1. Parameter initialization: normal initialization and Xavier initialization
      (1) Normal initialization
      (2) Xavier initialization
      o_{i} = \sum_{j=1}^{n_\mathrm{in}} w_{ij} x_j. # linear layer output
      Assume the weights w_{ij} have mean 0 and variance \sigma^2, and the inputs x_j have mean 0 and
      variance \gamma^2, with all of them mutually independent. We can then compute the mean and variance of the output:
      \begin{split}\begin{aligned} E[o_i] & = \sum_{j=1}^{n_\mathrm{in}} E[w_{ij} x_j] \\&= \sum_{j=1}^{n_\mathrm{in}} E[w_{ij}] E[x_j] \\&= 0, \\ \mathrm{Var}[o_i] & = E[o_i^2] - (E[o_i])^2 \\ & = \sum_{j=1}^{n_\mathrm{in}} E[w^2_{ij} x^2_j] - 0 \\ & = \sum_{j=1}^{n_\mathrm{in}} E[w^2_{ij}] E[x^2_j] \\ & = n_\mathrm{in} \sigma^2 \gamma^2. \end{aligned}\end{split}
      How can we keep n_\mathrm{in} \sigma^2 fixed at 1? The backward pass imposes the analogous condition n_\mathrm{out} \sigma^2 = 1; both cannot hold at once unless n_\mathrm{in} = n_\mathrm{out}, so Xavier initialization takes the compromise:
      \begin{aligned} \frac{1}{2} (n_\mathrm{in} + n_\mathrm{out}) \sigma^2 = 1 \text{ or equivalently } \sigma = \sqrt{\frac{2}{n_\mathrm{in} + n_\mathrm{out}}}. \end{aligned}

    Equivalently, since U(-a, a) has variance a^2/3, the uniform form of Xavier initialization samples weights from
    U\left(-\sqrt{\frac{6}{n_\mathrm{in} + n_\mathrm{out}}}, \sqrt{\frac{6}{n_\mathrm{in} + n_\mathrm{out}}}\right).
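
    A minimal NumPy sketch of both Xavier variants:

        import numpy as np

        def xavier_normal(n_in, n_out, rng):
            sigma = np.sqrt(2.0 / (n_in + n_out))        # sigma^2 = 2 / (n_in + n_out)
            return rng.normal(0.0, sigma, size=(n_in, n_out))

        def xavier_uniform(n_in, n_out, rng):
            a = np.sqrt(6.0 / (n_in + n_out))            # Var[U(-a, a)] = a^2 / 3
            return rng.uniform(-a, a, size=(n_in, n_out))

        rng = np.random.default_rng(0)
        W = xavier_normal(256, 128, rng)                 # empirical std near sqrt(2/384)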
