2020-09-13

Author: 听闻不见 | Published 2020-09-13 16:44

    Activation functions

    • Sigmoid

      1. Saturated neurons "kill" the gradient for large positive or very negative inputs, as illustrated in the sketch after this list
      2. Sigmoid outputs are not zero-centered
      3. exp() is computationally expensive
    • tanh

      1. Fixes point 2 of Sigmoid (outputs are zero-centered), but still saturates
    • ReLU

      1. Does not saturate (in the positive region)
      2. Computationally efficient
      3. Converges much faster than sigmoid/tanh in practice
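
A minimal NumPy sketch (function names are mine, assuming the standard definitions) comparing the gradients of sigmoid, tanh, and ReLU; it shows how saturation drives the sigmoid/tanh gradients toward zero for large-magnitude inputs while the ReLU gradient stays at 1 in the positive region:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25, ~0 for |x| large (saturation)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # zero-centered output, but still saturates

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    d_relu = 1.0 if x > 0 else 0.0   # ReLU gradient: 1 in the positive region
    print(f"x={x:+6.1f}  d_sigmoid={sigmoid_grad(x):.4f}  "
          f"d_tanh={tanh_grad(x):.4f}  d_relu={d_relu:.1f}")
```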

    Preprocessing

    • zero mean
      In practice, always just zero-center the data (subtract the training-set mean); see the sketch below
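
A minimal sketch of zero-centering (assuming the data is an (N, D) array; the variable names and toy data are mine):

```python
import numpy as np

X = np.random.rand(100, 3072) * 255.0    # toy stand-in for flattened image data

# Zero-center: subtract the per-feature mean computed on the training set.
mean = X.mean(axis=0)
X_zero_centered = X - mean

# The same training-set mean must be reused for validation/test data.
print(X_zero_centered.mean(axis=0)[:5])  # ~0 for each feature
```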

    Weight initialization

    • Xavier initialization: scale random weights by 1/sqrt(fan_in) so the activation variance is roughly preserved across layers (see the sketch below)
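
A minimal sketch of Xavier initialization for one fully-connected layer (assuming the common 1/sqrt(fan_in) scaling; the layer sizes are made up):

```python
import numpy as np

fan_in, fan_out = 512, 256               # hypothetical layer dimensions

# Xavier initialization: scale standard-normal weights by 1/sqrt(fan_in)
# so the variance of the outputs stays close to that of the inputs.
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
b = np.zeros(fan_out)

x = np.random.randn(64, fan_in)          # a batch of unit-variance inputs
h = x.dot(W) + b
print(x.std(), h.std())                  # both should be close to 1
```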

    Batch Normalization

    • Improves gradient flow
    • Allows higher learning rates
    • Reduces the strong dependence on initialization (see the sketch below)
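
A minimal sketch of the batch-normalization forward pass at training time (gamma/beta are the usual learnable scale and shift, eps is the usual numerical-stability constant; names are mine):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm over a mini-batch x of shape (N, D)."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta            # learnable scale and shift

x = np.random.randn(32, 4) * 5.0 + 3.0     # poorly scaled activations
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))   # ~0 and ~1 per feature
```

At test time, a running mean and variance accumulated during training are used in place of the per-batch statistics.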
