2020-09-13

Activation functions

  • Sigmoid

    1. Saturated neurons "kill" the gradient (when the input is a large positive or very negative number)
    2. Sigmoid outputs are not zero-centered
    3. exp() is computationally expensive
  • tanh

    1. fixes point 2 of sigmoid (outputs are zero-centered), but it still saturates
  • ReLU

    1. does not saturate (in the positive region)
    2. computationally efficient
    3. converges much faster than sigmoid/tanh in practice (see the sketch after this list)
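
A minimal NumPy sketch of the three activations discussed above (the function names and sample inputs are mine, for illustration only):

```python
import numpy as np

def sigmoid(x):
    # Squashes to (0, 1); saturates for large |x| (gradient ~ 0) and is not zero-centered.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered outputs in (-1, 1), but still saturates for large |x|.
    return np.tanh(x)

def relu(x):
    # max(0, x): no saturation in the positive region and very cheap to compute.
    return np.maximum(0.0, x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))  # extreme inputs land in the flat regions -> gradients get "killed"
print(tanh(x))
print(relu(x))
```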

Preprocessing

  • zero mean
    in practice, just zero-center the data (subtract the mean); see the sketch below
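
A minimal sketch of zero-centering, assuming a plain NumPy data matrix of shape N x D (array names and shapes are made up for illustration):

```python
import numpy as np

# Hypothetical data: N examples x D features (e.g. flattened image pixels).
X_train = np.random.randn(100, 3072) * 50 + 120
X_test = np.random.randn(20, 3072) * 50 + 120

# Zero-center: subtract the per-feature mean computed on the training set only.
mean = X_train.mean(axis=0)
X_train_centered = X_train - mean
X_test_centered = X_test - mean  # reuse the training mean at test time
```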

Weight initialization

  • Xavier initialization: scale the weights by 1/sqrt(fan_in) so activation variance stays roughly constant across layers (see the sketch below)
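
A minimal sketch of Xavier initialization for a single fully connected layer (layer sizes and the input batch are assumed for illustration):

```python
import numpy as np

fan_in, fan_out = 512, 256  # assumed layer sizes, for illustration only

# Xavier initialization: scale by 1/sqrt(fan_in) so the variance of the
# activations stays roughly the same from layer to layer.
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
b = np.zeros(fan_out)

x = np.random.randn(32, fan_in)   # a batch of 32 inputs
h = np.tanh(x.dot(W) + b)         # hidden activations keep a reasonable spread
print(x.std(), h.std())
```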

Batch Normalization

  • Improves gradient flow
  • Allows higher learning rates
  • Reduces the strong dependence on initialization (see the sketch after this list)
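
A minimal sketch of the batch-normalization forward pass at training time (the function name, shapes, and eps value are assumed for illustration); at test time a running mean/variance would be used instead of the batch statistics:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then apply a learnable scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 64) * 5 + 3       # assumed batch of 32 examples, 64 features
gamma, beta = np.ones(64), np.zeros(64)   # learnable parameters (identity transform at init)
out = batchnorm_forward(x, gamma, beta)
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])  # roughly zero mean, unit std per feature
```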
