Optimization: AdamOptimizer

Author: 骑鲸公子_ | Published 2018-04-23 10:53


__init__

Args:

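learning_rate: A Tensor or a floating point value. The learning rate. It controls how strongly the weights are updated at each step (e.g. 0.001). Larger values (e.g. 0.3) give faster initial learning before the rate is updated, while smaller values (e.g. 1.0E-5) slow learning down but may let training converge to a better result.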

beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon: A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper. It is a very small number whose purpose is to prevent division by zero in the implementation.

use_locking: If True use locks for update operations.

name: Optional name for the operations created when applying gradients.
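The note on epsilon can be checked numerically. The sketch below is plain Python, not TensorFlow code; the function names and the sample values for m, v and t are made up for illustration. It compares the bias-corrected step from Algorithm 1 of the paper with the reordered step this optimizer documents. Under these assumptions the two forms agree only when epsilon hat is set to epsilon * sqrt(1 - beta2^t), which is why the same numeric value means slightly different things in the two formulations.

```python
import math

def step_algorithm1(m, v, t, lr, beta1, beta2, eps):
    """Step size from Algorithm 1 of the paper: explicit bias correction, plain epsilon."""
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

def step_efficient(m, v, t, lr, beta1, beta2, eps_hat):
    """Reordered step size: bias correction folded into lr_t, uses epsilon hat."""
    lr_t = lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    return lr_t * m / (math.sqrt(v) + eps_hat)

# Arbitrary illustrative moment estimates and hyperparameters (assumptions).
m, v, t = 0.02, 0.0004, 10
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

a = step_algorithm1(m, v, t, lr, beta1, beta2, eps)
# Same numeric value reused as epsilon hat: the step differs slightly from a.
b = step_efficient(m, v, t, lr, beta1, beta2, eps)
# Epsilon rescaled by sqrt(1 - beta2^t): the step matches a.
c = step_efficient(m, v, t, lr, beta1, beta2, eps * math.sqrt(1.0 - beta2 ** t))

print(a, b, c)
```

In practice the difference is tiny for small epsilon, but it becomes noticeable if epsilon is raised to values such as 0.1 or 1.0.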

Initialization:

m_0 <- 0 (Initialize initial 1st moment vector)

v_0 <- 0 (Initialize initial 2nd moment vector)

t <- 0 (Initialize timestep)

The update rule for `variable` with gradient `g` uses an optimization described at the end of Section 2 of the paper:

t <- t + 1

lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

m_t <- beta1 * m_{t-1} + (1 - beta1) * g

v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g

variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
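The update rule above can be sketched for a single scalar variable in plain Python. This is a minimal illustration, not TensorFlow's implementation: `adam_step` is a hypothetical helper, and its default hyperparameter values (0.001, 0.9, 0.999, 1e-8) are assumed here to match the optimizer's usual defaults.

```python
import math

def adam_step(variable, g, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Apply one Adam update to a scalar; returns (variable, m, v, t)."""
    t = t + 1
    lr_t = learning_rate * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    m = beta1 * m + (1.0 - beta1) * g          # 1st moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g      # 2nd moment estimate
    variable = variable - lr_t * m / (math.sqrt(v) + epsilon)
    return variable, m, v, t

# Minimize f(x) = x^2 (gradient 2x), starting from x = 1.0
# with m_0 = 0, v_0 = 0, t = 0 as in the initialization.
x, m, v, t = 1.0, 0.0, 0.0, 0
x1, m, v, t = adam_step(x, 2.0 * x, m, v, t)
first_step = x - x1  # size of the very first update

x = x1
for _ in range(99):
    x, m, v, t = adam_step(x, 2.0 * x, m, v, t)

print(first_step, x)
```

On the first step the factors (1 - beta1) and sqrt(1 - beta2) cancel, so the update has a magnitude of roughly learning_rate regardless of the gradient's scale.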