Learning Rate Decay in TensorFlow

Author: EdwardLee | Published 2017-03-02 19:30

When training a DL model, it is common practice to gradually reduce the learning rate as the epochs progress, and experiments have shown that this does have a positive effect on convergence. The learning rate can be changed either through a hand-specified decay schedule or through an algorithm that tunes it automatically. This post introduces two decay schedules that ship with TF: exponential decay and polynomial decay.

Exponential decay (tf.train.exponential_decay)

Function prototype:

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

Arguments:

learning_rate: the initial learning rate
global_step: the global step count (each step corresponds to one batch)
decay_steps: the step period of the decay, i.e. how many steps pass between updates of the learning rate
decay_rate: the exponential decay factor (the α in α^t)
staircase: whether to update the learning rate in discrete steps, i.e. whether global_step/decay_steps is kept as a float or rounded down to an integer

Formula:

decayed_learning_rate = learning_rate * decay_rate^(global_step / decay_steps)
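
As a quick illustration, here is a minimal sketch of hooking exponential decay up to a global step and an optimizer (TF 1.x API); the initial rate of 0.1, period of 1000 steps and factor of 0.96 are made-up values for the example:

import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(
    learning_rate=0.1,        # initial value
    global_step=global_step,  # current step, one per batch
    decay_steps=1000,         # update period
    decay_rate=0.96,          # the α in α^t
    staircase=True)           # use floor(global_step / decay_steps)

# Passing global_step to minimize() makes the optimizer increment it once
# per batch, which is what drives the schedule forward.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# train_op = optimizer.minimize(loss, global_step=global_step)

With staircase=True the rate stays at 0.1 for the first 1000 steps, then drops to 0.1*0.96, 0.1*0.96^2, and so on.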

Polynomial decay (tf.train.polynomial_decay)

Function prototype:

tf.train.polynomial_decay(learning_rate, global_step, decay_steps, end_learning_rate=0.0001, power=1.0, cycle=False, name=None)

Arguments:

learning_rate: the initial learning rate
global_step: the global step count (each step corresponds to one batch)
decay_steps: the step period of the decay, i.e. how many steps pass between updates of the learning rate
end_learning_rate: the final value the learning rate decays to
power: the polynomial decay exponent (the α in (1-t)^α)
cycle: whether t keeps cycling once global_step exceeds decay_steps

Formula:

When cycle=False:

global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) * (1 - global_step/decay_steps)^power + end_learning_rate

When cycle=True:

decay_steps = decay_steps * ceil(global_step / decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) * (1 - global_step/decay_steps)^power + end_learning_rate

Note: ceil rounds up to the nearest integer.
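
A corresponding sketch for polynomial decay (TF 1.x API), using power=1.0 so the decay is linear; the concrete numbers are again only illustrative:

import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.polynomial_decay(
    learning_rate=0.01,        # initial value
    global_step=global_step,   # current step, one per batch
    decay_steps=10000,         # steps until end_learning_rate is reached
    end_learning_rate=0.0001,  # final value
    power=1.0,                 # 1.0 => linear decay
    cycle=False)               # hold at end_learning_rate afterwards

Plugging step 5000 into the formula above gives (0.01 - 0.0001) * (1 - 5000/10000)^1 + 0.0001 = 0.00505, i.e. roughly halfway between the initial and final rates, as expected for a linear schedule.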

A typical routine for configuring the learning rate:

def _configure_learning_rate(num_samples_per_epoch, global_step):
  """Configures the learning rate.

  Args:
    num_samples_per_epoch: The number of samples in each epoch of training.
    global_step: The global_step tensor.

  Returns:
    A `Tensor` representing the learning rate.

  Raises:
    ValueError: if FLAGS.learning_rate_decay_type is not recognized.
  """
  decay_steps = int(num_samples_per_epoch / FLAGS.batch_size *
                    FLAGS.num_epochs_per_decay)
  if FLAGS.sync_replicas:
    decay_steps /= FLAGS.replicas_to_aggregate

  if FLAGS.learning_rate_decay_type == 'exponential':
    return tf.train.exponential_decay(FLAGS.learning_rate,
                                      global_step,
                                      decay_steps,
                                      FLAGS.learning_rate_decay_factor,
                                      staircase=True,
                                      name='exponential_decay_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'fixed':
    return tf.constant(FLAGS.learning_rate, name='fixed_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'polynomial':
    return tf.train.polynomial_decay(FLAGS.learning_rate,
                                     global_step,
                                     decay_steps,
                                     FLAGS.end_learning_rate,
                                     power=1.0,
                                     cycle=False,
                                     name='polynomial_decay_learning_rate')
  else:
    raise ValueError('learning_rate_decay_type [%s] was not recognized',
                     FLAGS.learning_rate_decay_type)
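
A rough sketch of how the returned tensor might then be consumed; num_samples_per_epoch and total_loss stand in for values the surrounding training script would supply, and the FLAGS.* values are assumed to be command-line flags defined elsewhere (e.g. via tf.app.flags):

global_step = tf.train.get_or_create_global_step()
learning_rate = _configure_learning_rate(num_samples_per_epoch, global_step)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# minimize() increments global_step after each batch, advancing the decay.
train_op = optimizer.minimize(total_loss, global_step=global_step)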
