Batch Normalization

Author: 徐凯_xp | Published: 2017-12-28 17:29 | Read 304 times

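Batch Normalization makes the hyperparameter search problem much easier: it makes a neural network more robust to the choice of hyperparameters, so a much wider range of hyperparameters works well, and it makes training easier, even for very deep networks.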

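When training a model such as logistic regression, you may recall that normalizing the input features speeds up learning. You compute the mean and subtract it from the training set, then compute the variance and scale the data set by it (in practice, dividing by the standard deviation). As we saw in an earlier video, this turns the contours of the learning problem from something very elongated into something more round, which is easier for the optimization algorithm to handle. So normalizing the input features is effective for both logistic regression and neural networks.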
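
As a minimal sketch of the input normalization just described, the plain-Python snippet below subtracts the per-feature mean and divides by the per-feature standard deviation. The helper name `normalize_features`, the small `eps` constant, and the toy data are illustrative assumptions, not part of the original article.

```python
import math

def normalize_features(rows, eps=1e-8):
    """Normalize each column of `rows` to zero mean and (almost) unit variance."""
    n = len(rows)
    num_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(num_features)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n
                 for j in range(num_features)]
    # eps guards against division by zero for constant features
    stds = [math.sqrt(v + eps) for v in variances]
    normalized = [[(r[j] - means[j]) / stds[j] for j in range(num_features)]
                  for r in rows]
    return normalized, means, variances

# Toy data: feature 0 spans thousands, feature 1 spans single digits.
data = [[1000.0, 1.0], [2000.0, 2.0], [3000.0, 3.0]]
normalized, means, variances = normalize_features(data)
```

After normalization both features live on the same scale, which is what makes the loss contours rounder.
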
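What about deeper models? Besides the input features x, one layer has activations a^[1], the next has activations a^[2], and so on. If you want to train parameters such as w^[3] and b^[3], wouldn't it be nice to normalize the mean and variance of a^[2], so that training w^[3] and b^[3] becomes more efficient?
Given some intermediate values in a neural network, suppose you have hidden unit values z^(1) through z^(m) that come from some hidden layer. It would be more precise to write them as z^[l](i), where l denotes the hidden layer and i runs from 1 to m.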
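
To make this concrete, here is a small plain-Python sketch of the standard batch normalization transform applied to one hidden unit's values z^(1)..z^(m): compute the batch mean and variance, normalize, then scale by gamma and shift by beta. The function name `batch_norm` and the `eps` value are assumptions chosen for illustration; in a real network gamma and beta are learned.

```python
import math

def batch_norm(z_values, gamma=1.0, beta=0.0, eps=1e-3):
    """Batch-normalize one hidden unit's values z^(1)..z^(m) from a mini-batch."""
    m = len(z_values)
    mu = sum(z_values) / m                           # batch mean
    var = sum((z - mu) ** 2 for z in z_values) / m   # batch variance
    z_norm = [(z - mu) / math.sqrt(var + eps) for z in z_values]
    # gamma and beta let the network choose a different mean and variance
    return [gamma * zn + beta for zn in z_norm]

z = [2.0, 4.0, 6.0, 8.0]
z_tilde = batch_norm(z)                          # roughly zero mean, unit variance
z_shifted = batch_norm(z, gamma=2.0, beta=5.0)   # mean moved to beta
```
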

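Here we introduce and use both the high-level wrapper tf.layers.batch_normalization from tf.layers and the lower-level tf.nn.batch_normalization from tf.nn.

How to add batch normalization

We again discuss two cases:

Fully connected layers
Convolutional layers

Using tf.layers.batch_normalization

We start with the fully connected layer, in 4 steps:

Add the is_training parameter
Remove the activation function and bias from the fully connected layer
Normalize the layer's output with tf.layers.batch_normalization
Pass the normalized values to the activation function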

import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.placeholder, tf.Session)

def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)
    layer = tf.layers.batch_normalization(layer, training=is_training)
    layer = tf.nn.relu(layer)
    return layer

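Next, add batch normalization to the convolutional layer:

Add the is_training parameter
Remove the activation function and bias from the convolutional layer
Normalize the layer's output with tf.layers.batch_normalization
Pass the normalized values to the activation function

Comparing the two: with tf.layers there is essentially no difference between fully connected and convolutional layers, while with tf.nn there are some differences.
Generally, people agree on removing the layer's bias (because batch normalization already has its own scale and shift terms) and on adding batch normalization before the layer's nonlinear activation function. For some networks, however, other arrangements also work well.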
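
One way to see why the bias is redundant: batch normalization subtracts the batch mean, so any constant added to every pre-activation cancels out before the scale and shift are applied. The short plain-Python check below illustrates this; the helper `normalize` and the sample values are made up for the example.

```python
import math

def normalize(values, eps=1e-3):
    """Subtract the batch mean and divide by the batch standard deviation."""
    m = len(values)
    mu = sum(values) / m
    var = sum((v - mu) ** 2 for v in values) / m
    return [(v - mu) / math.sqrt(var + eps) for v in values]

pre_activations = [0.5, -1.0, 2.0, 3.5]
bias = 10.0

without_bias = normalize(pre_activations)
with_bias = normalize([v + bias for v in pre_activations])

# The two results agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
```
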

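On the training side, the following changes are needed:

Add is_training, a placeholder holding a boolean that indicates whether the network is training.
Pass is_training to the convolutional and fully connected layers
Feed an appropriate value for it in feed_dict on every call to session.run()
Place train_opt under with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): so that the population statistics are updated on every training step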

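Using tf.nn.batch_normalization

Add the is_training parameter
Remove the bias and the activation function
Add the gamma, beta, pop_mean, and pop_variance variables
Use tf.cond to handle the difference between training and testing
During training, compute the mean and variance with tf.nn.moments, update the population statistics under with tf.control_dependencies..., and normalize the layer's output with tf.nn.batch_normalization
During testing, normalize the layer's output with tf.nn.batch_normalization, using the population statistics gathered during training
Add the activation function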
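
Before the TensorFlow version, the same steps can be sketched in plain Python for a single unit. The class name `SimpleBatchNorm` is hypothetical, and gamma and beta are held fixed here rather than learned; the point is only to show the training branch updating the population statistics with an exponential moving average, and the inference branch reusing them.

```python
import math

class SimpleBatchNorm:
    """Batch norm for one unit, tracking population statistics (illustrative only)."""

    def __init__(self, decay=0.99, eps=1e-3):
        self.gamma = 1.0          # scale (learnable in a real network)
        self.beta = 0.0           # shift (learnable in a real network)
        self.pop_mean = 0.0       # running estimate of the mean
        self.pop_variance = 1.0   # running estimate of the variance
        self.decay = decay
        self.eps = eps

    def __call__(self, values, is_training):
        if is_training:
            m = len(values)
            mean = sum(values) / m
            variance = sum((v - mean) ** 2 for v in values) / m
            # Update the population statistics as exponential moving averages.
            self.pop_mean = self.pop_mean * self.decay + mean * (1 - self.decay)
            self.pop_variance = (self.pop_variance * self.decay
                                 + variance * (1 - self.decay))
        else:
            # At test time, rely on what was accumulated during training.
            mean, variance = self.pop_mean, self.pop_variance
        return [self.gamma * (v - mean) / math.sqrt(variance + self.eps) + self.beta
                for v in values]

bn = SimpleBatchNorm()
batch = [4.0, 6.0, 8.0]            # batch mean 6.0, batch variance 8/3
for _ in range(1000):
    bn(batch, is_training=True)    # population statistics drift toward the batch's
single = bn([6.0], is_training=False)
```

With a decay of 0.99 the running estimates need a few hundred steps to converge, which is why the loop above runs for many iterations.
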

def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new fully connected layer
    """

    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    gamma = tf.Variable(tf.ones([num_units]))
    beta = tf.Variable(tf.zeros([num_units]))

    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)

    epsilon = 1e-3
    
    def batch_norm_training():
        batch_mean, batch_variance = tf.nn.moments(layer, [0])

        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)
 
    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)

def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1
    
    in_channels = prev_layer.get_shape().as_list()[3]
    out_channels = layer_depth*4
    
    weights = tf.Variable(
        tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))
    
    layer = tf.nn.conv2d(prev_layer, weights, strides=[1,strides, strides, 1], padding='SAME')

    gamma = tf.Variable(tf.ones([out_channels]))
    beta = tf.Variable(tf.zeros([out_channels]))

    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)

    epsilon = 1e-3
    
    def batch_norm_training():
        # Important to use the correct dimensions here to ensure the mean and variance are calculated 
        # per feature map instead of for the entire layer
        batch_mean, batch_variance = tf.nn.moments(layer, [0, 1, 2])

        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)
 
    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
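
The comment about dimensions in conv_layer deserves a closer look. For a convolutional output laid out as [batch, height, width, channels], the statistics are reduced over the first three axes so that each feature map gets exactly one mean and one variance. The sketch below reproduces that reduction with nested Python lists; `channel_moments` and the toy batch are illustrative assumptions.

```python
def channel_moments(batch):
    """Mean and variance per channel for a nested list shaped [N][H][W][C]."""
    num_channels = len(batch[0][0][0])
    means, variances = [], []
    for c in range(num_channels):
        # Gather every value of channel c across batch, height and width.
        values = [pixel[c] for image in batch for row in image for pixel in row]
        mean = sum(values) / len(values)
        means.append(mean)
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    return means, variances

# Two 2x2 "images" with 2 channels: channel 0 is constant, channel 1 varies.
batch = [
    [[[1.0, 0.0], [1.0, 2.0]], [[1.0, 4.0], [1.0, 6.0]]],
    [[[1.0, 0.0], [1.0, 2.0]], [[1.0, 4.0], [1.0, 6.0]]],
]
means, variances = channel_moments(batch)   # one entry per channel
```

The result has length 2 (one entry per channel), matching the [out_channels] shape of gamma, beta, pop_mean, and pop_variance in conv_layer.
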

In train, we do not need to add with tf.control_dependencies..., because we already update the population statistics manually inside the fully connected and convolutional layers.

def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels 
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Add placeholder to indicate whether or not we're training the model
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 19 convolutional layers (layer_i = 1..19)
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers 
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each of the 10 classes
    logits = tf.layers.dense(layer, 10)
    
    # Define loss and training operations
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    
    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})
            
            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels, 
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        
        # Score the first 100 test images individually, just to make sure batch normalization really worked
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
                                                    labels: [mnist.test.labels[i]],
                                                    is_training: False})

        print("Accuracy on 100 samples:", correct/100)


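# Load MNIST with one-hot labels; reshape=False keeps images as 28x28x1 to match the inputs placeholder.
# The "MNIST_data/" download directory is an assumed location.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)

num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)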
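
The last loop in train scores test images one at a time, and that check is meaningful for a specific reason: with a batch of one, the batch mean equals the sample and the batch variance is zero, so normalizing with batch statistics would map every input to beta. The plain-Python sketch below contrasts the two choices; the population statistics used here are made-up numbers, not values from a trained model.

```python
import math

def batch_norm(values, mean, variance, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize values with the given statistics, then scale and shift."""
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]

def batch_stats(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

# Hypothetical population statistics accumulated during training.
pop_mean, pop_variance = 5.0, 4.0

outputs_batch_stats = []
outputs_pop_stats = []
for sample in ([1.0], [5.0], [9.0]):
    mean, variance = batch_stats(sample)   # degenerate: mean == sample, variance == 0
    outputs_batch_stats.append(batch_norm(sample, mean, variance)[0])
    outputs_pop_stats.append(batch_norm(sample, pop_mean, pop_variance)[0])
```

With batch statistics all three samples collapse to the same output, while the population statistics keep them distinguishable. If the single-image accuracy matches the full test accuracy, the network is using its population statistics correctly at inference time.
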
