Batches in Deep Learning Training

Author: StarsOcean | Published: 2018-06-14 08:44

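I. What Is a Batch

What is a batch? Here are two explanations; pick whichever you prefer.

Take a dataset with 2000 training samples. If the 2000 samples are split into batches of size 500, then completing one epoch takes 4 iterations.
If preparing training data is like preparing a piece of beef for hot pot, then an epoch is the whole piece of beef, a batch is one slice cut from it, and an iteration is cooking one slice in the pot (hungry yet?).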
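
The arithmetic in the first explanation can be sketched in a few lines. The helper name iterations_per_epoch and its drop_last flag are assumptions made for this illustration; they do not come from any library.

```python
import math

def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of iterations (batches) needed to pass over the data once."""
    if drop_last:
        # The incomplete tail batch is discarded.
        return num_samples // batch_size
    # The incomplete tail batch still counts as one iteration.
    return math.ceil(num_samples / batch_size)

print(iterations_per_epoch(2000, 500))                  # 4, as in the example above
print(iterations_per_epoch(2050, 500))                  # 5, the 50 leftover samples form a short batch
print(iterations_per_epoch(2050, 500, drop_last=True))  # 4, the 50 leftover samples are dropped
```

When the sample count is not a multiple of the batch size, the two conventions differ by one iteration per epoch.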

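II. What Batches Are For

Batches are not for people to eat; they are fed to the model. Once the three steps of "model, strategy, algorithm" are in place, we run (train) this framework on data to learn the best parameters.

The ideal case is to feed all the data to the framework, compute the loss over it, update the parameters, and repeat. But like boiling a whole piece of beef, there is no telling when you will finally get to eat. This is gradient descent on the full dataset (full-batch gradient descent).
The other extreme is to feed the model a single sample at a time. It cooks instantly, so it is certainly fast, but one careless moment and it dissolves in the pot and there is nothing left to eat: the updates are so noisy that training may never settle at a local optimum. This is stochastic gradient descent (SGD).
The balanced option, when you want both speed and something to eat, is sliced beef: cut the data into pieces of batch size and eat one piece per iteration. Each iteration computes the loss function on only a small part of the data and updates the parameters.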
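
As a rough illustration of the three options, the sketch below fits a one-dimensional linear model by gradient descent in plain Python, with batch_size selecting the variant. The function train_linear, the toy data, and the learning rate are all assumptions chosen for this example and are not part of the original post.

```python
import random

def train_linear(xs, ys, batch_size, num_epochs=200, lr=0.5, seed=0):
    """Fit y = w * x + b by gradient descent on the mean squared error.

    batch_size == len(xs) -> full-batch gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(num_epochs):
        rng.shuffle(indices)  # new sample order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w  # one parameter update = one iteration
            b -= lr * grad_b
    return w, b

# Hypothetical noise-free toy data generated from y = 3x + 1.
xs = [i / 20 for i in range(20)]
ys = [3 * x + 1 for x in xs]

for bs in (len(xs), 1, 5):
    w, b = train_linear(xs, ys, batch_size=bs)
    print(f"batch_size={bs:2d}  w={w:.3f}  b={b:.3f}")
```

On this noise-free toy problem all three settings recover the line; the instability of single-sample updates described above shows up mainly on noisy, real-world data.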

III. Implementing Batches

Again, two approaches are provided.

1. yield→generator

For the syntax details, please follow the link.
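
For readers who skip the link, here is a minimal sketch of what yield does. The function name slices is made up for this illustration.

```python
def slices(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        # Execution pauses here and resumes when the next value is requested.
        yield items[start:start + size]

gen = slices(list(range(7)), 3)  # nothing runs yet; this only creates the generator
print(next(gen))                 # [0, 1, 2]
print(next(gen))                 # [3, 4, 5]
print(list(gen))                 # [[6]], the slices that were still pending
```

Because values are produced on demand, a for loop over a generator never needs the full list of batches in memory at once.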

import numpy as np

# -------------- Parameters -----------------
# sourceData_feature : feature part of the training set (NumPy array)
# sourceData_label   : label part of the training set (NumPy array)
# batch_size  : thickness of each beef slice (samples per batch)
# num_epochs  : how many times the whole piece of beef is cooked (passes over the data)
# shuffle     : whether to shuffle the data at the start of each epoch

def batch_iter(sourceData_feature, sourceData_label, batch_size, num_epochs, shuffle=True):

    data_size = len(sourceData_feature)

    # Number of samples // batch size; the leftover "tail" samples are dropped
    num_batches_per_epoch = data_size // batch_size

    for epoch in range(num_epochs):
        # Shuffle the data at each epoch
        if shuffle:
            shuffle_indices = np.random.permutation(data_size)

            # The same permutation is applied to both arrays so features and labels stay aligned
            shuffled_data_feature = sourceData_feature[shuffle_indices]
            shuffled_data_label = sourceData_label[shuffle_indices]
        else:
            shuffled_data_feature = sourceData_feature
            shuffled_data_label = sourceData_label

        for batch_num in range(num_batches_per_epoch):   # batch_num runs from 0 to num_batches_per_epoch-1
            start_index = batch_num * batch_size
            end_index = start_index + batch_size   # never exceeds data_size because the tail is dropped

            yield (shuffled_data_feature[start_index:end_index], shuffled_data_label[start_index:end_index])

import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

# X, y, opt, loss, accuracy and mnist are assumed to have been defined when the model was built

batchSize = 100  # the actual slice thickness (batch size)
Iterations = 0   # counts the iterations

# sess
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Iterate. Note that batch_iter is a generator (it uses yield), so the for loop pulls one batch per pass
for (batchInput, batchLabels) in batch_iter(mnist.train.images, mnist.train.labels, batchSize, 30, shuffle=True):
    _, trainingLoss = sess.run([opt, loss], feed_dict={X: batchInput, y: batchLabels})
    if Iterations % 1000 == 0:  # report every 1000 iterations
        train_accuracy = sess.run(accuracy, feed_dict={X: batchInput, y: batchLabels})
        print("step %d, training accuracy %g" % (Iterations, train_accuracy))
    Iterations = Iterations + 1
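
The generator above drops the leftover samples at the end of each epoch. If those samples should be used as well, one possible variant is sketched below in plain Python with lists instead of NumPy arrays. The name batch_iter_keep_tail and the seed parameter are assumptions introduced for this sketch.

```python
import random

def batch_iter_keep_tail(features, labels, batch_size, num_epochs, shuffle=True, seed=None):
    """Yield (feature_batch, label_batch) pairs, keeping the short final batch of each epoch."""
    rng = random.Random(seed)
    indices = list(range(len(features)))
    for _ in range(num_epochs):
        if shuffle:
            rng.shuffle(indices)  # one permutation shared by features and labels
        for start in range(0, len(indices), batch_size):
            chosen = indices[start:start + batch_size]
            yield [features[i] for i in chosen], [labels[i] for i in chosen]

# Hypothetical toy data: 10 samples whose label is simply 10 * feature.
features = list(range(10))
labels = [10 * f for f in features]

for x_batch, y_batch in batch_iter_keep_tail(features, labels, batch_size=4, num_epochs=1, seed=0):
    print(len(x_batch), x_batch, y_batch)
```

With 10 samples and a batch size of 4, each epoch yields batches of 4, 4 and 2 samples, so a model fed this way has to accept a variable batch dimension.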

2. slice_input_producer + batch

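This again involves some background knowledge; see the two linked articles. The diagram below illustrates slice_input_producer.

[Figure: illustration of slice_input_producer]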

def get_batch_data(images, label, batch_Size):
    input_queue = tf.train.slice_input_producer([images, label], shuffle=True, num_epochs=20)    # see the diagram
    image_batch, label_batch = tf.train.batch(input_queue, batch_size=batch_Size, num_threads=2,allow_smaller_final_batch=True)
    return image_batch,label_batch

batchSize = 100  # the actual slice thickness (batch size)

batchInput, batchLabels = get_batch_data(mnist.train.images, mnist.train.labels, batchSize)

Iterations = 0  # counts the iterations

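# sess
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())  # this line is essential: the num_epochs counter is a local variable

coord = tf.train.Coordinator()
# Building the queue does not fill it. tf.train.start_queue_runners must be called to start
# the threads that fill the queue; only then can data be read, otherwise the queue stays empty.
threads = tf.train.start_queue_runners(sess=sess, coord=coord)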

try:
    while not coord.should_stop():
        BatchInput,BatchLabels = sess.run([batchInput, batchLabels])
        _, trainingLoss = sess.run([opt, loss], feed_dict={X: BatchInput, y: BatchLabels})
        if Iterations%1000 == 0:
            train_accuracy = accuracy.eval(session = sess, feed_dict={X:BatchInput, y:BatchLabels})
            print("step %d, training accuracy %g"%(Iterations,train_accuracy))
        Iterations = Iterations + 1
except tf.errors.OutOfRangeError:
    train_accuracy = accuracy.eval(session = sess, feed_dict={X:BatchInput, y:BatchLabels})
    print("step %d, training accuracy %g"%(Iterations,train_accuracy))
    print('Done training')
finally:
    coord.request_stop()
coord.join(threads)
# sess.close()
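
To make the moving parts easier to picture, here is a conceptual analogy in plain Python. It is not how TensorFlow implements its input queues. A background thread enqueues one sample at a time (the role of slice_input_producer), the consumer groups samples into batches (the role of tf.train.batch with allow_smaller_final_batch=True), and a sentinel stands in for OutOfRangeError. All names in it are made up for this sketch.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the data stream

def slice_producer(samples, num_epochs, q):
    """Background thread: enqueue one sample at a time for num_epochs passes."""
    for _ in range(num_epochs):
        for sample in samples:
            q.put(sample)  # blocks while the queue is full
    q.put(_DONE)

def next_batch(q, batch_size):
    """Dequeue up to batch_size samples; returns (batch, finished)."""
    batch = []
    while len(batch) < batch_size:
        item = q.get()  # blocks until the producer supplies a sample
        if item is _DONE:
            return batch, True  # possibly a smaller final batch
        batch.append(item)
    return batch, False

def consume(samples, num_epochs, batch_size):
    """Run producer and consumer together; return the size of every batch."""
    q = queue.Queue(maxsize=8)  # small buffer between the two threads
    worker = threading.Thread(target=slice_producer, args=(samples, num_epochs, q))
    worker.start()
    sizes, finished = [], False
    while not finished:
        batch, finished = next_batch(q, batch_size)
        if batch:
            sizes.append(len(batch))  # a training step would run here
    worker.join()
    return sizes

print(consume(list(range(10)), num_epochs=2, batch_size=6))  # [6, 6, 6, 2]
```

In this sketch the batches are cut from one continuous stream of 20 samples, so a batch can span the boundary between two epochs.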

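IV. Comparing the Two Approaches

Approach 1: yield→generator, 30 epochs
Before the experiment, the python.exe process used 402 MB of memory.
During the experiment, memory stayed at roughly 865 MB.
After the experiment, the 30 epochs had taken 49.8 s.

Approach 2: slice_input_producer + batch
The slice_input_producer step raised memory usage from 410 MB to 583 MB.
During training, memory usage fluctuated, at times exceeding 1 GB.
The 20 epochs took 199 s.

Summary: for now, approach 1 is considerably faster than approach 2.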
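
Because the two runs use different epoch counts (30 versus 20), the totals are easier to compare per epoch. The short calculation below only normalizes the figures reported above and adds no new measurements.

```python
# (total seconds, number of epochs) as reported above
runs = {
    "yield generator": (49.8, 30),
    "slice_input_producer + batch": (199.0, 20),
}

per_epoch = {name: seconds / epochs for name, (seconds, epochs) in runs.items()}
for name, value in per_epoch.items():
    print(f"{name}: {value:.2f} s per epoch")

ratio = per_epoch["slice_input_producer + batch"] / per_epoch["yield generator"]
print(f"approach 2 takes about {ratio:.1f}x as long per epoch")
```

That works out to about 1.66 s per epoch for approach 1 against 9.95 s for approach 2, a factor of roughly six.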

Original link: https://www.haomeiwen.com/subject/fqseeftx.html