
01-svhn note

作者: 孙培钦bone | Published 2019-03-13 15:38

    configure

    Used to configure the project's hyperparameters; this corresponds to common.py in the 01-svhn project.

    minibatch_size = 128  # the number of instances in a batch
    nr_channel = 3 # the channels of image
    image_shape = (32, 32) # the image shape (height, width)
    nr_class = 10 # the number of classes
    nr_epoch = 60 # the max epoch of training
    weight_decay = 1e-10 # strength of the regularization term
    test_interval = 5 # run the test set every ${test_interval} epochs
    show_interval = 10 # print a training message every ${show_interval} minibatches
    

    nr_class is set to the number of classes in the classification task; weight_decay controls how strongly the regularization term contributes to the total loss (each regularized weight's L2 penalty is scaled by it, so 1e-10 makes the regularization very weak).

    nr_class = 10 # the number of classes
    weight_decay = 1e-10 # strength of the regularization term
    

    These control how often messages are printed: test_interval is measured in epochs, while show_interval is measured in minibatches.

    test_interval = 5 # run the test set every ${test_interval} epochs
    show_interval = 10 # print a training message every ${show_interval} minibatches
    

    Dataset

    Since deep learning is data-driven, data handling is the most important part of the whole project. The 01-svhn project handles it with a dedicated Dataset() class in dataset.py.

    # import required modules
    import os
    import cv2
    import numpy as np
    from scipy import io as scio
    from common import *  # hyperparameters such as image_shape and minibatch_size are defined in common.py
    
    class Dataset():
        dataset_path = '../../dataset/SVHN'  # path where the dataset is stored
        dataset_meta = {
            'train': ([os.path.join(dataset_path, 'train_32x32.mat')], 73257),
            'test': ([os.path.join(dataset_path, 'test_32x32.mat')], 26032),
        }
        
        def __init__(self, dataset_name):
            self.files, self.instances = self.dataset_meta[dataset_name]
        
        def load(self):
            '''Load dataset metas from files'''
            datas_list, labels_list = [], []
            for f in self.files:
                samples = scio.loadmat(f)
                datas_list.append(samples['X'])
                labels_list.append(samples['y'])
            self.samples = {
                'X': np.concatenate(datas_list, axis=3), # datas
                'Y': np.concatenate(labels_list, axis=0), # labels
            }
            return self
    
        def instance_generator(self):
            '''a generator to yield a sample'''
            for i in range(self.instances):
                img = self.samples['X'][:, :, :, i]
                label = self.samples['Y'][i, :][0]
                if label == 10:
                    label = 0
                img = cv2.resize(img, image_shape)
                yield img.astype(np.float32), np.array(label, dtype=np.int32)
        
        @property
        def instances_per_epoch(self):
            return 25600 # set for a fast experiment
            #return self.instances
        
        @property
        def minibatchs_per_epoch(self):
            return 200 # set for a fast experiment
            #return self.instances // minibatch_size
    

    Code explained

    Modify dataset_path according to where your data is stored.

    dataset_path = '../../dataset/SVHN'  # path where the dataset is stored
    

    dataset_meta keeps track of the dataset splits, including their file locations and number of instances.

        dataset_meta = {
            'train': ([os.path.join(dataset_path, 'train_32x32.mat')], 73257),
            'test': ([os.path.join(dataset_path, 'test_32x32.mat')], 26032),
        }
    

    The load function generally just unpacks the dataset and loads it into memory for later access. Another common pattern is to read the access paths of all data (e.g. images) into a list for later reading (a sketch of this variant follows the excerpt below).

        def load(self):
            '''Load dataset metas from files'''
            datas_list, labels_list = [], []
            for f in self.files:
                samples = scio.loadmat(f)
                ....
    
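
    A rough sketch of that path-list variant (not part of 01-svhn; the folder layout and helper name here are hypothetical, for a dataset stored as image files rather than .mat files):

    import os
    
    def load_image_paths(root_dir):
        '''collect the paths of all image files under root_dir into a list'''
        paths = []
        for dirpath, _, filenames in os.walk(root_dir):
            for name in filenames:
                if name.lower().endswith(('.png', '.jpg', '.jpeg')):
                    paths.append(os.path.join(dirpath, name))
        return paths
    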

    A Python generator is used to produce one sample at a time, which is later consumed through TensorFlow's dataset API. sample = (data, label)
    Note: the SVHN dataset labels the digit 0 as 10, while TensorFlow's classification labels by convention start from 0 (SVHN is in MATLAB format, where indexing starts at 1), so the label has to be remapped explicitly.

                if label == 10:
                    label = 0
    
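
    Equivalently, the remapping can be written in one line (a small variant, not the project code):

    label = label % 10  # maps SVHN's label 10 back to digit 0 and leaves 1-9 unchanged
    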

    Show

    After finishing dataset.py, you can write a simple test to verify that the code is correct and to take a look at the images in the dataset. Later, once you start augmenting the data, the same approach can be used to inspect the augmented samples.

    # show an img from dataset
    %matplotlib inline
    import matplotlib.pyplot as plt
    
    ds = Dataset('train').load()
    ds_gen = ds.instance_generator()
    
    imggrid = []
    for i in range(25):
        img, label = next(ds_gen) # yield a sample
        cv2.putText(img, str(label), (0, image_shape[0]), cv2.FONT_HERSHEY_SIMPLEX, 0.6, 
                    (0, 255, 0), 2) # put a label on img
        imggrid.append(img) 
    
    # make an img grid from an img list
    imggrid = np.array(imggrid).reshape((5, 5, img.shape[0], img.shape[1], img.shape[2]))
    imggrid = imggrid.transpose((0, 2, 1, 3, 4)).reshape((5*img.shape[0], 5*img.shape[1], 3)) 
    imggrid = cv2.cvtColor(imggrid.astype('uint8'), cv2.COLOR_BGR2RGB)
    
    # show
    plt.figure()
    plt.imshow(imggrid)
    plt.show()
    

    This makes matplotlib display the generated images inline inside a Jupyter notebook.

    %matplotlib inline
    import matplotlib.pyplot as plt
    

    This stitches multiple sub-images into one large image grid.

    imggrid = np.array(imggrid).reshape((5, 5, img.shape[0], img.shape[1], img.shape[2]))
    imggrid = imggrid.transpose((0, 2, 1, 3, 4)).reshape((5*img.shape[0], 5*img.shape[1], 3)) 
    

    Since OpenCV uses the BGR format while matplotlib uses RGB, a conversion is needed before displaying the image with matplotlib.

    imggrid = cv2.cvtColor(imggrid.astype('uint8'), cv2.COLOR_BGR2RGB)
    
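
    An equivalent conversion (a small variant) simply reverses the channel axis with NumPy indexing:

    imggrid = imggrid.astype('uint8')[:, :, ::-1]  # BGR -> RGB by flipping the last axis
    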

    Build a graph

    TensorFlow follows a symbolic programming style: you first define a compute graph, which describes the relationships between tensors and the operators, and then the graph is compiled and run with a session. The graph does not change while it is being run, so keep one thing in mind: defining the graph and computing it are separate steps. Below we focus on how to build a graph. To work smoothly with the provided inspection tool tf-model-manip.py, it is strongly recommended to follow the 01-svhn project layout for the rest of your code: create a model.py file containing a Model class, and give the Model class a method: build (a skeleton is sketched below).

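    A minimal sketch of that model.py layout (the imports are assumptions based on the names used in the code below; the method bodies are filled in piece by piece in the rest of this section):

    # model.py -- skeleton only; the actual method bodies follow below
    import tensorflow as tf
    import tensorflow.contrib as tf_contrib
    from common import *  # hyperparameters: image_shape, nr_channel, nr_class, weight_decay, ...
    
    class Model():
        def __init__(self):
            pass  # parameter initializers and the regularizer, defined below
    
        # building blocks: _conv_layer, _pool_layer, _fc_layer, defined below
    
        def build(self):
            pass  # assembles the graph from the building blocks, defined below
    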

    The __init__ function specifies how the parameters are initialized (here MSRA initialization; see https://arxiv.org/pdf/1502.01852.pdf if you are interested). In short, it is a zero-mean normal distribution whose variance is determined by the number of input neurons. The way the variance is computed can be changed via the mode argument.

        def __init__(self):
            self.weight_init = tf_contrib.layers.variance_scaling_initializer(factor=1.0,
                                    mode='FAN_IN', uniform=False)
            self.bias_init = tf.zeros_initializer()
            self.reg = tf_contrib.layers.l2_regularizer(weight_decay)
    

    The three modes (details: tensorflow-init):

      if mode == 'FAN_IN':    # count only the number of input connections
        n = fan_in
      elif mode == 'FAN_OUT': # count only the number of output connections
        n = fan_out
      elif mode == 'FAN_AVG': # average the number of input and output connections
        n = (fan_in + fan_out) / 2.0
    
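
    For example (a rough worked illustration; the exact sampling constant follows tf.contrib's variance_scaling_initializer): the first conv filter below has kernel_shape=[3, 3, 3, 16], so under FAN_IN

    fan_in = 3 * 3 * 3   # kernel_h * kernel_w * in_channels = 27 input connections
    print(1.0 / fan_in)  # with factor=1.0 the variance is on the order of 1/27 ~ 0.037
    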

    _conv_layer implements a conv2d that comes with BN and ReLU by default.

        def _conv_layer(self, name, inp, kernel_shape, stride, padding='SAME', is_training=False):
            '''a conv layer = conv + bn + relu'''
            
            with tf.variable_scope(name) as scope:
                conv_filter = tf.get_variable(name='filter', shape=kernel_shape,
                                              initializer=self.weight_init, regularizer=self.reg)
                conv_bias = tf.get_variable(name='bias', shape=kernel_shape[-1],
                                            initializer=self.bias_init)
                x = tf.nn.conv2d(inp, conv_filter, strides=[1, stride, stride, 1],
                                 padding=padding, data_format='NHWC')
                x = tf.nn.bias_add(x, conv_bias, data_format='NHWC')
                x = tf.layers.batch_normalization(x, axis=3, training=is_training)
                x = tf.nn.relu(x)
            return x
    

    variable_scope creates a namespace so that names can be reused across layers. E.g. with name='conv1', the final name of conv_filter becomes 'conv1/filter'.

            with tf.variable_scope(name) as scope:
                conv_filter = tf.get_variable(name='filter', ....)
    
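
    A quick way to check the resulting names (a small sketch, to be run after build() has been called):

    for v in tf.trainable_variables():
        print(v.name)  # e.g. 'conv1/filter:0', 'conv1/bias:0', ...
    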

    In TensorFlow, conv2d is an operation; pay attention to matching the strides argument with data_format.

    x = tf.nn.conv2d(inp, conv_filter, strides=[1, stride, stride, 1], data_format='NHWC')
    x = tf.nn.conv2d(inp, conv_filter, strides=[1, 1, stride, stride], data_format='NCHW')
    

    The BN operation also has to match data_format, because in a CNN, BN normalizes each channel across the batch, so the channel dimension must be specified explicitly.

    x = tf.layers.batch_normalization(x, axis=3, training=is_training) # NHWC
    x = tf.layers.batch_normalization(x, axis=1, training=is_training) # NCHW
    

    Another thing to note about BN: it behaves differently in the train and inference states. During training the BN statistics (mean, variance) keep changing, while during inference the statistics stay fixed at a moving average of the training-time statistics.

    x = tf.layers.batch_normalization(x, axis=3, training=True) # train
    x = tf.layers.batch_normalization(x, axis=3, training=False) # inference
    

    _pool_layer currently supports only two modes: MAX and AVG. More operations will be added later as the need arises.

        def _pool_layer(self, name, inp, ksize, stride, padding='SAME', mode='MAX'):
            '''a pool layer which only supports avg_pooling and max_pooling(default)'''
            
            assert mode in ['MAX', 'AVG'], 'the mode of pool must be MAX or AVG'
            if mode == 'MAX':
                x = tf.nn.max_pool(inp, ksize=[1, ksize, ksize, 1], strides=[1, stride, stride, 1],
                                   padding=padding, name=name, data_format='NHWC')
            elif mode == 'AVG':
                x = tf.nn.avg_pool(inp, ksize=[1, ksize, ksize, 1], strides=[1, stride, stride, 1],
                                   padding=padding, name=name, data_format='NHWC')
            return x
    

    When using pooling, again pay attention to matching data_format.

     x = tf.nn.max_pool(inp, ksize=[1, ksize, ksize, 1], strides=[1, stride, stride, 1], data_format='NHWC') 
     x = tf.nn.max_pool(inp, ksize=[1, 1, ksize, ksize], strides=[1, 1, stride, stride], data_format='NCHW') 
    

    Since an fc layer usually follows a conv or pool layer, a flatten operation is performed first.

        def _fc_layer(self, name, inp, units, dropout=0.5):
            '''a fully connected layer'''
            
            with tf.variable_scope(name) as scope:
                shape = inp.get_shape().as_list() # get the shape of input
                dim = 1
                for d in shape[1:]:
                    dim *= d
                x = tf.reshape(inp, [-1, dim]) # flatten
                if dropout > 0: # if dropout is enabled
                    x = tf.nn.dropout(x, keep_prob=dropout, name='dropout') # note: 'dropout' is passed as the keep probability
                x = tf.layers.dense(x, units, kernel_initializer=self.weight_init,
                                    bias_initializer=self.bias_init, kernel_regularizer=self.reg)
            return x
    

    The build function combines the basic building blocks above (conv, pool, fc) to define the graph.

        def build(self):
            # set inputs
            data = tf.placeholder(tf.float32, shape=(None,)+image_shape+(nr_channel,),
                                  name='data')
            label = tf.placeholder(tf.int32, shape=(None,), name='label')
            label_onehot = tf.one_hot(label, nr_class, dtype=tf.int32) 
            is_training = tf.placeholder(tf.bool, name='is_training') # a flag of bn
            
            # conv1
            x = self._conv_layer(name='conv1', inp=data,
                                 kernel_shape=[3, 3, nr_channel, 16], stride=1,
                                 is_training=is_training) # Nx32x32x16
            x = self._pool_layer(name='pool1', inp=x, ksize=2, stride=2, mode='MAX') # Nx16x16x16
    
            # conv2
            x = self._conv_layer(name='conv2a', inp=x, kernel_shape=[3, 3, 16, 32],
                                 stride=1, is_training=is_training)
            x = self._conv_layer(name='conv2b', inp=x, kernel_shape=[3, 3, 32, 32],
                                 stride=1, is_training=is_training)
            x = self._pool_layer(name='pool2', inp=x, ksize=2, stride=2, mode='MAX') # Nx8x8x32
            
            # conv3
            x = self._conv_layer(name='conv3a', inp=x, kernel_shape=[3, 3, 32, 64],
                                 stride=1, is_training=is_training)
            x = self._conv_layer(name='conv3b', inp=x, kernel_shape=[3, 3, 64, 64],
                                 stride=1, is_training=is_training)
            x = self._pool_layer(name='pool3', inp=x, ksize=2, stride=2, mode='MAX') # Nx4x4x64
    
            # conv4
            x = self._conv_layer(name='conv4a', inp=x, kernel_shape=[3, 3, 64, 128],
                                 stride=1, is_training=is_training)
            x = self._conv_layer(name='conv4b', inp=x, kernel_shape=[3, 3, 128, 128],
                                 stride=1, is_training=is_training)
            x = self._pool_layer(name='pool4', inp=x, ksize=4, stride=4, mode='AVG') # Nx1x1x128
            
            # fc
            logits = self._fc_layer(name='fc1', inp=x, units=nr_class, dropout=0)
            
            # softmax
            preds = tf.nn.softmax(logits)
            
            placeholders = {
                'data': data,
                'label': label,
                'is_training': is_training,
            }
            return placeholders, label_onehot, logits, preds
    

    Since the graph is isolated from the outside world while it runs, it needs some kind of interface to exchange data with the outside. That interface is the placeholder.
    Specifying None in the shape argument means that dimension accepts any size.

    data = tf.placeholder(tf.float32, shape=(None,)+image_shape+(nr_channel,), name='data')
    label = tf.placeholder(tf.int32, shape=(None,), name='label')
    

    Training

    Once the graph is defined, training can begin. This corresponds to train.py in the 01-svhn project.


    TensorFlow offers several ways to feed data. The 01-svhn project uses the generator-based approach.

    # imports assumed at the top of train.py (the modules described above)
    import tensorflow as tf
    from common import *          # hyperparameters: minibatch_size, nr_epoch, test_interval, ...
    from dataset import Dataset
    from model import Model
    
    def get_dataset_batch(ds_name):
        '''get a batch generator of dataset'''
        dataset = Dataset(ds_name)
        ds_gnr = dataset.load().instance_generator
        ds = tf.data.Dataset.from_generator(ds_gnr, output_types=(tf.float32, tf.int32))
        if ds_name == 'train':
            ds = ds.shuffle(dataset.instances_per_epoch)
            ds = ds.repeat(nr_epoch)
        elif ds_name == 'test':
            ds = ds.repeat(nr_epoch // test_interval)
        ds = ds.batch(minibatch_size, drop_remainder=True)
        ds_iter = ds.make_one_shot_iterator()
        sample_gnr = ds_iter.get_next()
        return sample_gnr, dataset
    

    To speed up training, TensorFlow asks for a buffer size when shuffling; 01-svhn defaults to one epoch's worth of data. This is why some of you will hit memory blow-ups after adding extra_32x32.mat: you can avoid this by specifying a smaller buffer size (see the example below).

    ds = ds.shuffle(dataset.instances_per_epoch)
    
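
    For example (an illustrative buffer size, not the project default):

    ds = ds.shuffle(10000)  # a smaller fixed shuffle buffer to bound memory usage
    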

    Specifies how many times the dataset is iterated over, usually kept equal to the number of training epochs.

    ds = ds.repeat(nr_epoch)
    

    Gets a batch generator, i.e. each call yields one batch of data. drop_remainder controls what happens when the last batch has fewer than minibatch_size samples: if True it is dropped; by default the smaller batch is kept and used.

    ds = ds.batch(minibatch_size, drop_remainder=True)
    ds_iter = ds.make_one_shot_iterator()
    sample_gnr = ds_iter.get_next()
    
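
    For a sense of scale, using the numbers above: the training split has 73257 instances, so one pass over it with minibatch_size = 128 and drop_remainder=True gives 73257 // 128 = 572 full batches, and the last 41 samples are dropped.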

    # load datasets
    train_batch_gnr, train_set = get_dataset_batch(ds_name='train')
    test_batch_gnr, test_set = get_dataset_batch(ds_name='test')
    
    # build a compute graph
    network = Model() 
    placeholders, label_onehot, logits, preds = network.build()
    loss_reg = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
    loss = tf.losses.softmax_cross_entropy(label_onehot, logits) + loss_reg
    
    # set a performance metric
    correct_pred = tf.equal(tf.cast(tf.argmax(preds, 1), dtype=tf.int32),
                            tf.cast(tf.argmax(label_onehot, 1), dtype=tf.int32))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    
    # learning rate config
    global_steps = tf.Variable(0, trainable=False) # a counter recording the number of minibatches
    boundaries = [train_set.minibatchs_per_epoch*15, train_set.minibatchs_per_epoch*40] 
    values = [0.01, 0.001, 0.0005]
    lr = tf.train.piecewise_constant(global_steps, boundaries, values)
    
    opt = tf.train.AdamOptimizer(lr) # use adam as optimizer
    
    # in order to update BN in every iter, a trick in tf
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train = opt.minimize(loss)
    

    tf.get_collection walks over the graph and gathers everything that shares a given key into a list. Variables created with a regularizer add their penalty terms to the REGULARIZATION_LOSSES collection, so summing those entries gives the regularization term of the loss.

    loss_reg = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
    

    TensorFlow does not expose a standalone cross-entropy function; instead it fuses softmax and cross-entropy into a single function, so the argument passed here must be the logits, not preds.

    tf.losses.softmax_cross_entropy(label_onehot, logits)
    
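
    If you would rather skip the one-hot step, TensorFlow also has a sparse variant that takes the integer labels directly (an alternative, not what 01-svhn uses):

    loss_xent = tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)
    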

    Accuracy is used as the metric for evaluating model performance.

    correct_pred = tf.equal(tf.cast(tf.argmax(preds, 1), dtype=tf.int32),
                            tf.cast(tf.argmax(label_onehot, 1), dtype=tf.int32))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    

    The learning rate is the most important hyperparameter when training a NN. 01-svhn uses step-wise annealing to change the learning rate: boundaries specifies the minibatch counts at which the rate changes, and values specifies the rates.

    global_steps = tf.Variable(0, trainable=False) # a counter recording the number of minibatches
    boundaries = [train_set.minibatchs_per_epoch*15, train_set.minibatchs_per_epoch*40] 
    values = [0.01, 0.001, 0.0005]
    lr = tf.train.piecewise_constant(global_steps, boundaries, values)
    
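
    With the fast-experiment setting minibatchs_per_epoch = 200 from dataset.py, this works out to:

    # boundaries = [200*15, 200*40] = [3000, 8000]
    # global_steps <= 3000         -> lr = 0.01
    # 3000 < global_steps <= 8000  -> lr = 0.001
    # global_steps > 8000          -> lr = 0.0005
    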

    Specifies the optimizer used for training. The 01-svhn project uses Adam; Adam has some extra parameters that can be set, but only the learning rate is required.

    opt = tf.train.AdamOptimizer(lr) # use adam as optimizer
    

    Implementing BN correctly in TensorFlow is harder than in other frameworks. Because the BN statistics change on every minibatch, the updates must be wired in explicitly with tf.control_dependencies, which makes every run of the train op also execute update_ops so the latest statistics are used.

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train = opt.minimize(loss)
    

    TensorFlow uses a session to control the running of the graph.

    # create a session
    tf.set_random_seed(12345) # ensure consistent results
    global_cnt = 0 # a counter recording the number of minibatches processed
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer()) # init all variables
        # training
        for e in range(nr_epoch): 
            for _ in range(train_set.minibatchs_per_epoch):
                global_cnt += 1
                images, labels = sess.run(train_batch_gnr) # get a batch of (img, label)
                feed_dict = { # assign data to placeholders respectively
                        placeholders['data']: images,
                        placeholders['label']: labels,
                        global_steps: global_cnt,
                        placeholders['is_training']: True, # in training phase, set True
                }
                # run train, and output all values you want to monitor
                _, loss_v, acc_v, lr_v = sess.run([train, loss, accuracy, lr], 
                                                  feed_dict=feed_dict)
                
                if global_cnt % show_interval == 0:
                    print(
                        "e:{},{}/{}".format(e, global_cnt % train_set.minibatchs_per_epoch,
                                            train_set.minibatchs_per_epoch),
                        'loss: {:.3f}'.format(loss_v),
                        'acc: {:.3f}'.format(acc_v),
                        'lr: {:.3f}'.format(lr_v),
                    )
    
            # validation
            if e % test_interval == 0:
                loss_sum, acc_sum = 0, 0 # init accumulators
                for i in range(test_set.minibatchs_per_epoch):
                    images, labels = sess.run(test_batch_gnr)
                    feed_dict = {
                            placeholders['data']: images,
                            placeholders['label']: labels,
                            global_steps: global_cnt,
                            placeholders['is_training']: False, # in test phase, set False
                        }
                    loss_v, acc_v = sess.run([loss, accuracy], feed_dict=feed_dict)
                    loss_sum += loss_v # update the running sums
                    acc_sum += acc_v
                print("\n**************Validation results****************")
                print('loss_avg: {:.3f}'.format(loss_sum / test_set.minibatchs_per_epoch),
                      'accuracy_avg: {:.3f}'.format(acc_sum / test_set.minibatchs_per_epoch))
                print("************************************************\n")
    
    print('Training is done, exit.')
    

    Before training starts, all variables in the graph need to be initialized.

    sess.run(tf.global_variables_initializer()) # init all variables
    

    Running batch_gnr produces a batch of images and labels. Here you can see that TensorFlow only really executes when run is called; everything before that is just symbolic definition.

     images, labels = sess.run(train_batch_gnr) # get a batch of (img, label)
    

    Specify the targets you want to run in the list. In the training phase the train op must be run, otherwise the loss is never optimized. So, a question: if you want to read out the current learning rate value in real time, what should you do?

    train = opt.minimize(loss)
    _, loss_v, acc_v = sess.run([train, loss, accuracy], feed_dict=feed_dict)
    
