CNN Basics + CIFAR-10 Image Classification

Author: 劲草浅躬行 | Published: 2018-09-11 23:30

Part 1: CNN Basics

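The deep learning boom was set off by the arrival of AlexNet in 2012, so studying the structure of the AlexNet network is indispensable for learning and understanding CNNs.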

1. Convolution

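Suppose we classify an image with a traditional (fully connected) neural network, connecting every pixel to every hidden-layer node. For a 1000x1000-pixel image and 1M hidden units, that gives 10^6 x 10^6 = 10^12 parameters, which is clearly unacceptable (see the figure below). A convolutional layer avoids this by connecting each unit only to a small local region of the image and sharing the same kernel weights across all positions.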

image
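
To make the numbers concrete, here is a minimal sketch that counts the weights of a fully connected layer for the 1000x1000 image and 1M hidden units mentioned above, and compares them with a convolutional layer. The 5x5 kernel size and the 64 filters are illustrative assumptions, not values taken from the text.

```python
# Parameter count of a fully connected layer versus a convolutional layer.
# The 1000x1000 image and 1M hidden units come from the text; the 5x5 kernel
# and 64 filters are illustrative assumptions.

def dense_params(num_inputs, num_units):
    """Number of weights in a fully connected layer (biases ignored)."""
    return num_inputs * num_units


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in a convolutional layer (biases ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels


pixels = 1000 * 1000
fc = dense_params(pixels, 1_000_000)
conv = conv_params(5, 5, 1, 64)

print(fc)    # 1000000000000
print(conv)  # 1600
```

The convolutional count does not depend on the image size at all, because the same kernels are reused at every position.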

2. Activation function

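AlexNet uses ReLU as its activation function, defined as max(0, x). Before that, neural networks usually used sigmoid or tanh. The biggest drawback of both is saturation: when the input x is very large or very small, the output gets very close to its limit (0 or 1 for sigmoid, -1 or +1 for tanh), where the slope is nearly zero. During gradient descent the gradients therefore become very small, which severely slows down training.
Practical advice for the activation layer:

Don't use sigmoid! Don't use sigmoid! Don't use sigmoid!
Try ReLU first, because it is fast, but use it with care.
If ReLU fails, use Leaky ReLU.
tanh gives good results in some cases, but those are rare.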
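
The saturation argument can be checked numerically. The sketch below, written in plain Python so that it runs without dependencies, evaluates the derivatives of sigmoid, tanh, ReLU and Leaky ReLU at a few inputs. The Leaky ReLU slope `alpha=0.01` is a commonly used default and an assumption here, not a value from the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s); at most 0.25, near 0 when saturated.
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; near 0 when saturated.
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    # ReLU = max(0, x): gradient is 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, alpha=0.01):
    # Leaky ReLU keeps a small slope alpha for negative inputs.
    return 1.0 if x > 0 else alpha


for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x), leaky_relu_grad(x))
```

At x = 10 the sigmoid and tanh gradients are close to zero while the ReLU gradient stays at 1. At x = -10 the ReLU gradient is exactly 0, which is the failure case Leaky ReLU is meant to address.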

3. Pooling

image
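The pooling layer is a very important part of a CNN. It extracts the dominant features and shrinks the feature maps, which matters a great deal for speeding up computation. The two main variants are max pooling and average pooling.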

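Feature invariance: the information discarded when an image is compressed is mostly unimportant, while what remains tends to be scale-invariant and is the most representative of the image.
Lower computation: an image carries a great deal of information and many features, but some of it is of little use or redundant for a given vision task. Pooling removes this redundancy and keeps the most important features, which is one of its main roles.
Overfitting: pooling helps prevent overfitting to some extent and makes optimization easier.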
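
As an illustration of max pooling and average pooling, here is a minimal plain-Python sketch that pools a single 4x4 feature map with a 2x2 window and stride 2. The function name `pool2d`, the window size and the sample values are assumptions made for this example.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2-D list of numbers with a size x size window (no padding)."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values under the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            else:
                out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="avg")
print(max_pooled)  # [[6, 2], [2, 9]]
print(avg_pooled)  # [[3.5, 1.0], [1.0, 6.0]]
```

The 4x4 map shrinks to 2x2, so the layers that follow process a quarter of the values.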

4. Fully connected layer

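After the last convolutional layer and a final pooling step, the network outputs 20 feature maps of size 12x12, which a fully connected layer then turns into a 1x100 vector.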

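How is this done? It is equivalent to convolving with 20x100 kernels of size 12x12. Each input map is convolved with a kernel of exactly its own size, so the whole map collapses into a single number; with a depth of 20, the results of the 20 kernels are then summed. In this way the entire stack of maps is condensed into one number per output unit.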
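
The equivalence between a fully connected layer and a convolution with full-size kernels can be checked on a small example. The sketch below uses 2 maps of size 3x3 and 4 output units as small stand-ins for the 20 maps of 12x12 and 100 outputs in the text; the sizes, the random weights and the function names are assumptions chosen to keep the example short.

```python
import random

random.seed(0)
C, H, W, OUT = 2, 3, 3, 4  # stand-ins for 20, 12, 12 and 100 in the text

# Input: C feature maps of size H x W.
x = [[[random.random() for _ in range(W)] for _ in range(H)]
     for _ in range(C)]
# One full-size kernel per (output unit, input map) pair: OUT * C kernels.
kernels = [[[[random.random() for _ in range(W)] for _ in range(H)]
            for _ in range(C)] for _ in range(OUT)]


def full_size_conv(x, kernels):
    """Each output sums kernel * map over all positions and all maps."""
    return [sum(k[c][i][j] * x[c][i][j]
                for c in range(C) for i in range(H) for j in range(W))
            for k in kernels]


def dense(x, kernels):
    """The same weights applied as a matrix product on the flattened input."""
    flat_x = [v for ch in x for row in ch for v in row]
    outputs = []
    for k in kernels:
        flat_w = [v for ch in k for row in ch for v in row]
        outputs.append(sum(w * v for w, v in zip(flat_w, flat_x)))
    return outputs


conv_out = full_size_conv(x, kernels)
dense_out = dense(x, kernels)
print(all(abs(a - b) < 1e-9 for a, b in zip(conv_out, dense_out)))
```

With the sizes from the text, the same layer would hold 20 x 100 x 12 x 12 = 288,000 weights.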

Part 2: CIFAR-10 Image Classification

CIFAR image classification project link

Overview

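Classifying the CIFAR-10 dataset is a public benchmark problem in machine learning. The task is to classify 32x32 RGB images into 10 categories:
airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.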

image

Model architecture

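The model in this tutorial is a multi-layer architecture consisting of alternating convolutional layers and nonlinearities. These layers are followed by fully connected layers leading into a softmax classifier. Apart from the top few layers, the model largely follows the one described by Alex Krizhevsky.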

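After a few hours of training on a single GPU, the model reaches a peak accuracy of about 86%. See the description and code below for details. The model contains 1,068,298 learnable parameters and needs about 19.5M multiply-add operations to classify a single image.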

The model consists of:

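The code for the CIFAR-10 network model is in cifar10.py. The complete training graph contains roughly 765 operations, but we found that building it from the following modules maximizes code reuse: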

Model inputs: inputs() and distorted_inputs() add operations that read and preprocess CIFAR images for evaluation and training, respectively;
Model prediction: inference() adds operations that perform inference, i.e. classification, on supplied images;
Model training: loss() and train() add operations that compute the loss and gradients, update the variables, and report the results.
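
The body of loss() is not shown in this article. Assuming it is built on softmax cross entropy with integer labels (the comment in network() mentions tf.nn.sparse_softmax_cross_entropy_with_logits), the per-example computation can be sketched in plain Python as follows. The function names and the sample logits are made up for illustration, and any weight-decay terms are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_softmax_cross_entropy(logits, label):
    """Cross entropy for one example with an integer class label.

    Equals -log(softmax(logits)[label]), computed via log-sum-exp.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


logits = [2.0, 1.0, 0.1] + [0.0] * 7   # made-up logits for 10 classes
probs = softmax(logits)
loss_value = sparse_softmax_cross_entropy(logits, 0)
uniform_loss = sparse_softmax_cross_entropy([0.0] * 10, 3)
print(loss_value, uniform_loss)
```

With all-zero logits the loss equals log(10), about 2.30, which is roughly where an untrained 10-class classifier starts.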

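import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

import cifar10  # the project's model module (network(), loss(), distorted_inputs())

batch_size = 128          # assumed value; not given in the original
result_dir = './result/'  # assumed checkpoint directory

# The inputs. The batch dimension is fixed because network() reads it
# statically when it flattens pool2.
images = tf.placeholder(tf.float32, [batch_size, 24, 24, 3])
labels = tf.placeholder(tf.int32, [batch_size])

# Build up the network.
logits = cifar10.network(images)

# Calculate loss.
loss = cifar10.loss(logits, labels)

# Calculate gradients and apply back propagation.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Initialize.
sess = tf.Session()
sess.run(tf.global_variables_initializer())

saver = tf.train.Saver()

# Assumed here to return NumPy arrays (images, labels), as the loop below implies.
train_images, train_labels = cifar10.distorted_inputs()
num_batches = train_images.shape[0] // batch_size

# Iterate for 2000 epochs.
for i in range(2000):
    # Shuffle images and labels with the same permutation.
    perm = np.random.permutation(train_images.shape[0])
    # One epoch.
    for j in range(num_batches):
        idx = perm[j * batch_size:(j + 1) * batch_size]
        sess.run([loss, train_op],
                 feed_dict={images: train_images[idx], labels: train_labels[idx]})

saver.save(sess, result_dir + 'model.ckpt')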

def network(images):
  """Build the CIFAR-10 model.
  Args:
    images: Images returned from distorted_inputs() or inputs().
  Returns:
    Logits.
  """
  # We instantiate all variables using tf.get_variable() instead of
  # tf.Variable() in order to share variables across multiple GPU training runs.
  # If we only ran this model on a single GPU, we could simplify this function
  # by replacing all instances of tf.get_variable() with tf.Variable().
  #
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 3, 64],
                                         stddev=5e-2,
                                         wd=None)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv1)

  # pool1
  pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='pool1')
  # norm1 (LRN is rarely used nowadays)
  norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm1')

  # conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=None)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv2)

  # norm2
  norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm2')
  # pool2
  pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name='pool2')

  # local3
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    reshape = tf.reshape(pool2, [images.get_shape().as_list()[0], -1])
    dim = reshape.get_shape()[1].value
    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)

  # local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)

  # linear layer(WX + b),
  # We don't apply softmax here because
  # tf.nn.sparse_softmax_cross_entropy_with_logits accepts the unscaled logits
  # and performs the softmax internally for efficiency.
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=None)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)

  return softmax_linear
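
The layer shapes in network() can be used to check the parameter count of 1,068,298 quoted in the model description. The sketch below traces the spatial size through the 'SAME'-padded layers, where the output size is ceil(input / stride), and sums weights and biases per layer. It assumes the 24x24x3 input used by the training script and NUM_CLASSES = 10; the LRN and pooling layers have no trainable parameters.

```python
import math


def same_out(size, stride):
    """Output size of a 'SAME'-padded op: ceil(size / stride)."""
    return math.ceil(size / stride)


side = 24                     # cropped input is 24 x 24 x 3
side = same_out(side, 1)      # conv1, stride 1 -> 24
side = same_out(side, 2)      # pool1, stride 2 -> 12
side = same_out(side, 1)      # conv2, stride 1 -> 12
side = same_out(side, 2)      # pool2, stride 2 -> 6
flat_dim = side * side * 64   # 2304 inputs to local3

NUM_CLASSES = 10
layers = {
    "conv1": 5 * 5 * 3 * 64 + 64,        # weights + biases
    "conv2": 5 * 5 * 64 * 64 + 64,
    "local3": flat_dim * 384 + 384,
    "local4": 384 * 192 + 192,
    "softmax_linear": 192 * NUM_CLASSES + NUM_CLASSES,
}
total = sum(layers.values())
for name, count in layers.items():
    print(name, count)
print(total)  # 1068298
```

Most of the parameters (885,120 of them) sit in local3, the first fully connected layer.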

Loss curve

image
