MNIST Handwritten Digit Recognition with a BP Neural Network

Author: Silence_Dong | Published 2018-08-14 08:36

    1. BP Neural Networks

    • Neural network: also known as a multilayer perceptron (MLP), a BP network combines feed-forward computation with the backpropagation algorithm. For a given task you build hidden layers with an appropriate number of units, pair them with suitable activation functions (sigmoid, softmax, tanh, ReLU, etc.), and minimize a loss function (cross-entropy, least squares, etc.) with a gradient descent method (stochastic gradient descent, mini-batch gradient descent, etc.), trying to avoid poor local minima and converge toward the global optimum.
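    For concreteness, the quadratic ("least squares") cost mentioned above, which the code later in this post uses, can be written for a single sample as

    $$C = \frac{1}{2}\lVert a^{(L)} - y \rVert^2,$$

    where $a^{(L)}$ is the network's output and $y$ is the target label (standard notation; the layer superscripts are mine).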

    • Feed-forward: a signal enters at the input layer and passes through each hidden layer's weighted sums, biases, and activations until it produces the output. Each layer's activation output serves as the next layer's input, and no network parameters are adjusted during the forward pass.
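    In symbols, the forward pass computes, layer by layer (standard notation, matching the `_forward_prop` code below),

    $$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\bigl(z^{(l)}\bigr),$$

    with $a^{(0)} = x$ the input and $\sigma$ the activation function.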


    In the handwritten digit recognizer, the forward propagation code looks like this:

    def _forward_prop(self, x):
        '''
        Forward propagation.
        :param x: input vector
        :return:
        '''
        # input layer
        self._activations[0] = x
        # compute layer by layer
        for i in range(1, self.num_layers):
            # z = w * a + b
            self._zs[i] = (self.weights[i].dot(
                self._activations[i - 1]) + self.biases[i])
            # activation: a = sigma(z)
            self._activations[i] = self._activate_func[0](self._zs[i])
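
    The `_activate_func` list used above holds the activation function and its derivative. The `activate` module itself is not shown in the original post; a minimal sketch consistent with how it is used (sigmoid plus its derivative, as the comments in the backpropagation code suggest) might look like this:

    import numpy as np

    def sigmoid(z):
        # logistic activation, applied element-wise
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoidDerivative(z):
        # sigma'(z) = sigma(z) * (1 - sigma(z))
        s = sigmoid(z)
        return s * (1.0 - s)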
    
    • Backpropagation: the core question in a BP network is how to tune the weight parameters. At the start of training, the signal propagated through the network to the output layer produces a result z that generally differs from the target value y; the difference z - y is called the error. The output-layer error is attributed, through the weights, to the preceding hidden layers: the error is propagated backwards through the network, yielding a per-unit error for each hidden layer.
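    In standard notation, the output-layer error and its backward recursion are

    $$\delta^{(L)} = \bigl(a^{(L)} - y\bigr) \odot \sigma'\bigl(z^{(L)}\bigr), \qquad \delta^{(l)} = \Bigl(\bigl(W^{(l+1)}\bigr)^{\mathsf T} \delta^{(l+1)}\Bigr) \odot \sigma'\bigl(z^{(l)}\bigr),$$

    where $\odot$ denotes the element-wise product.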


    The weights in each layer are then updated from these per-layer errors. The update recipe is simple and direct: 1) compute the gradient; 2) take a gradient descent step. Computing the gradient boils down to the chain rule:
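    Written out, the chain rule

    $$\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$$

    yields the parameter gradients directly from the per-layer errors:

    $$\frac{\partial C}{\partial W^{(l)}} = \delta^{(l)} \bigl(a^{(l-1)}\bigr)^{\mathsf T}, \qquad \frac{\partial C}{\partial b^{(l)}} = \delta^{(l)}.$$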

    The backpropagation code in the handwritten digit recognizer:

    def _back_prop(self, x, y):
        '''
        Backward error propagation.
        :param x: input vector
        :param y: sample label
        :return: per-layer gradients (nabla_b, nabla_w)
        '''
        # gradient accumulators for all layers' w, b
        nabla_b = [np.zeros(bias.shape) for bias in self.biases]
        nabla_w = [np.zeros(weight.shape) for weight in self.weights]

        # output-layer error:
        # error = (sigmoid(w*x+b) - y) * sigmoidDerivative(w*x+b)
        error = (self._activations[-1] - y) * \
            self._activate_func[-1](self._zs[-1])

        nabla_b[-1] = error
        nabla_w[-1] = error.dot(self._activations[-2].transpose())
        # propagate the error backwards through the hidden layers
        for l in reversed(range(1, self.num_layers - 1)):
            error = np.multiply(
                self.weights[l + 1].transpose().dot(error),
                self._activate_func[-1](self._zs[l]))
            nabla_b[l] = error
            nabla_w[l] = error.dot(self._activations[l - 1].transpose())

        return nabla_b, nabla_w
    
    • Gradient descent: the main variants in use are:
      - Batch gradient descent: uses the full training set to choose the direction and step size of each update; it converges steadily and approaches the optimum.
      - Stochastic gradient descent (SGD, or online GD): uses a small number of samples in place of the full set, so it does not always descend along the steepest direction; it is fast on large datasets and well suited to online settings, but it converges to local optima easily, and the learning rate is a very influential parameter.
      - Mini-batch gradient descent: splits the training set into chunks of mini_batch_size and runs a gradient descent step on each batch.

      In the handwritten digit recognizer, there are 55,000 training samples and 5,000 validation samples; training proceeds epoch by epoch on mini-batches, using the update rule sketched below.
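    For a mini-batch of size $m$ and learning rate $\eta$, the update applied after each batch is

    $$w \leftarrow w - \frac{\eta}{m} \sum_{i=1}^{m} \nabla_w C_i, \qquad b \leftarrow b - \frac{\eta}{m} \sum_{i=1}^{m} \nabla_b C_i,$$

    which is exactly the `eta / mini_batch_size` factor in the `fit` code that follows: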
    def fit(self, train_data, validation_data=None):
        '''
        Train W, b.
        :param train_data: training data
        :param validation_data: validation (development) set
        :return:
        '''
        for epoch in range(self.epochs):
            # draw the next mini-batch of samples
            train_data_images, train_data_labels = train_data.next_batch(
                self.mini_batch_size)
            # reshape into (input, label) column-vector pairs
            mini_batches = self.formData(train_data_images, train_data_labels)

            # gradient accumulators for this batch's w, b
            # (initialized once per batch, before the sample loop)
            nabla_b = [np.zeros(bias.shape) for bias in self.biases]
            nabla_w = [np.zeros(weight.shape) for weight in self.weights]
            for sample in mini_batches:
                # one sample and its label
                x, y = sample

                # forward pass
                self._forward_prop(x)
                # backward pass: per-sample w, b gradients
                data_nabla_b, data_nabla_w = self._back_prop(x, y)
                nabla_b = [nb + dnb for nb,
                           dnb in zip(nabla_b, data_nabla_b)]
                nabla_w = [nw + dnw for nw,
                           dnw in zip(nabla_w, data_nabla_w)]
            # update W, b with the averaged batch gradient
            self.weights = [
                w - (self.eta / self.mini_batch_size) * dw
                for w, dw in zip(self.weights, nabla_w)]
            self.biases = [
                b - (self.eta / self.mini_batch_size) * db
                for b, db in zip(self.biases, nabla_b)]
            # evaluate on the validation set
            if validation_data:
                accuracy = self.validate(validation_data) * 100.0
                print('Epoch {0}, accuracy {1} %.'.format(
                    epoch + 1, accuracy))
            else:
                print('Processed epoch {0}.'.format(epoch))
    

    2. MNIST Handwritten Digit Recognition (code on GitHub)

    • Having covered the basic principles of BP neural networks, let's put them into practice on the MNIST dataset.

    • Data handling (official MNIST site)
      Download the training and test sets and format them appropriately. Here we follow TensorFlow's approach: the code checks whether the four data files already exist in the local directory and, if not, downloads and decompresses them, then formats them into a shape the network's input layer can consume. Below is the content of load.py:

    from __future__ import absolute_import, print_function
    import os
    import gzip
    import numpy
    from six.moves import urllib, xrange
    # the datafile url
    SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'
    def _isDownload(file_name, work_dir):
        """
        Download the data file if it doesn't already exist.
        Return the local file path.
        """
        if not os.path.exists(work_dir):
            os.mkdir(work_dir)
        file_path = os.path.join(work_dir, file_name)
        if not os.path.exists(file_path):
            # download file
            file_path, _ = urllib.request.urlretrieve(
                SOURCE_URL + file_name, file_path)
            state_info = os.stat(file_path)
            print('Successfully downloaded!!', file_name,
                  state_info.st_size, 'bytes.')
        return file_path
    
    def _read32(bytestream):
        # IDX files store integers as big-endian (high byte first) uint32
        dt = numpy.dtype(numpy.uint32).newbyteorder('>')
        return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
    
    def extractImages(file_name):
        """
        Extract the images into a 4D unit8 numpy.array
        like [index,y,x,depty]
        """
        print('Extracting ', file_name)
        with gzip.open(file_name) as bytestream:
            magic = _read32(bytestream)
            if magic != 2051:
                raise ValueError(
                    'Invalid magic number %d in MNIST image file:%s' %
                    (magic, file_name)
                )
            num_images = _read32(bytestream)
            rows = _read32(bytestream)
            cols = _read32(bytestream)
            buf = bytestream.read(rows * cols * num_images)
            data = numpy.frombuffer(buf, dtype=numpy.uint8)
            data = data.reshape(num_images, rows, cols, 1)
    
            return data
    
    def extractLabels(file_name, one_hot=False):
        """
        ===> 1D unit8 numpy.array [index]
        """
        print('Extracting ', file_name)
        with gzip.open(file_name) as bytestream:
            magic = _read32(bytestream)
            if magic != 2049:
                raise ValueError(
                    'Invalid magic number %d in MNIST label file:%s' %
                    (magic, file_name)
                )
            num_items = _read32(bytestream)
            buf = bytestream.read(num_items)
            labels = numpy.frombuffer(buf, dtype=numpy.uint8)
            if one_hot:
                return denseToOneHot(labels)
            return labels
    
    def denseToOneHot(labels_dense, number_classes=10):
        """
        class labels ==> one-hot vectors
        """
        number_labels = labels_dense.shape[0]
        index_offset = numpy.arange(number_labels) * number_classes
        labels_one_hot = numpy.zeros((number_labels, number_classes))
        labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
        return labels_one_hot
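    # For example (a hypothetical quick check, not part of the original file):
    #   >>> denseToOneHot(numpy.array([1, 3]), number_classes=4)
    #   array([[0., 1., 0., 0.],
    #          [0., 0., 0., 1.]])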
    
    class DataSet(object):
    
        def __init__(self, images, labels, fake_data=False):
            if fake_data:
                self._num_examples = 10000
            else:
                assert images.shape[0] == labels.shape[0], (
                    'images.shape:%s labels.shape:%s' % (
                        images.shape, labels.shape)
                )
                self._num_examples = images.shape[0]
                # [num examples,rows ,cols,depth] ====> [num examples,rows * cols] assuming depth=1
                assert images.shape[3] == 1
                images = images.reshape(images.shape[0],
                                        images.shape[1] * images.shape[2]
                                        )
                # [0,255] ===> [0.0,1.0]
                images = images.astype(numpy.float32)
                images = numpy.multiply(images, 1.0 / 255.0)
            self._images = images
            self._labels = labels
            self._epochs_completed = 0
            self._index_in_epoch = 0
    
        # read-only property accessors
        @property
        def images(self):
            return self._images
    
        @property
        def labels(self):
            return self._labels
    
        @property
        def num_examples(self):
            return self._num_examples
    
        @property
        def epoch_completed(self):
            return self._epochs_completed
    
        def next_batch(self, batch_size, fake_data=False):
            """
            Return the next `batch_size` examples from the data set.
            """
            if fake_data:
                # 28 * 28 = 784
                fake_image = [1.0 for _ in xrange(784)]
                fake_label = 0
                return ([fake_image for _ in xrange(batch_size)],
                        [fake_label for _ in xrange(batch_size)])
            start = self._index_in_epoch
            self._index_in_epoch += batch_size
            # past the end of this epoch: reshuffle and start the next one
            if self._index_in_epoch > self._num_examples:
                self._epochs_completed += 1
    
                perm = numpy.arange(self._num_examples)
                # shuffle the data
                numpy.random.shuffle(perm)
                self._images = self._images[perm]
                self._labels = self._labels[perm]
    
                # start next epoch
                start = 0
                self._index_in_epoch = batch_size
                assert batch_size <= self._num_examples
            end = self._index_in_epoch
            return self._images[start:end], self._labels[start:end]
    
    def readDataSets(train_dir, fake_data=False, one_hot=False):
        class DataSets(object):
            pass
        data_sets = DataSets()
        if fake_data:
            data_sets.train = DataSet([], [], fake_data=True)
            data_sets.validation = DataSet([], [], fake_data=True)
            data_sets.test = DataSet([], [], fake_data=True)
            return data_sets
    
        TRAIN_IMAGES = 'train-images-idx3-ubyte.gz'
        TRAIN_LABELS = 'train-labels-idx1-ubyte.gz'
        TEST_IMAGES = 't10k-images-idx3-ubyte.gz'
        TEST_LABELS = 't10k-labels-idx1-ubyte.gz'
        VALIDATION_SIZE = 5000
    
        local_file = _isDownload(TRAIN_IMAGES, train_dir)
        train_images = extractImages(local_file)
    
        local_file = _isDownload(TRAIN_LABELS, train_dir)
        train_labels = extractLabels(local_file, one_hot=one_hot)
    
        local_file = _isDownload(TEST_IMAGES, train_dir)
        test_images = extractImages(local_file)
    
        local_file = _isDownload(TEST_LABELS, train_dir)
        test_labels = extractLabels(local_file, one_hot=one_hot)
    
        validation_images = train_images[:VALIDATION_SIZE]
        validation_labels = train_labels[:VALIDATION_SIZE]
    
        train_images = train_images[VALIDATION_SIZE:]
        train_labels = train_labels[VALIDATION_SIZE:]
    
        data_sets.train = DataSet(train_images, train_labels)
        data_sets.validation = DataSet(validation_images, validation_labels)
        data_sets.test = DataSet(test_images, test_labels)
        return data_sets
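
    A quick sanity check of the loader (a sketch; the sizes follow from the constants above, and 'data' is simply the download directory you pass in):

    import load

    mnist = load.readDataSets('data', one_hot=True)
    print(mnist.train.num_examples)       # 55000
    print(mnist.validation.num_examples)  # 5000
    print(mnist.test.num_examples)        # 10000

    images, labels = mnist.train.next_batch(10)
    print(images.shape, labels.shape)     # (10, 784) (10, 10)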
    
    • Building the network model
      The network model built here is fairly flexible: the number of hidden layers and their unit counts can be set freely through the sizes parameter, and the basic functions have already been explained and matched to the BP principles above. Below is the content of NeuralNetwork.py, which defines a single class:


    import time
    import numpy as np
    from activate import *  # sigmoid and sigmoidDerivative, as sketched earlier
    import load as input_data
    
    class NeuralNetwork(object):
        """
        three layers(one hidden layer) NN model with Cost_func = 0.5*(z-y) ** 2
        Using mini-batch Gradient
        """
        def __init__(self, in_units, hidden_units, out_units):
            """Returns a new 3-layer neural network with the specified layer sizes."""
            # Hyper parameters
            self.input_size = in_units
            self.output_size = out_units
            self.hidden_size = hidden_units
            self.activate_func = [sigmoid, sigmoidDerivative]
            # Learning parameters
            self.rate = 6.0
            # Weight parameters, randomly initialized
            self.W1 = np.random.uniform(-0.5, 0.5,
                                        (self.input_size, self.hidden_size))
            self.W2 = np.random.uniform(-0.5, 0.5,
                                        (self.hidden_size, self.output_size))
        def configure(self, rate=None):
            """Change the learning parameters of the network."""
            self.rate = self.rate if rate is None else rate
    
        def init_weights(self):
            """Initialize weights using Nguyen-Widrow."""
            self.W1 = np.random.uniform(-0.5, 0.5,
                                        (self.input_size, self.hidden_size))
            self.W2 = np.random.uniform(-0.5, 0.5,
                                        (self.hidden_size, self.output_size))
            # Initialize the hidden layer weights
            beta = 0.7 * (self.hidden_size ** (1.0 / self.input_size))
            for n in range(self.hidden_size):
                norm_val = np.linalg.norm(self.W1[:, n])
                self.W1[:, n] = np.multiply(self.W1[:, n], beta / norm_val)
    
            # Initialize the output layer weights
            beta = 0.7 * (self.output_size ** (1.0 / self.hidden_size))
            for n in range(self.output_size):
                norm_val = np.linalg.norm(self.W2[:, n])
                self.W2[:, n] = np.multiply(self.W2[:, n], beta / norm_val)
    
        def forward(self, sample):
            """Forward propagation through the network.
            sample: ndarray of shape (n, input_size), where n is number of samples
            """
            self.Z2 = np.dot(sample.T, self.W1).T
            self.A2 = self.activate_func[0](self.Z2)
            self.Z3 = np.dot(self.A2.T, self.W2).T
            self.y_hat = self.activate_func[0](self.Z3)
            return self.y_hat
    
        def cost(self, estimate, target):
            """Sum Squared Error cost function.
            estimate: ndarray of shape (output_size,n), where n is number of samples
            target  : ndarray of shape (output_size,n)
            """
            return np.mean(np.mean((target - estimate) ** 2, axis=0))
    
        def cost_prime(self, sample, target, estimate):
            """Gradient descent derivative.
            sample  : ndarray of shape (n, input_size), where n is number of samples
            target  : ndarray of shape (n, output_size)
            estimate: ndarray of shape (n, output_size)
            """
            total = len(sample)
    
            delta3 = np.multiply(-(target - estimate),
                                 self.activate_func[-1](self.Z3))
            dW2 = np.multiply(np.dot(self.A2, delta3.T), 2 / total)
    
            delta2 = np.dot(self.W2, delta3) * self.activate_func[-1](self.Z2)
            dW1 = np.multiply(np.dot(sample, delta2.T), 2 / total)
    
            return dW1, dW2
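
        # A sketch of where these gradients come from (chain rule on the
        # quadratic cost, assuming the sigmoid activation used above):
        #   delta3 = dC/dZ3 = -(target - estimate) * sigmoid'(Z3)
        #   dW2    = dC/dW2 = A2 . delta3^T
        #   delta2 = dC/dZ2 = (W2 . delta3) * sigmoid'(Z2)
        #   dW1    = dC/dW1 = sample . delta2^T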
    
        def evaluate(self, sample, target):
            """Evaluate network performace using given data."""
            results = self.forward(sample.T)
            pairs = [(np.argmax(x), np.argmax(y))
                     for x, y in zip(results.T, target.T)]
            correct = sum(int(x == y) for x, y in pairs)
            return correct
    
        def backprop(self, images, labels):
            """Update weights using batch backpropagation."""
            size = len(labels)
            dW1s = []
            dW2s = []
            for i in range(size):
                label = labels[i]
                image = images[i]
    
                estimate = self.forward(image)
                dW1, dW2 = self.cost_prime(image, label, estimate)
    
                dW1s.append(dW1)
                dW2s.append(dW2)
    
            self.W1 = self.W1 - (self.rate / size) * sum(dW1s)
            self.W2 = self.W2 - (self.rate / size) * sum(dW2s)
    
        def train(self, train_data, epochs, batch_size, test_set=None):
            """Train the neural network using given data and parameters."""
            if test_set is not None:
                size_test = len(test_set.labels)
            size = len(train_data.labels)
            print("num training data: {}".format(size))
    
            self.costs = []
            start = time.time()
            for r in range(epochs):
                batch_datas = []
                for i in range(10):
                    # draw ten mini-batches per epoch
                    train_data_images, train_data_labels = train_data.next_batch(
                        batch_size)
                    # reshape into (input, label) column-vector pairs
                    mini_batches = self.formData(
                        train_data_images, train_data_labels)
                    batch_datas.append(mini_batches)
    
                for batch_data in batch_datas:
                    images, labels = [], []
                    for data in batch_data:
                        images.append(data[0])
                        labels.append(data[1])
                    self.backprop(images, labels)
    
                # target = train_data.labels
                # sample = train_data.images
                # estimate = self.forward(sample.T)
                # cost = self.cost(estimate, target.T)
                # self.costs.append(cost)
                # print("Epoch {} complete: cost {}".format(r, cost))
    
                if test_set is not None:
                    target = test_set.labels
                    sample = test_set.images
                    correct = self.evaluate(sample, target.T)
                    print("  {} / {}".format(correct, size_test))
            stop = time.time()
            elapsed = stop - start
            print("Time elapsed: {} sec".format(elapsed))
    
        def formData(self, data_images, data_labels):
            '''
            Format the data.
            :param data_images: image vectors
            :param data_labels: image labels
            :return: (input, label) column-vector pairs
            '''
            data = zip([np.reshape(x, (784, 1)) for x in data_images],
                       [np.reshape(y, (10, 1)) for y in data_labels])
            return data
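
    A minimal usage sketch for this class (assuming the file is saved as NeuralNetwork.py and the load.py loader from the previous section is available; the layer sizes and epoch count here are illustrative, not the post's settings):

    import load as input_data
    from NeuralNetwork import NeuralNetwork

    mnist = input_data.readDataSets('data', one_hot=True)
    nn = NeuralNetwork(in_units=784, hidden_units=100, out_units=10)
    nn.init_weights()
    nn.train(mnist.train, epochs=30, batch_size=10, test_set=mnist.validation)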
    
    • Training: the entry function lives in main.py:

    import load as input_data
    import network

    def train():
        mnist = input_data.readDataSets('data', one_hot=True)
        train_data = mnist.train
        validation_data = mnist.validation
        nn = network.NN(sizes=[784, 200, 10], epochs=50000,
                        mini_batch_size=10, learning_rate=0.3)
        nn.fit(train_data, validation_data=validation_data)
        nn.save()

    train()
    

    The training output looks like this:

    $ python3 main.py
      Extracting data/train-images-idx3-ubyte.gz
      Extracting data/train-labels-idx1-ubyte.gz
      Extracting data/t10k-images-idx3-ubyte.gz
      Extracting data/t10k-labels-idx1-ubyte.gz
      Epoch 1, accuracy 16.900000000000002 %.
      Epoch 2, accuracy 16.900000000000002 %.
      Epoch 3, accuracy 17.02 %.
      Epoch 4, accuracy 16.939999999999998 %.
      Epoch 5, accuracy 17.02 %.
      Epoch 6, accuracy 17.0 %.
      Epoch 7, accuracy 16.98 %.
      Epoch 8, accuracy 17.0 %.
      Epoch 9, accuracy 17.06 %.
      Epoch 10, accuracy 17.080000000000002 %.
      Epoch 11, accuracy 17.06 %.
      Epoch 12, accuracy 17.1 %.
    
