Machine Learning Hands-On Project 3: Handwritten Digit Recognition on the MNIST Dataset with Keras

Author: strive鱼 | Published: 2019-03-09 21:03


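This is the third project of the machine learning training camp.

Q1: Common activation functions in neural networks and their derivatives
Q2: Handwritten digit recognition on the MNIST dataset with Keras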

answer1:

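1. For details, see http://www.cnblogs.com/lliuye/p/9486500.html

(1) What is an activation function

[Figure: activation function]

(2) Which nonlinear activation functions are commonly used

The sigmoid function, the tanh function, and the ReLU function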

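The sigmoid function

The sigmoid function is not often used as the nonlinear activation function, mainly because of the following drawback:

1. It causes vanishing gradients. In logistic regression, the weights are updated repeatedly using derivatives so that the loss function is minimized. As the plot of the sigmoid shows, when z is very large or very small the derivative is close to zero, so the weight gradients are tiny and training iterates slowly. This is called the vanishing gradient problem.

A short derivation of the vanishing gradient is given in the linked post; I think its author summarizes it very clearly.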
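The saturation described above can be checked numerically. The following is a minimal sketch, not the derivation from the linked post: it evaluates the sigmoid derivative at a few values of z and computes an upper bound on the factor that a chain of sigmoid layers contributes to a gradient. The function names and the choice of 10 layers are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0 and shrinks quickly as |z| grows.
for z in (0.0, 5.0, 10.0):
    print(f"z = {z:5.1f}  sigmoid'(z) = {sigmoid_grad(z):.6f}")

# Each sigmoid layer multiplies the backpropagated gradient by at most 0.25,
# so n stacked sigmoid layers contribute a factor of at most 0.25 ** n.
n_layers = 10  # hypothetical depth, chosen only for illustration
max_chain_factor = 0.25 ** n_layers
print("upper bound after", n_layers, "layers:", max_chain_factor)
```

Even in the best case (every unit sitting exactly at z = 0), the bound shrinks geometrically with depth, which is why early layers of a deep sigmoid network tend to learn slowly.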

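The tanh function

The ReLU function

The Leaky ReLU function

The Leaky ReLU function addresses the vanishing gradient that ReLU produces when its input is negative.

(3) Why a neural network must use nonlinear activation functions
This is again explained with a simple formula.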
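As a compact reference for Q1, the four activation functions named above and their derivatives can be written in NumPy. This is a sketch under stated assumptions: the function names are my own, the Leaky ReLU slope `alpha=0.01` is a common default rather than a value from the source, and the derivative of ReLU at exactly 0 is set to 0 by convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh(z):
    return np.tanh(np.asarray(z, dtype=float))

def d_tanh(z):
    return 1.0 - np.tanh(np.asarray(z, dtype=float)) ** 2

def relu(z):
    return np.maximum(0.0, np.asarray(z, dtype=float))

def d_relu(z):
    # 1 where z > 0, otherwise 0 (the value at z == 0 is a convention)
    return (np.asarray(z, dtype=float) > 0).astype(float)

def leaky_relu(z, alpha=0.01):
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, z, alpha * z)

def d_leaky_relu(z, alpha=0.01):
    # The slope is alpha (not 0) for non-positive inputs, so the gradient never vanishes entirely
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, 0.0, 2.0])
print("relu'      :", d_relu(z))
print("leaky_relu':", d_leaky_relu(z))
```

Comparing the last two lines shows the point made about Leaky ReLU: for negative inputs the ReLU gradient is exactly zero, while the Leaky ReLU gradient stays at the small constant `alpha`.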
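The argument for nonlinearity can be illustrated in a few lines: without an activation function, two stacked linear layers W2(W1 x + b1) + b2 are exactly one linear layer with weights W2 W1 and bias W2 b1 + b2. The sketch below checks this on random matrices; the shapes and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(3,))
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=(2,))

# Two linear layers applied one after the other, with no activation in between
two_linear_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping written as a single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
one_linear_layer = W @ x + b

collapses = np.allclose(two_linear_layers, one_linear_layer)
print("two linear layers equal one linear layer:", collapses)

# With a nonlinearity (ReLU here) between the layers, the composition is
# in general no longer expressible as a single W @ x + b.
with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print("output with ReLU in between:", with_relu)
```

However many linear layers are stacked, the result is still a single linear map, so depth adds expressive power only when a nonlinear activation sits between the layers.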

answer2:
The code, with comments, is as follows:

import matplotlib.pyplot as plt
import numpy as np
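plt.rcParams['figure.figsize'] = (7, 7)  # change the default figure size
"""
plt.rcParams['savefig.dpi'] = 300  sets the resolution of saved figures
plt.rcParams['figure.dpi'] = 300   sets the resolution of displayed figures
Setting figsize changes the figure size without changing the resolution
"""
import tensorflow as tf
from keras.datasets import mnist  # training images and labels, test images and labels
from keras.models import Sequential  # sequential model
from keras.layers import Dense, Dropout, Activation  # dense layer, dropout layer, activation layer
from keras.utils import np_utils  # provides to_categorical for one-hot encoding (recent Keras versions expose it as keras.utils.to_categorical)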


## load training data
nb_classes = 10  # 10 classes, the digits 0-9
(x_train, y_train), (x_test, y_test) = mnist.load_data()  # load the data
#print ('x_train original shape', x_train.shape)  # (60000, 28, 28)
#print ('y_train original shape', y_train.shape)  # (60000,)


##let's look at some examples of the training data

for i in range(9):
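    plt.subplot(3, 3, i + 1)  # 3 rows by 3 columns; the last argument is the index of the current subplot
    plt.imshow(x_train[i], cmap='gray', interpolation='none')
    plt.title('Class {}'.format(y_train[i]))
# Loading the data sometimes fails with a "host offline" message; the workaround is to download the dataset and place it in the same directory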

"""
Format the data for training
Our neural-network is going to
take a single vector for each training
example, so we need to reshape the input so that 
each 28x28 image becomes a single 784 dimensional vector.
We'll also scale the inputs to be in the range [0-1] rather than [0-255]
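The code below does two things.
Think of the 3-D array as 60000 images of 28x28 pixels stacked together.
First, flatten each 2-D image into a vector of 784 values.
Second, each value originally lies in [0, 255]; normalize it to [0, 1].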
"""
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')  # convert to floating point
x_test = x_test.astype('float32')
x_train /= 255  # augmented assignment, equivalent to x_train = x_train / 255
x_test /= 255
"""
Modify the target matrices to be in the one-hot format, i.e.

0 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
etc., up to 9
In machine learning, one_hot() is also commonly used to produce this kind of sparse matrix
"""
y_train = np_utils.to_categorical(y_train, nb_classes)  # nb_classes fixes the width of the encoding, i.e. the number of columns
y_test = np_utils.to_categorical(y_test, nb_classes)
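To make the one-hot step concrete, here is a small NumPy sketch of the same transformation. It is not how Keras implements `to_categorical`; the helper name `one_hot` and the sample labels are assumptions made for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer labels to rows of the identity matrix (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([0, 1, 2, 9], 10)
print(encoded.shape)
print(encoded[3])  # the row for label 9 has its single 1 in the last column

# argmax along each row recovers the original integer labels
decoded = np.argmax(encoded, axis=1)
print(decoded)
```

Each row has exactly one nonzero entry, and `np.argmax` along the rows inverts the encoding, which is useful later when predictions have to be compared with one-hot labels.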

"""
Build the neural network
Here we'll do a simple 3 layer fully connected network
"""

model = Sequential()  # instantiate a Sequential model, which is a linear stack of layers

"""
For this model, see the Keras Chinese documentation
https://keras-cn.readthedocs.io/en/latest/getting_started/sequential_model/

For what dropout does, see
https://blog.csdn.net/stdcoutzyx/article/details/49022443

For the softmax activation function, see
https://www.cnblogs.com/alexanderkun/p/8098781.html
"""

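# First hidden layer with 512 units. input_shape takes a tuple (input_dim=784 is equivalent);
# the key point is that each input sample has 784 features.
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))  # specify the activation function
model.add(Dropout(0.2))  # Dropout improves generalization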

"""
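Second hidden layer
"""
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))

"""
Third layer (the output layer)
"""
model.add(Dense(10))
# softmax makes the 10 outputs lie between 0 and 1 and sum to 1; they are read as the
# probabilities of the digits 0-9, and the largest one gives the prediction
model.add(Activation('softmax'))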
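What the softmax output layer computes can be sketched in NumPy. This is an illustration of the formula, not Keras's implementation; the function name and the three example scores are assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the maximum does not change the result but avoids overflow in exp
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([1.0, 2.0, 5.0])  # made-up raw scores for three classes
probs = softmax(logits)
predicted = int(np.argmax(probs))

print("probabilities:", probs)
print("sum:", probs.sum())
print("predicted class:", predicted)
```

The outputs are all strictly between 0 and 1 and sum to 1, so the index of the largest value can be read as the most probable class.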


"""
Compile the model
"""
# loss is the loss function and optimizer is the optimizer;
# metrics=['accuracy'] makes evaluate() return the accuracy alongside the loss
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])



"""
Train the model. For the parameters of fit, see
https://blog.csdn.net/a1111h/article/details/82148497
"""
model.fit(x_train, y_train,
          batch_size=128, epochs=4,
          verbose=1,
          validation_data=(x_test, y_test))

"""
finally evaluate its performance
"""
score = model.evaluate(x_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
# Take the highest-probability class for each input example
# according to the trained classifier.
predicted_classes = np.argmax(model.predict(x_test), axis=1)

# Check which items we got right / wrong
# y_test is one-hot encoded at this point, so recover the integer labels first
true_classes = np.argmax(y_test, axis=1)
correct_indices = np.nonzero(predicted_classes == true_classes)[0]
incorrect_indices = np.nonzero(predicted_classes != true_classes)[0]  # np.nonzero returns a tuple, hence the [0]
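The index bookkeeping in the last step can be tried on a toy example without training anything. The labels below are made up for illustration and are not results from the model.

```python
import numpy as np

# Hypothetical true labels and predictions for five test images
true_labels = np.array([7, 2, 1, 0, 4])
predicted = np.array([7, 2, 1, 0, 9])

# np.nonzero returns a tuple with one index array per dimension; [0] takes the only one
correct = np.nonzero(predicted == true_labels)[0]
incorrect = np.nonzero(predicted != true_labels)[0]

accuracy = len(correct) / len(true_labels)
print("correct indices:  ", correct)
print("incorrect indices:", incorrect)
print("accuracy:", accuracy)
```

The two index arrays can then be used to look up the corresponding rows of the test set, for example to plot a few correctly and incorrectly classified digits.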