Preface
In this article, we will build a ResNet model with the Keras framework and apply it to hand-sign recognition. Building a ResNet boils down to two steps:
- build the basic ResNet blocks;
- connect these blocks together to form a deep neural network.
First, import the Python packages we will need:
import numpy as np
import tensorflow as tf
from keras import layers
from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from keras.models import Model, load_model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
import pydot
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from resnets_utils import *
from keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow
%matplotlib inline
import keras.backend as K
K.set_image_data_format('channels_last')
K.set_learning_phase(1)
The drawback of very deep networks
Deep neural networks can represent very complex functions: for multi-class classification, a shallow network achieves limited accuracy, while a deep network can learn many complex features. A major drawback of very deep networks, however, is the vanishing-gradient problem: during gradient descent, each step of backpropagation multiplies the gradient by the weight matrices, so the gradient can shrink toward 0 very quickly and training slows down. The relationship between network depth and gradient norm is illustrated in the figure below:
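The decay described above can be seen in a minimal numeric sketch (illustrative, not from the original exercise): each backward step multiplies the gradient by a layer's weight matrix, and with weights of magnitude slightly below 1 the gradient norm shrinks geometrically with depth.

```python
import numpy as np

grad = np.ones(4)            # gradient arriving at the deepest layer
W = 0.9 * np.eye(4)          # a "well-behaved" layer whose weights are just below 1
norms = [np.linalg.norm(grad)]
for _ in range(50):          # backpropagate through 50 layers
    grad = W @ grad
    norms.append(np.linalg.norm(grad))
print(norms[0], norms[-1])   # the norm collapses from 2.0 to roughly 0.01
```

After only 50 layers the gradient norm has fallen by a factor of about 200, which is why the early layers of a very deep plain network barely train.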
Building a ResNet
Compared with a plain network, a ResNet adds a "skip connection" path. This shortcut lets gradients flow directly from deeper layers back to shallower layers during backpropagation, as shown below:
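What the skip connection buys can be shown with a toy NumPy sketch (a hypothetical dense example, not the convolutional block built below): the main path F(x) transforms the input, and the shortcut adds the raw input back before the final activation, so even if F(x) is driven toward zero the block still passes x through unchanged.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
W1, W2 = rng.standard_normal((6, 6)), rng.standard_normal((6, 6))
main = W2 @ relu(W1 @ x) * 0.0   # main path degenerated all the way to zero
out = relu(main + x)             # the shortcut still carries x: out == relu(x)
```

This is why residual blocks can easily learn the identity function: stacking more of them rarely hurts training performance.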
The identity block
The identity block is the standard block of a ResNet; it corresponds to the case where the input activation has the same dimensions as the output activation, as shown below:
In this exercise the skip connection jumps over 3 hidden layers rather than 2, as shown below:
The components of the block are specified as follows:
- Convolutions on the main path:
  - First CONV2D: F1 filters of size 1×1, stride (1,1), padding "valid", with the initializer's random seed set to 0.
  - Second CONV2D: F2 filters of size f×f, stride (1,1), padding "same", with the random seed set to 0.
  - Third CONV2D: F3 filters of size 1×1, stride (1,1), padding "valid", with the random seed set to 0.
Following this specification, the identity block can be implemented as follows:
def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block.

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, kernel size of the middle conv layer on the main path
    filters -- python list of integers, the number of filters in each conv layer of the main path
    stage -- integer, used to number the layers according to their position in the network
    block -- string/character, used to name the layers according to their position in the network

    Returns:
    X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
    """
    # Define name bases for the layers
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    # Retrieve the number of filters of each conv layer
    F1, F2, F3 = filters
    # Save the input value; it will be added back at the end of the main path
    X_shortcut = X
    # First component of the main path
    X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)
    # Second component of the main path
    X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)
    # Third component of the main path
    X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2c')(X)
    # Add the shortcut value to the main path, then pass the sum through the activation
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)
    return X
The convolutional block
The identity block handles the case where input and output dimensions match, whereas the convolutional block handles the case where they do not. Its structure is shown below:
As the figure shows, the shortcut path contains a conv layer precisely so that the input can be resized to match the main-path output before the two are added.
Compared with the identity block, its structure differs as follows:
- Convolution on the main path: the first CONV2D uses F1 filters of size 1×1 with stride (s,s), padding "valid", and random seed 0.
- Convolution on the shortcut path: F3 filters of size 1×1 with stride (s,s), padding "valid", and random seed 0.
Putting this together, the convolutional block can be implemented as follows:
def convolutional_block(X, f, filters, stage, block, s = 2):
    """
    Implementation of the convolutional block.

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, kernel size of the middle conv layer on the main path
    filters -- python list of integers, the number of filters in each conv layer of the main path
    stage -- integer, used to number the layers according to their position in the network
    block -- string/character, used to name the layers according to their position in the network
    s -- integer, stride of the first main-path conv and of the shortcut conv

    Returns:
    X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
    """
    # Define name bases for the layers
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    # Retrieve the number of filters of each conv layer
    F1, F2, F3 = filters
    # Save the input value
    X_shortcut = X
    ##### Main path #####
    # First component of the main path
    X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)
    # Second component of the main path
    X = Conv2D(F2, (f, f), strides = (1, 1), name = conv_name_base + '2b', padding='same', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)
    # Third component of the main path
    X = Conv2D(F3, (1, 1), strides = (1, 1), name = conv_name_base + '2c', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)
    ##### Shortcut path #####
    X_shortcut = Conv2D(F3, (1, 1), strides = (s, s), name = conv_name_base + '1', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis = 3, name = bn_name_base + '1')(X_shortcut)
    # Add the shortcut output to the main-path output, then apply the activation
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)
    return X
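To see why a single 1×1 convolution on the shortcut suffices to match dimensions, the standard convolution output-size arithmetic can be traced by hand (a pure-Python sketch; the 56×56 input size is just an illustrative assumption):

```python
def conv_out(n, f, stride, padding):
    # 'same' padding keeps ceil(n / stride); 'valid' gives floor((n - f) / stride) + 1
    if padding == 'same':
        return -(-n // stride)
    return (n - f) // stride + 1

n = 56                                   # hypothetical input height/width
n = conv_out(n, 1, 2, 'valid')           # main path: 1x1 conv, stride s = 2 -> 28
n = conv_out(n, 3, 1, 'same')            # main path: fxf conv, stride 1     -> 28
n = conv_out(n, 1, 1, 'valid')           # main path: 1x1 conv, stride 1     -> 28
shortcut = conv_out(56, 1, 2, 'valid')   # shortcut: 1x1 conv, stride s = 2  -> 28
print(n, shortcut)                       # both paths end at 28, so they can be added
```

Only the first main-path conv and the shortcut conv use stride s, so both paths shrink the spatial size by the same factor, and the channel counts match because both end with F3 filters.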
A hand-sign recognition application built with ResNet
In an earlier article we implemented hand-sign recognition with a plain convolutional network, but its shallow depth limited the classification accuracy. In this exercise we use Keras to build a deep convolutional network for the same task. The overall architecture is shown below:
The model above can be implemented in Keras as follows:
def ResNet50(input_shape = (64, 64, 3), classes = 6):
    """
    Implementation of a ResNet50 with the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images in the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """
    # Define an input tensor
    X_input = Input(input_shape)
    # Zero-padding
    X = ZeroPadding2D((3, 3))(X_input)
    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)
    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')
    # Stage 3
    # The convolutional block uses filters [128, 128, 512], f = 3, s = 2, block 'a';
    # the 3 identity blocks use filters [128, 128, 512], f = 3, blocks 'b', 'c' and 'd'.
    X = convolutional_block(X, f = 3, filters=[128,128,512], stage = 3, block='a', s = 2)
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='b')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='c')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='d')
    # Stage 4
    # The convolutional block uses filters [256, 256, 1024], f = 3, s = 2, block 'a';
    # the 5 identity blocks use filters [256, 256, 1024], f = 3, blocks 'b' through 'f'.
    X = convolutional_block(X, f = 3, filters=[256, 256, 1024], block='a', stage=4, s = 2)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='b', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='c', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='d', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='e', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='f', stage=4)
    # Stage 5
    # The convolutional block uses filters [512, 512, 2048], f = 3, s = 2, block 'a';
    # the 2 identity blocks use filters [256, 256, 2048], f = 3, blocks 'b' and 'c'.
    X = convolutional_block(X, f = 3, filters=[512, 512, 2048], stage=5, block='a', s = 2)
    X = identity_block(X, f = 3, filters=[256, 256, 2048], stage=5, block='b')
    X = identity_block(X, f = 3, filters=[256, 256, 2048], stage=5, block='c')
    # Average pooling with a 2x2 window
    X = AveragePooling2D(pool_size=(2,2))(X)
    # Output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = glorot_uniform(seed=0))(X)
    # Create the model
    model = Model(inputs = X_input, outputs = X, name='ResNet50')
    return model
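As a sanity check on the architecture, the spatial size can be traced stage by stage with standard convolution arithmetic (a sketch assuming the 64×64 input above); it shows why the final 2×2 average pool reduces the feature map to 1×1 before Flatten:

```python
def out_valid(n, f, s):
    # output size of a 'valid' convolution or pooling: floor((n - f) / s) + 1
    return (n - f) // s + 1

n = 64 + 2 * 3              # ZeroPadding2D((3, 3)): 64 -> 70
n = out_valid(n, 7, 2)      # stage 1: 7x7 conv, stride 2      -> 32
n = out_valid(n, 3, 2)      # stage 1: 3x3 max pool, stride 2  -> 15
for s in (1, 2, 2, 2):      # stages 2-5: only the conv block's first
    n = out_valid(n, 1, s)  # 1x1 conv (stride s) changes the spatial size
n = out_valid(n, 2, 2)      # 2x2 average pooling              -> 1
print(n)                    # 1: a single 2048-channel vector feeds the Dense layer
```

The identity blocks never change the spatial size (1×1 "valid" convs with stride 1, and "same" padding in the middle), so only the strided convolutional blocks appear in the trace.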
Following the usual Keras workflow, we build the model, compile it, and then fit it; after a number of epochs the training-set accuracy is as shown below:
# Build a model instance
model = ResNet50(input_shape = (64, 64, 3), classes = 6)
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Load the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
# Normalize the image data
X_train = X_train_orig/255.
X_test = X_test_orig/255.
# Convert the labels to one-hot encoding
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
# Fit the model
model.fit(X_train, Y_train, epochs = 20, batch_size = 32)
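The one-hot conversion used above comes from resnets_utils; a minimal NumPy sketch of its effect (mirroring convert_to_one_hot(Y, 6).T, which yields one row per example, but not the utility's actual source) looks like this:

```python
import numpy as np

def to_one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so fancy indexing turns a label vector into a one-hot matrix.
    return np.eye(num_classes)[labels.reshape(-1)]

y = np.array([0, 2, 1])
print(to_one_hot(y, 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```

This row-per-example layout is what categorical_crossentropy expects as the target of the softmax output layer.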
Evaluating the model on the test set produces the following output:
Finally, the structure of the whole network can be printed out, as shown below: