https://zhuanlan.zhihu.com/p/25053311
YOLO is an end-to-end, deep-learning-based real-time object detection system. Unlike most detection and recognition methods (such as Fast R-CNN), which split the task into several stages such as region prediction and class prediction, YOLO integrates region prediction and class prediction into a single neural network, achieving fast detection and recognition with fairly high accuracy, which makes it well suited to real-world deployment. For details see: YOLO: Real-Time Fast Object Detection, and YOLO Upgrades: YOLOv2 and YOLO9000 Explained. This article walks through the TensorFlow implementation of YOLO. The source code used here comes from hizhangp/yolo_tensorflow.
This article is organized as follows: 1. Overview of the YOLO code; 2. train.py walkthrough; 3. test.py overview; 4. Summary.
1 Overview of the YOLO code
The source files are organized as shown in Figure 1-1. train.py is the training code and test.py is the testing code; the code in the other folders provides supporting functionality such as setting parameters, building the network, and reading data.

Figure 1-1 YOLO source code folder
2 train.py walkthrough
Starting from the main() method: first the configuration parameters are read; then YOLONet is built; next the training data is loaded; finally training is run.
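Roughly, the flow can be sketched as follows (an approximate outline only, not the verbatim main() from the repo; the pascal_voc class name and the train() helper are shown for illustration and may differ from the actual code):

from yolo.yolo_net import YOLONet        # network definition (yolo/yolo_net.py)
from utils.pascal_voc import pascal_voc  # data loader (utils/pascal_voc.py), assumed class name

def main():
    net = YOLONet('train')      # read the config parameters and build the network
    data = pascal_voc('train')  # load the PASCAL VOC training data
    train(net, data)            # hypothetical training entry point defined in train.py

if __name__ == '__main__':
    main()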
2.1 Building YOLONet
YOLONet is built by the code in yolo_net.py inside the yolo folder. yolo_net.py defines the YOLONet class, which contains methods for network initialization (__init__()), building the network (build_networks()), and the loss function (loss_layer()).
All of the network's initialization parameters are set in the __init__() method.
def __init__(self, phase):
    self.weights_file = cfg.WEIGHTS_FILE       # weights file
    self.classes = cfg.CLASSES                 # class names
    self.num_class = len(self.classes)         # number of classes, 20
    self.image_size = cfg.IMAGE_SIZE           # image size, 448
    self.cell_size = cfg.CELL_SIZE             # grid size, 7
    self.boxes_per_cell = cfg.BOXES_PER_CELL   # boxes predicted per grid cell, default 2
    self.output_size = (self.cell_size * self.cell_size) * \
        (self.num_class + self.boxes_per_cell * 5)   # output size
    self.scale = 1.0 * self.image_size / self.cell_size
    self.boundary1 = self.cell_size * self.cell_size * self.num_class   # 7*7*20
    self.boundary2 = self.boundary1 + self.cell_size * \
        self.cell_size * self.boxes_per_cell                            # 7*7*20 + 7*7*2
    self.object_scale = cfg.OBJECT_SCALE       # 1.0
    self.noobject_scale = cfg.NOOBJECT_SCALE   # 1.0
    self.class_scale = cfg.CLASS_SCALE         # 2.0
    self.coord_scale = cfg.COORD_SCALE         # 5.0
    self.learning_rate = cfg.LEARNING_RATE     # LEARNING_RATE = 0.0001
    self.batch_size = cfg.BATCH_SIZE           # BATCH_SIZE = 45
    self.alpha = cfg.ALPHA                     # ALPHA = 0.1
    self.disp_console = cfg.DISP_CONSOLE       # DISP_CONSOLE = False
    self.phase = phase                         # 'train' or 'test'
    self.collection = []                       # stores the network parameters
    self.offset = np.transpose(np.reshape(np.array(
        [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell),
        (self.boxes_per_cell, self.cell_size, self.cell_size)), (1, 2, 0))   # grid cell offsets
    self.build_networks()
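To see what self.offset actually contains, here is the same construction as a standalone NumPy sketch (independent of the class): the result has shape (7, 7, 2), and offset[y, x, :] equals x, the column index of each grid cell. loss_layer() later adds it to the predicted x offsets (and a transposed copy to the y offsets) to recover absolute grid coordinates.

import numpy as np

cell_size, boxes_per_cell = 7, 2
offset = np.transpose(
    np.reshape(
        np.array([np.arange(cell_size)] * cell_size * boxes_per_cell),
        (boxes_per_cell, cell_size, cell_size)),
    (1, 2, 0))
print(offset.shape)     # (7, 7, 2)
print(offset[0, :, 0])  # [0 1 2 3 4 5 6]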
The network itself is built by the build_networks() method and consists of convolutional layers, pooling layers, and fully connected layers; for the detailed architecture see the source code and YOLO: Real-Time Fast Object Detection. The network takes input of shape [None, 448, 448, 3] and produces output of shape [None, 1470].
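Where the 1470 comes from: each of the 7 × 7 grid cells predicts 20 class probabilities plus 2 boxes × (4 coordinates + 1 confidence), which is exactly the output_size formula in __init__():

output_size = 7 * 7 * (20 + 2 * 5)  # = 1470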
The loss function is the key part of the code. It follows the sum-squared error defined in the YOLO paper (see: YOLO: Real-Time Fast Object Detection).
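In the paper's notation (with S = 7 grid cells per side and B = 2 boxes per cell), the loss is:

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$

These terms correspond to coord_loss (the first two lines), object_loss, noobject_loss, and class_loss in loss_layer() below, with the λ weights taken from COORD_SCALE, OBJECT_SCALE, NOOBJECT_SCALE, and CLASS_SCALE in the config.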
The loss function is implemented in loss_layer(); the comments in the code below annotate the shape of each variable.
Computing the IoU:
def calc_iou(self, boxes1, boxes2):
    """calculate ious
    Args:
      boxes1: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ====> (x_center, y_center, w, h)
      boxes2: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ===> (x_center, y_center, w, h)
    Return:
      iou: 4-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    """
    # convert (x_center, y_center, w, h) to corner coordinates (x1, y1, x2, y2)
    boxes1 = tf.pack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                      boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                      boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                      boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0])
    boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])
    boxes2 = tf.pack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                      boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                      boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                      boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2.0])
    boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])
    # calculate the left up point & right down point
    lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
    rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])
    # intersection
    intersection = tf.maximum(0.0, rd - lu)
    inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]
    # calculate the boxs1 square and boxs2 square
    square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
        (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
    square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
        (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])
    union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)
    return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)

# loss function
# idx=33, predicts is fc_32, labels shape is (45, 7, 7, 25)
# self.loss = self.loss_layer(33, self.fc_32, self.labels)
def loss_layer(self, idx, predicts, labels):
    # split the network output into class probabilities, box confidences and box
    # coordinates; the output has 7*7*20 + 7*7*2 + 7*7*2*4 = 1470 elements
    # class probabilities, shape (45, 7, 7, 20)
    predict_classes = tf.reshape(predicts[:, :self.boundary1],
        [self.batch_size, self.cell_size, self.cell_size, self.num_class])
    # box confidence scores, shape (45, 7, 7, 2)
    predict_scales = tf.reshape(predicts[:, self.boundary1:self.boundary2],
        [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell])
    # box coordinates (x, y, w, h), shape (45, 7, 7, 2, 4)
    predict_boxes = tf.reshape(predicts[:, self.boundary2:],
        [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell, 4])
    # label response (whether the cell contains an object), shape (45, 7, 7, 1)
    response = tf.reshape(labels[:, :, :, 0],
        [self.batch_size, self.cell_size, self.cell_size, 1])
    # label box coordinates, shape (45, 7, 7, 1, 4)
    boxes = tf.reshape(labels[:, :, :, 1:5],
        [self.batch_size, self.cell_size, self.cell_size, 1, 4])
    # label boxes tiled per predicted box and normalized by the image size, shape (45, 7, 7, 2, 4)
    boxes = tf.tile(boxes, [1, 1, 1, self.boxes_per_cell, 1]) / self.image_size
    # label classes, shape (45, 7, 7, 20)
    classes = labels[:, :, :, 5:]
    # offset, shape (7, 7, 2)
    offset = tf.constant(self.offset, dtype=tf.float32)
    # shape (1, 7, 7, 2)
    offset = tf.reshape(offset,
        [1, self.cell_size, self.cell_size, self.boxes_per_cell])
    # shape (45, 7, 7, 2)
    offset = tf.tile(offset, [self.batch_size, 1, 1, 1])
    # predicted boxes converted to normalized image coordinates, shape (4, 45, 7, 7, 2)
    predict_boxes_tran = tf.pack([(predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
        (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
        tf.square(predict_boxes[:, :, :, :, 2]),
        tf.square(predict_boxes[:, :, :, :, 3])])
    # shape (45, 7, 7, 2, 4)
    predict_boxes_tran = tf.transpose(predict_boxes_tran, [1, 2, 3, 4, 0])
    # shape (45, 7, 7, 2)
    iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)
    # calculate I tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    # shape (45, 7, 7, 1)
    object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
    # shape (45, 7, 7, 2)
    object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
    # mask = tf.tile(response, [1, 1, 1, self.boxes_per_cell])
    # calculate no_I tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    # shape (45, 7, 7, 2)
    noobject_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
    # ground-truth boxes converted to per-cell offsets and sqrt of w, h; shape (4, 45, 7, 7, 2)
    boxes_tran = tf.pack([boxes[:, :, :, :, 0] * self.cell_size - offset,
        boxes[:, :, :, :, 1] * self.cell_size - tf.transpose(offset, (0, 2, 1, 3)),
        tf.sqrt(boxes[:, :, :, :, 2]),
        tf.sqrt(boxes[:, :, :, :, 3])])
    # shape (45, 7, 7, 2, 4)
    boxes_tran = tf.transpose(boxes_tran, [1, 2, 3, 4, 0])
    # class_loss
    class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(response * (predict_classes - classes)),
        reduction_indices=[1, 2, 3]), name='class_loss') * self.class_scale
    # object_loss
    object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_mask * (predict_scales - iou_predict_truth)),
        reduction_indices=[1, 2, 3]), name='object_loss') * self.object_scale
    # noobject_loss
    noobject_loss = tf.reduce_mean(tf.reduce_sum(tf.square(noobject_mask * predict_scales),
        reduction_indices=[1, 2, 3]), name='noobject_loss') * self.noobject_scale
    # coord_loss
    # shape (45, 7, 7, 2, 1)
    coord_mask = tf.expand_dims(object_mask, 4)
    # shape (45, 7, 7, 2, 4)
    boxes_delta = coord_mask * (predict_boxes - boxes_tran)
    coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta),
        reduction_indices=[1, 2, 3, 4]), name='coord_loss') * self.coord_scale
    tf.summary.scalar(self.phase + '/class_loss', class_loss)
    tf.summary.scalar(self.phase + '/object_loss', object_loss)
    tf.summary.scalar(self.phase + '/noobject_loss', noobject_loss)
    tf.summary.scalar(self.phase + '/coord_loss', coord_loss)
    tf.summary.histogram(self.phase + '/boxes_delta_x', boxes_delta[:, :, :, :, 0])
    tf.summary.histogram(self.phase + '/boxes_delta_y', boxes_delta[:, :, :, :, 1])
    tf.summary.histogram(self.phase + '/boxes_delta_w', boxes_delta[:, :, :, :, 2])
    tf.summary.histogram(self.phase + '/boxes_delta_h', boxes_delta[:, :, :, :, 3])
    tf.summary.histogram(self.phase + '/iou', iou_predict_truth)
    return class_loss + object_loss + noobject_loss + coord_loss
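As a quick sanity check of the corner conversion and IoU formula used in calc_iou(), here is the same computation on two concrete (x_center, y_center, w, h) boxes in plain NumPy (an illustration only, not part of the repo):

import numpy as np

def iou(box1, box2):
    # convert (x_center, y_center, w, h) to corner coordinates (x1, y1, x2, y2)
    b1 = [box1[0] - box1[2] / 2, box1[1] - box1[3] / 2,
          box1[0] + box1[2] / 2, box1[1] + box1[3] / 2]
    b2 = [box2[0] - box2[2] / 2, box2[1] - box2[3] / 2,
          box2[0] + box2[2] / 2, box2[1] + box2[3] / 2]
    lu = np.maximum(b1[:2], b2[:2])   # upper-left corner of the intersection
    rd = np.minimum(b1[2:], b2[2:])   # lower-right corner of the intersection
    inter = np.prod(np.maximum(0.0, rd - lu))
    union = box1[2] * box1[3] + box2[2] * box2[3] - inter
    return inter / max(union, 1e-10)

print(iou([0.5, 0.5, 0.4, 0.4], [0.6, 0.6, 0.4, 0.4]))  # ~0.39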
2.2 Reading the data
The data is read by pascal_voc.py in the utils folder.
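Based on how loss_layer() slices the labels tensor above, each image is encoded as a (7, 7, 25) grid: channel 0 is the response flag, channels 1:5 hold the ground-truth box (x_center, y_center, w, h) in pixels, and channels 5:25 hold the one-hot class vector. A minimal sketch of that encoding (the exact indexing convention in pascal_voc.py may differ slightly):

import numpy as np

label = np.zeros((7, 7, 25), dtype=np.float32)

# Example: an object of class class_ind centered at pixel (224, 224) with a
# 100x150 box falls into grid cell (3, 3), since 224 * 7 / 448 = 3.5 -> 3.
class_ind = 11                                    # illustrative class index
x_ind = int(224 * 7 / 448)
y_ind = int(224 * 7 / 448)
label[y_ind, x_ind, 0] = 1.0                      # response: this cell owns an object
label[y_ind, x_ind, 1:5] = [224, 224, 100, 150]   # box in pixel coordinates
label[y_ind, x_ind, 5 + class_ind] = 1.0          # one-hot class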
2.3 Training
Model training is contained in the train() method. Once the initialization parameters are understood, the structure of the training code is straightforward. One point worth noting is that an exponential moving average (EMA) of the variables is maintained during training to improve performance; see the comments in train.py. Also, when running train.py it is advisable to reduce batch_size (the default batch size is 45; I did not notice this the first time I ran it and my machine froze).
3 test.py overview
test.py loads the trained network weights, runs detection, and draws the detected objects on the image. The code is similar to the training part, so it is not covered in detail. To run the test you first need to download YOLO_small, the model trained by the original author (accessing it may require a proxy). Also, there is a small bug in the source code that raises an error when run directly:
net_output = self.sess.run(self.net.fc_32, feed_dict={self.net.images: inputs})
needs to be changed to
net_output = self.sess.run(self.net.fc_32, feed_dict={self.net.x: inputs})
The result is shown in Figure 3-1: YOLO successfully recognizes the person and the dog, but fails to recognize the horse. The author improved YOLO in later work so that it can recognize many more categories; see YOLO Upgrades: YOLOv2 and YOLO9000 Explained.

Figure 3-1 YOLO detection result 1
Or on another image, as shown in Figure 3-2:

Figure 3-2 YOLO detection result 2
4 Summary
YOLO is an end-to-end, deep-learning-based real-time object detection system; its main strength is its speed, and it still has room to improve in accuracy. This article walked through the TensorFlow implementation of YOLO; once the paper is understood, the code is quite simple. The TensorFlow topics it touches on are the following:
1. The difference between tf.get_variable and tf.Variable. The difference is that the former has a variable-checking mechanism: it checks whether an already existing variable with the same name has been marked as shared, and if it has not, TensorFlow raises an error when it reaches a second variable with that name.
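A short illustration of the difference (TensorFlow 1.x-style API, matching the era of the code in this article):

import tensorflow as tf

with tf.variable_scope('demo'):
    a = tf.get_variable('w', shape=[1])
with tf.variable_scope('demo', reuse=True):
    b = tf.get_variable('w', shape=[1])   # OK: reuses the variable created above
# Without reuse=True, a second tf.get_variable('w') in the same scope raises a
# ValueError instead of silently creating a new variable.
c = tf.Variable([0.0], name='w')          # tf.Variable always creates a new variable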
2. Implementing learning-rate decay.
tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
# decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
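A usage sketch (the actual global_step wiring and decay values come from the repo's config and may differ):

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)   # incremented once per training step
learning_rate = tf.train.exponential_decay(
    0.0001,          # initial learning rate (cfg.LEARNING_RATE)
    global_step,
    30000,           # decay_steps (illustrative value)
    0.1,             # decay_rate (illustrative value)
    staircase=True)
# With staircase=True the rate stays at 1e-4 for steps 0-29999, drops to 1e-5
# for steps 30000-59999, and so on.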
3. Using an exponential moving average (EMA) of the trainable variables to improve the results of gradient-descent training.
self.ema = tf.train.ExponentialMovingAverage(decay=0.9999)
self.averages_op = self.ema.apply(tf.trainable_variables())
with tf.control_dependencies([self.optimizer]):
    self.train_op = tf.group(self.averages_op)
4. The tf.pack() function.
tf.pack(values, name='pack')
# this function behaves like np.asarray:
# tf.pack([x, y, z]) is equivalent to np.asarray([x, y, z])
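For example (tf.pack was renamed tf.stack in TensorFlow 1.0; the version used by this code still calls it tf.pack):

import tensorflow as tf

x = tf.constant([1, 4])
y = tf.constant([2, 5])
z = tf.constant([3, 6])
packed = tf.pack([x, y, z])  # shape (3, 2): [[1, 4], [2, 5], [3, 6]]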
5. The tf.tile() function, which replicates a tensor along the specified dimensions.
tf.tile(input, multiples, name=None)
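For example:

import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])      # shape (2, 2)
tiled = tf.tile(t, multiples=[1, 3])   # shape (2, 6): [[1, 2, 1, 2, 1, 2], [3, 4, 3, 4, 3, 4]]
# loss_layer() uses the same idea to copy the (1, 7, 7, 2) offset to (45, 7, 7, 2)
# and the (45, 7, 7, 1, 4) label boxes to (45, 7, 7, 2, 4).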