https://zhuanlan.zhihu.com/p/25053311
YOLO is an end-to-end, deep-learning-based real-time object detection system. Unlike most detection and recognition methods (such as Fast R-CNN), which split the task into several stages such as region prediction and class prediction, YOLO integrates region prediction and class prediction into a single neural network, achieving fast detection and recognition with fairly high accuracy, which makes it well suited to real-world deployment. For details see: YOLO: Real-Time Fast Object Detection, and YOLO Upgrades: YOLOv2 and YOLO9000 Explained. This article walks through the TensorFlow implementation of YOLO. The source code used here comes from hizhangp/yolo_tensorflow.
This article is organized as follows: 1. Overview of the YOLO code; 2. train.py walkthrough; 3. test.py overview; 4. Summary.
1 Overview of the YOLO code
The source files are organized as shown in Figure 1-1. train.py is the training code and test.py is the testing code; the code in the other folders provides supporting functionality such as setting parameters, building the network, and reading data.

Figure 1-1 YOLO source code folder
2 train.py walkthrough
Starting from the main() method: first the configuration parameters are read; then YOLONet is built; next the training data is loaded; finally training is run.
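Roughly, the flow can be sketched as follows (an approximate outline only, not the verbatim main() from the repo; the pascal_voc class name and the train() helper are shown for illustration and may differ from the actual code):

from yolo.yolo_net import YOLONet        # network definition (yolo/yolo_net.py)
from utils.pascal_voc import pascal_voc  # data loader (utils/pascal_voc.py), assumed class name

def main():
    net = YOLONet('train')      # read the config parameters and build the network
    data = pascal_voc('train')  # load the PASCAL VOC training data
    train(net, data)            # hypothetical training entry point defined in train.py

if __name__ == '__main__':
    main()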
2.1 Building YOLONet
YOLONet is built by the code in yolo_net.py inside the yolo folder. yolo_net.py defines the YOLONet class, which contains methods for network initialization (__init__()), building the network (build_networks()), and the loss function (loss_layer()).
All of the network's initialization parameters are set in the __init__() method.
def __init__(self, phase):
    self.weights_file = cfg.WEIGHTS_FILE       # weights file
    self.classes = cfg.CLASSES                 # class names
    self.num_class = len(self.classes)         # number of classes, 20
    self.image_size = cfg.IMAGE_SIZE           # image size, 448
    self.cell_size = cfg.CELL_SIZE             # grid size, 7
    self.boxes_per_cell = cfg.BOXES_PER_CELL   # boxes predicted per grid cell, default 2
    self.output_size = (self.cell_size * self.cell_size) * \
        (self.num_class + self.boxes_per_cell * 5)   # output size
    self.scale = 1.0 * self.image_size / self.cell_size
    self.boundary1 = self.cell_size * self.cell_size * self.num_class   # 7*7*20
    self.boundary2 = self.boundary1 + self.cell_size * \
        self.cell_size * self.boxes_per_cell                            # 7*7*20 + 7*7*2
    self.object_scale = cfg.OBJECT_SCALE       # 1.0
    self.noobject_scale = cfg.NOOBJECT_SCALE   # 1.0
    self.class_scale = cfg.CLASS_SCALE         # 2.0
    self.coord_scale = cfg.COORD_SCALE         # 5.0
    self.learning_rate = cfg.LEARNING_RATE     # LEARNING_RATE = 0.0001
    self.batch_size = cfg.BATCH_SIZE           # BATCH_SIZE = 45
    self.alpha = cfg.ALPHA                     # ALPHA = 0.1
    self.disp_console = cfg.DISP_CONSOLE       # DISP_CONSOLE = False
    self.phase = phase                         # 'train' or 'test'
    self.collection = []                       # stores the network parameters
    self.offset = np.transpose(np.reshape(np.array(
        [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell),
        (self.boxes_per_cell, self.cell_size, self.cell_size)), (1, 2, 0))   # grid cell offsets
    self.build_networks()
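To see what self.offset actually contains, here is the same construction as a standalone NumPy sketch (independent of the class): the result has shape (7, 7, 2), and offset[y, x, :] equals x, the column index of each grid cell. loss_layer() later adds it to the predicted x offsets (and a transposed copy to the y offsets) to recover absolute grid coordinates.

import numpy as np

cell_size, boxes_per_cell = 7, 2
offset = np.transpose(
    np.reshape(
        np.array([np.arange(cell_size)] * cell_size * boxes_per_cell),
        (boxes_per_cell, cell_size, cell_size)),
    (1, 2, 0))
print(offset.shape)     # (7, 7, 2)
print(offset[0, :, 0])  # [0 1 2 3 4 5 6]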
The network itself is built by the build_networks() method and consists of convolutional layers, pooling layers, and fully connected layers; for the detailed architecture see the source code and YOLO: Real-Time Fast Object Detection. The network takes input of shape [None, 448, 448, 3] and produces output of shape [None, 1470].
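Where the 1470 comes from: each of the 7 × 7 grid cells predicts 20 class probabilities plus 2 boxes × (4 coordinates + 1 confidence), which is exactly the output_size formula in __init__():

output_size = 7 * 7 * (20 + 2 * 5)  # = 1470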
The loss function is the key part of the code. It follows the sum-squared error defined in the YOLO paper (see: YOLO: Real-Time Fast Object Detection).
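In the paper's notation (with S = 7 grid cells per side and B = 2 boxes per cell), the loss is:

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$

These terms correspond to coord_loss (the first two lines), object_loss, noobject_loss, and class_loss in loss_layer() below, with the λ weights taken from COORD_SCALE, OBJECT_SCALE, NOOBJECT_SCALE, and CLASS_SCALE in the config.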
The loss function is implemented in loss_layer(); the comments in the code below annotate the shape of each variable.
Computing the IoU:
def calc_iou(self, boxes1, boxes2):
    """calculate ious
    Args:
      boxes1: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ====> (x_center, y_center, w, h)
      boxes2: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ===> (x_center, y_center, w, h)
    Return:
      iou: 4-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    """
    # convert (x_center, y_center, w, h) to corner coordinates (x1, y1, x2, y2)
    boxes1 = tf.pack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                      boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                      boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                      boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0])
    boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])
    boxes2 = tf.pack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                      boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                      boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                      boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2.0])
    boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])
    # calculate the left up point & right down point
    lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
    rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])
    # intersection
    intersection = tf.maximum(0.0, rd - lu)
    inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]
    # calculate the boxs1 square and boxs2 square
    square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
        (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
    square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
        (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])
    union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)
    return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)

# loss function
# idx=33, predicts is fc_32, labels shape is (45, 7, 7, 25)
# self.loss = self.loss_layer(33, self.fc_32, self.labels)
def loss_layer(self, idx, predicts, labels):
    # split the network output into class probabilities, box confidences and box
    # coordinates; the output has 7*7*20 + 7*7*2 + 7*7*2*4 = 1470 elements
    # class probabilities, shape (45, 7, 7, 20)
    predict_classes = tf.reshape(predicts[:, :self.boundary1],
        [self.batch_size, self.cell_size, self.cell_size, self.num_class])
    # box confidence scores, shape (45, 7, 7, 2)
    predict_scales = tf.reshape(predicts[:, self.boundary1:self.boundary2],
        [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell])
    # box coordinates (x, y, w, h), shape (45, 7, 7, 2, 4)
    predict_boxes = tf.reshape(predicts[:, self.boundary2:],
        [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell, 4])
    # label response (whether the cell contains an object), shape (45, 7, 7, 1)
    response = tf.reshape(labels[:, :, :, 0],
        [self.batch_size, self.cell_size, self.cell_size, 1])
    # label box coordinates, shape (45, 7, 7, 1, 4)
    boxes = tf.reshape(labels[:, :, :, 1:5],
        [self.batch_size, self.cell_size, self.cell_size, 1, 4])
    # label boxes tiled per predicted box and normalized by the image size, shape (45, 7, 7, 2, 4)
    boxes = tf.tile(boxes, [1, 1, 1, self.boxes_per_cell, 1]) / self.image_size
    # label classes, shape (45, 7, 7, 20)
    classes = labels[:, :, :, 5:]
    # offset, shape (7, 7, 2)
    offset = tf.constant(self.offset, dtype=tf.float32)
    # shape (1, 7, 7, 2)
    offset = tf.reshape(offset,
        [1, self.cell_size, self.cell_size, self.boxes_per_cell])
    # shape (45, 7, 7, 2)
    offset = tf.tile(offset, [self.batch_size, 1, 1, 1])
    # predicted boxes converted to normalized image coordinates, shape (4, 45, 7, 7, 2)
    predict_boxes_tran = tf.pack([(predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
        (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
        tf.square(predict_boxes[:, :, :, :, 2]),
        tf.square(predict_boxes[:, :, :, :, 3])])
    # shape (45, 7, 7, 2, 4)
    predict_boxes_tran = tf.transpose(predict_boxes_tran, [1, 2, 3, 4, 0])
    # shape (45, 7, 7, 2)
    iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)
    # calculate I tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    # shape (45, 7, 7, 1)
    object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
    # shape (45, 7, 7, 2)
    object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
    # mask = tf.tile(response, [1, 1, 1, self.boxes_per_cell])
    # calculate no_I tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    # shape (45, 7, 7, 2)
    noobject_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
    # ground-truth boxes converted to per-cell offsets and sqrt of w, h; shape (4, 45, 7, 7, 2)
    boxes_tran = tf.pack([boxes[:, :, :, :, 0] * self.cell_size - offset,
        boxes[:, :, :, :, 1] * self.cell_size - tf.transpose(offset, (0, 2, 1, 3)),
        tf.sqrt(boxes[:, :, :, :, 2]),
        tf.sqrt(boxes[:, :, :, :, 3])])
    # shape (45, 7, 7, 2, 4)
    boxes_tran = tf.transpose(boxes_tran, [1, 2, 3, 4, 0])
    # class_loss
    class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(response * (predict_classes - classes)),
        reduction_indices=[1, 2, 3]), name='class_loss') * self.class_scale
    # object_loss
    object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_mask * (predict_scales - iou_predict_truth)),
        reduction_indices=[1, 2, 3]), name='object_loss') * self.object_scale
    # noobject_loss
    noobject_loss = tf.reduce_mean(tf.reduce_sum(tf.square(noobject_mask * predict_scales),
        reduction_indices=[1, 2, 3]), name='noobject_loss') * self.noobject_scale
    # coord_loss
    # shape (45, 7, 7, 2, 1)
    coord_mask = tf.expand_dims(object_mask, 4)
    # shape (45, 7, 7, 2, 4)
    boxes_delta = coord_mask * (predict_boxes - boxes_tran)
    coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta),
        reduction_indices=[1, 2, 3, 4]), name='coord_loss') * self.coord_scale
    tf.summary.scalar(self.phase + '/class_loss', class_loss)
    tf.summary.scalar(self.phase + '/object_loss', object_loss)
    tf.summary.scalar(self.phase + '/noobject_loss', noobject_loss)
    tf.summary.scalar(self.phase + '/coord_loss', coord_loss)
    tf.summary.histogram(self.phase + '/boxes_delta_x', boxes_delta[:, :, :, :, 0])
    tf.summary.histogram(self.phase + '/boxes_delta_y', boxes_delta[:, :, :, :, 1])
    tf.summary.histogram(self.phase + '/boxes_delta_w', boxes_delta[:, :, :, :, 2])
    tf.summary.histogram(self.phase + '/boxes_delta_h', boxes_delta[:, :, :, :, 3])
    tf.summary.histogram(self.phase + '/iou', iou_predict_truth)
    return class_loss + object_loss + noobject_loss + coord_loss
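As a quick sanity check of the corner conversion and IoU formula used in calc_iou(), here is the same computation on two concrete (x_center, y_center, w, h) boxes in plain NumPy (an illustration only, not part of the repo):

import numpy as np

def iou(box1, box2):
    # convert (x_center, y_center, w, h) to corner coordinates (x1, y1, x2, y2)
    b1 = [box1[0] - box1[2] / 2, box1[1] - box1[3] / 2,
          box1[0] + box1[2] / 2, box1[1] + box1[3] / 2]
    b2 = [box2[0] - box2[2] / 2, box2[1] - box2[3] / 2,
          box2[0] + box2[2] / 2, box2[1] + box2[3] / 2]
    lu = np.maximum(b1[:2], b2[:2])   # upper-left corner of the intersection
    rd = np.minimum(b1[2:], b2[2:])   # lower-right corner of the intersection
    inter = np.prod(np.maximum(0.0, rd - lu))
    union = box1[2] * box1[3] + box2[2] * box2[3] - inter
    return inter / max(union, 1e-10)

print(iou([0.5, 0.5, 0.4, 0.4], [0.6, 0.6, 0.4, 0.4]))  # ~0.39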
2.2 Reading the data
The data is read by pascal_voc.py in the utils folder.
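Based on how loss_layer() slices the labels tensor above, each image is encoded as a (7, 7, 25) grid: channel 0 is the response flag, channels 1:5 hold the ground-truth box (x_center, y_center, w, h) in pixels, and channels 5:25 hold the one-hot class vector. A minimal sketch of that encoding (the exact indexing convention in pascal_voc.py may differ slightly):

import numpy as np

label = np.zeros((7, 7, 25), dtype=np.float32)

# Example: an object of class class_ind centered at pixel (224, 224) with a
# 100x150 box falls into grid cell (3, 3), since 224 * 7 / 448 = 3.5 -> 3.
class_ind = 11                                    # illustrative class index
x_ind = int(224 * 7 / 448)
y_ind = int(224 * 7 / 448)
label[y_ind, x_ind, 0] = 1.0                      # response: this cell owns an object
label[y_ind, x_ind, 1:5] = [224, 224, 100, 150]   # box in pixel coordinates
label[y_ind, x_ind, 5 + class_ind] = 1.0          # one-hot class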
2.3 Training
Model training is contained in the train() method. Once the initialization parameters are understood, the structure of the training code is straightforward. One point worth noting is that an exponential moving average (EMA) of the variables is maintained during training to improve performance; see the comments in train.py. Also, when running train.py it is advisable to reduce batch_size (the default batch size is 45; I did not notice this the first time I ran it and my machine froze).
3 test.py overview
test.py loads the trained network weights, runs detection, and draws the detected objects on the image. The code is similar to the training part, so it is not covered in detail. To run the test you first need to download YOLO_small, the model trained by the original author (accessing it may require a proxy). Also, there is a small bug in the source code that raises an error when run directly:
net_output = self.sess.run(self.net.fc_32, feed_dict={self.net.images: inputs})
needs to be changed to
net_output = self.sess.run(self.net.fc_32, feed_dict={self.net.x: inputs})
The result is shown in Figure 3-1: YOLO successfully recognizes the person and the dog, but fails to recognize the horse. The author improved YOLO in later work so that it can recognize many more categories; see YOLO Upgrades: YOLOv2 and YOLO9000 Explained.

Figure 3-1 YOLO detection result 1
Or on another image, as shown in Figure 3-2:

Figure 3-2 YOLO detection result 2
4 Summary
YOLO is an end-to-end, deep-learning-based real-time object detection system; its main strength is its speed, and it still has room to improve in accuracy. This article walked through the TensorFlow implementation of YOLO; once the paper is understood, the code is quite simple. The TensorFlow topics it touches on are the following:
1. The difference between tf.get_variable and tf.Variable. The difference is that the former has a variable-checking mechanism: it checks whether an already existing variable with the same name has been marked as shared, and if it has not, TensorFlow raises an error when it reaches a second variable with that name.
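A short illustration of the difference (TensorFlow 1.x-style API, matching the era of the code in this article):

import tensorflow as tf

with tf.variable_scope('demo'):
    a = tf.get_variable('w', shape=[1])
with tf.variable_scope('demo', reuse=True):
    b = tf.get_variable('w', shape=[1])   # OK: reuses the variable created above
# Without reuse=True, a second tf.get_variable('w') in the same scope raises a
# ValueError instead of silently creating a new variable.
c = tf.Variable([0.0], name='w')          # tf.Variable always creates a new variable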
2. Implementing learning-rate decay.
tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
# decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
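A usage sketch (the actual global_step wiring and decay values come from the repo's config and may differ):

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)   # incremented once per training step
learning_rate = tf.train.exponential_decay(
    0.0001,          # initial learning rate (cfg.LEARNING_RATE)
    global_step,
    30000,           # decay_steps (illustrative value)
    0.1,             # decay_rate (illustrative value)
    staircase=True)
# With staircase=True the rate stays at 1e-4 for steps 0-29999, drops to 1e-5
# for steps 30000-59999, and so on.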
3. Using an exponential moving average (EMA) of the trainable variables to improve the results of gradient-descent training.
self.ema = tf.train.ExponentialMovingAverage(decay=0.9999)
self.averages_op = self.ema.apply(tf.trainable_variables())
with tf.control_dependencies([self.optimizer]):
    self.train_op = tf.group(self.averages_op)
4. The tf.pack() function.
tf.pack(values, name='pack')
# this function behaves like np.asarray:
# tf.pack([x, y, z]) is equivalent to np.asarray([x, y, z])
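For example (tf.pack was renamed tf.stack in TensorFlow 1.0; the version used by this code still calls it tf.pack):

import tensorflow as tf

x = tf.constant([1, 4])
y = tf.constant([2, 5])
z = tf.constant([3, 6])
packed = tf.pack([x, y, z])  # shape (3, 2): [[1, 4], [2, 5], [3, 6]]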
5. The tf.tile() function, which replicates a tensor along the specified dimensions.
tf.tile(input, multiples, name=None)
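For example:

import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])      # shape (2, 2)
tiled = tf.tile(t, multiples=[1, 3])   # shape (2, 6): [[1, 2, 1, 2, 1, 2], [3, 4, 3, 4, 3, 4]]
# loss_layer() uses the same idea to copy the (1, 7, 7, 2) offset to (45, 7, 7, 2)
# and the (45, 7, 7, 1, 4) label boxes to (45, 7, 7, 2, 4).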