The source code the YOLO V3 author released is C (the darknet framework). I can read it, but given the author's free-wheeling coding style, it doesn't feel particularly well suited for a walkthrough.
Also, most of the code online is Keras; pure TensorFlow implementations are scarce.
A fairly good one is this: https://github.com/mystic123/tensorflow-yolo-v3
It only covers inference, though, not training. Still, let's take a look at it first.
Someone on CSDN has already translated the accompanying write-up, some people clearly have time on their hands, haha: https://blog.csdn.net/haoqimao_hard/article/details/82109015
I never quite understood how the bounding boxes are handled in the detection layer, so that's the part of the code I focus on here, with comments added:
import tensorflow as tf

slim = tf.contrib.slim


def _detection_layer(inputs, num_classes, anchors, img_size, data_format):
    num_anchors = len(anchors)
    # the 5 in the channel count = 2 (box_centers) + 2 (box_sizes) + 1 (confidence)
    predictions = slim.conv2d(inputs, num_anchors * (5 + num_classes), 1,
                              stride=1, normalizer_fn=None,
                              activation_fn=None,
                              biases_initializer=tf.zeros_initializer())

    # get the spatial dimensions of the feature map
    shape = predictions.get_shape().as_list()
    grid_size = _get_size(shape, data_format)  # repo helper: returns the feature map's (H, W)
    dim = grid_size[0] * grid_size[1]
    bbox_attrs = 5 + num_classes  # number of attributes per bounding box

    if data_format == 'NCHW':
        predictions = tf.reshape(
            predictions, [-1, num_anchors * bbox_attrs, dim])
        predictions = tf.transpose(predictions, [0, 2, 1])

    # reshape so the bounding-box attributes sit in the last dimension
    predictions = tf.reshape(predictions, [-1, num_anchors * dim, bbox_attrs])

    # downsampling ratio of the feature map relative to the original image
    stride = (img_size[0] // grid_size[0], img_size[1] // grid_size[1])

    # rescale the prior (anchor) box sizes into feature-map coordinates
    anchors = [(a[0] / stride[0], a[1] / stride[1]) for a in anchors]

    # split the predictions into center coordinates, width/height, confidence and class scores
    box_centers, box_sizes, confidence, classes = tf.split(
        predictions, [2, 2, 1, num_classes], axis=-1)

    # squash the centers and confidence into (0, 1) with a sigmoid
    box_centers = tf.nn.sigmoid(box_centers)
    confidence = tf.nn.sigmoid(confidence)

    # build the grid of cell coordinates, reshaped to line up with box_centers
    grid_x = tf.range(grid_size[0], dtype=tf.float32)
    grid_y = tf.range(grid_size[1], dtype=tf.float32)
    a, b = tf.meshgrid(grid_x, grid_y)
    # e.g. for grid_x = [0, 1, 2], grid_y = [0, 1]:
    # a = [[0, 1, 2], [0, 1, 2]], b = [[0, 0, 0], [1, 1, 1]]
    x_offset = tf.reshape(a, (-1, 1))  # [[0], [1], [2], [0], [1], [2]]
    y_offset = tf.reshape(b, (-1, 1))  # [[0], [0], [0], [1], [1], [1]]
    # one (cx, cy) pair per grid cell: [[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]]
    x_y_offset = tf.concat([x_offset, y_offset], axis=-1)
    # repeat each cell's offset once per anchor -> shape [1, dim * num_anchors, 2]
    x_y_offset = tf.reshape(tf.tile(x_y_offset, [1, num_anchors]), [1, -1, 2])

    # add the cell offsets to the predicted centers: the paper's formula b_x = sigma(t_x) + c_x
    box_centers = box_centers + x_y_offset
    box_centers = box_centers * stride  # scale the centers back onto the original image

    anchors = tf.tile(anchors, [dim, 1])  # repeat the anchor list once per grid cell
    box_sizes = tf.exp(box_sizes) * anchors  # the paper's formula b_w = p_w * exp(t_w)
    box_sizes = box_sizes * stride

    detections = tf.concat([box_centers, box_sizes, confidence], axis=-1)

    # note the class activation is a sigmoid rather than a softmax -- one of YOLO V3's changes
    classes = tf.nn.sigmoid(classes)
    predictions = tf.concat([detections, classes], axis=-1)
    return predictions
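The part that had me confused is the x_y_offset construction feeding the paper's decoding formulas b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w · exp(t_w), b_h = p_h · exp(t_h). It's easiest to see on a concrete case. Here is a minimal NumPy sketch mirroring the TF ops above (the grid size and anchor count are toy values I picked for illustration, not from the repo):

import numpy as np

grid_w, grid_h, num_anchors = 3, 2, 3  # toy values for illustration only
a, b = np.meshgrid(np.arange(grid_w, dtype=np.float32),
                   np.arange(grid_h, dtype=np.float32))
x_offset = a.reshape(-1, 1)  # [[0], [1], [2], [0], [1], [2]]
y_offset = b.reshape(-1, 1)  # [[0], [0], [0], [1], [1], [1]]
x_y_offset = np.concatenate([x_offset, y_offset], axis=-1)  # (6, 2): one (cx, cy) row per cell
x_y_offset = np.tile(x_y_offset, [1, num_anchors]).reshape(1, -1, 2)
print(x_y_offset.shape)   # (1, 18, 2) == (1, grid_h * grid_w * num_anchors, 2)
print(x_y_offset[0, :6])  # [[0,0],[0,0],[0,0],[1,0],[1,0],[1,0]]

Each cell's (c_x, c_y) shows up num_anchors times in a row, matching the cell-major, anchor-minor ordering of the reshaped predictions, so the broadcast addition box_centers + x_y_offset pairs the right offset with the right box.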
And the non-max suppression handling:
import numpy as np


def non_max_suppression(predictions_with_boxes, confidence_threshold, iou_threshold=0.4):
    """
    Applies non-max suppression to prediction boxes.

    :param predictions_with_boxes: 3D numpy array, first 4 values in 3rd dimension are bbox attrs, 5th is confidence
    :param confidence_threshold: the threshold for deciding if prediction is valid
    :param iou_threshold: the threshold for deciding if two boxes overlap
    :return: dict: class -> [(box, score)]
    """
    # zero out every prediction whose confidence is below the threshold
    conf_mask = np.expand_dims(
        (predictions_with_boxes[:, :, 4] > confidence_threshold), -1)
    predictions = predictions_with_boxes * conf_mask

    result = {}
    for i, image_pred in enumerate(predictions):
        shape = image_pred.shape
        non_zero_idxs = np.nonzero(image_pred)
        image_pred = image_pred[non_zero_idxs]  # keep the non-zero values, i.e. the predictions above the threshold
        image_pred = image_pred.reshape(-1, shape[-1])

        bbox_attrs = image_pred[:, :5]  # first 5 values: the 4 box coordinates + the box confidence
        classes = image_pred[:, 5:]  # the rest are the per-class sigmoid scores
        classes = np.argmax(classes, axis=-1)  # index of the highest score is the predicted class

        unique_classes = list(set(classes.reshape(-1)))  # deduplicate to get the set of predicted classes

        for cls in unique_classes:  # run NMS separately for each predicted class
            cls_mask = classes == cls
            cls_boxes = bbox_attrs[np.nonzero(cls_mask)]  # all boxes predicted as this class
            cls_boxes = cls_boxes[cls_boxes[:, -1].argsort()[::-1]]  # sort the boxes by confidence, descending
            cls_scores = cls_boxes[:, -1]  # the box confidences
            cls_boxes = cls_boxes[:, :-1]  # the 4 box coordinates

            while len(cls_boxes) > 0:
                box = cls_boxes[0]
                score = cls_scores[0]
                if cls not in result:
                    result[cls] = []
                result[cls].append((box, score))
                cls_boxes = cls_boxes[1:]
                cls_scores = cls_scores[1:]
                ious = np.array([_iou(box, x) for x in cls_boxes])  # IoU of the kept box against the remaining ones
                iou_mask = ious < iou_threshold
                cls_boxes = cls_boxes[np.nonzero(iou_mask)]  # discard boxes whose IoU with it reaches the threshold
                cls_scores = cls_scores[np.nonzero(iou_mask)]

    return result
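The _iou helper used above isn't shown in this post. As far as I can tell from the repo, a separate helper converts the boxes from (center, size) to corner coordinates before NMS is called. A minimal sketch under that assumption, with both boxes as (x0, y0, x1, y1):

import numpy as np


def _iou(box1, box2):
    # both boxes assumed to be (x0, y0, x1, y1) corner coordinates
    x0 = max(box1[0], box2[0])
    y0 = max(box1[1], box2[1])
    x1 = min(box1[2], box2[2])
    y1 = min(box1[3], box2[3])

    # intersection area is zero when the boxes don't overlap
    intersection = max(x1 - x0, 0.0) * max(y1 - y0, 0.0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    # tiny epsilon guards against division by zero for degenerate boxes
    return intersection / (area1 + area2 - intersection + 1e-9)

With this, the greedy loop above keeps the highest-confidence box of each class and drops any remaining box that overlaps it too much.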
As I said above, this code doesn't touch training, and the loss computation is omitted; it simply loads the original YOLO author's weights.
Also, I feel the handling of the predicted boxes here has problems.
Hmm, yes, it's incomplete.
When I have time I'll keep studying a complete implementation, for instance the one in this blog:
【Tensorflow tf 掏粪记录】笔记五——YOLOv3 tensorflow 实现