
Implementing Vehicle Detection

Author: 此间不留白 | Published on 2019-12-03 22:14

Preface

In earlier lessons we already covered the basics of object detection. In this article we will use the Keras deep learning framework and the YOLO algorithm to build a vehicle detection application. The implementation can be divided into two steps:

  • Run the object detection algorithm on the training dataset
  • Post-process the bounding boxes

Before implementing the algorithm, we first need to import the relevant libraries, as shown below:
import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline

The vehicle detection problem

For autonomous driving, the first task is to detect and recognize vehicles. To train a neural network that can detect vehicles, we first need vehicle data. Collecting such data by hand takes a lot of effort; fortunately, drive.ai (an AI company focused on autonomous driving) provides it. A labeled example is shown in the figure below:

If we want YOLO to recognize 80 classes, we can either represent each class as an integer from 1 to 80, or as a length-80 vector in which the element corresponding to the class label is 1 and all the others are 0. Both representations are used in this implementation, depending on which is more convenient at each step.
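As a quick illustration, here is a minimal NumPy sketch (not code from the original assignment) of converting between the two representations; the class index used is a hypothetical example:

import numpy as np

num_classes = 80
class_index = 2                       # integer label (hypothetical example index)

# integer label -> length-80 one-hot vector
one_hot = np.zeros(num_classes)
one_hot[class_index] = 1

# one-hot vector -> integer label
recovered_index = int(np.argmax(one_hot))
print(recovered_index)                # 2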

In addition, performing object detection requires building a YOLO model. Training YOLO is computationally very expensive, so to save time we can load pre-trained weights instead.

YOLO

YOLO ("you only look once") is a popular object detection algorithm that achieves both real-time speed and high accuracy. "Only look once" means the network needs only a single forward pass through the image to make its predictions; after non-max suppression, it directly outputs the bounding boxes of the detected objects.

Model details

First, note the following:

  • The input is a batch of images of shape (m, 608, 608, 3)
  • The output is a list of bounding boxes together with the recognized classes. Each bounding box is described by 6 numbers (p_c, b_x, b_y, b_h, b_w, c); if the class c is expanded into an 80-dimensional vector, each bounding box is described by 85 numbers.

For this vehicle detection task we will use 5 anchor boxes, so the architecture of the YOLO model is: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85). The overall structure of the model is shown in the figure below:

Because 5 anchor boxes are used, each of the 19×19 cells encodes information about 5 boxes; anchor boxes are defined only by their width and height.

To simplify the output, the last two dimensions of the deep CNN output are flattened, so (19, 19, 5, 85) becomes (19, 19, 425), as shown in the figure below:
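As a minimal NumPy sketch (an illustration, not code from the article), this flattening is just a reshape of the last two dimensions:

import numpy as np

encoding = np.zeros((19, 19, 5, 85))         # CNN output for one image
flattened = encoding.reshape(19, 19, 5 * 85)
print(flattened.shape)                       # (19, 19, 425)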


For each cell of the output, we compute the element-wise product of the box confidence and the class probabilities, and take the class whose product is largest (this value can be regarded as the score) as the detected class, as shown in the figure below:


One way to visualize the YOLO output is the following:

  • For each of the 19×19 cells, find the maximum score value.
  • Assign a color to each class to be detected, and color each cell according to its highest-scoring class, as shown in the figure below (a small code sketch also follows this list):
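A minimal matplotlib sketch of this kind of visualization, assuming box_scores is the (19, 19, 5, 80) tensor of class scores for one image (an illustration, not code from the original assignment):

import numpy as np
import matplotlib.pyplot as plt

box_scores = np.random.rand(19, 19, 5, 80)           # placeholder scores, for illustration only

# for each cell, take the best score over the 5 anchor boxes,
# then record which of the 80 classes achieved it
best_class = box_scores.max(axis=2).argmax(axis=-1)   # (19, 19) class index per cell

plt.imshow(best_class, cmap='tab20')                  # one color per class index
plt.title("Highest-scoring class per 19x19 cell")
plt.show()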


Another way to visualize the YOLO output is to draw the bounding boxes it predicts, as shown in the figure below:

As shown above, even though only the box with the highest score is drawn for each cell, it is still hard to tell which objects were detected. For this reason, non-max suppression is used to filter the output. Specifically, it involves the following steps:

  • Discard boxes with low scores
  • When several boxes overlap while detecting the same object, keep only one of them.

Filtering with a threshold on class scores

Concretely, when the class score of a box is below a given threshold, that box is discarded.

The model output has shape 19×19×5×85, i.e. each bounding box is described by 85 numbers. For convenience of computation, these values are rearranged into the following variables:

  • box_confidence: tensor of shape (19, 19, 5, 1) containing p_c, the confidence of each of the 5 boxes predicted in each of the 19×19 cells.
  • boxes: tensor of shape (19, 19, 5, 4) containing (b_x, b_y, b_h, b_w) for each of the 5 boxes per cell.
  • box_class_probs: tensor of shape (19, 19, 5, 80) containing the probabilities of the 80 classes c_1, c_2, …, c_{80} for each of the 5 boxes per cell.

Based on the above, the filtering function is implemented as follows:


def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """
    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value; boxes whose highest class score is below this value are discarded

    Returns:
    scores -- tensor of shape (None,), containing the class score of each selected box
    boxes -- tensor of shape (None, 4), containing the coordinates (b_x, b_y, b_h, b_w) of the selected boxes
    classes -- tensor of shape (None,), containing the class index of each selected box

    Note: "None" is used because the number of selected boxes is unknown in advance; it depends on threshold.
    """

    # Step 1: compute the box scores
    box_scores = box_confidence * box_class_probs

    # Step 2: find the class with the largest box score
    box_classes = K.argmax(box_scores, axis=-1)        # index of the maximum
    box_class_scores = K.max(box_scores, axis=-1)      # value of the maximum

    # Step 3: create a filtering mask based on "box_class_scores" and "threshold".
    # The mask has the same shape as box_class_scores and is True where (score >= threshold).
    filtering_mask = (box_class_scores >= threshold)

    # Step 4: apply the mask to scores, boxes and classes
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)

    return scores, boxes, classes
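A quick smoke test with random tensors can be used to check the output shapes (this test is an assumption on my part, not part of the original article):

with tf.Session() as test_sess:
    box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1)
    boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed=1)
    box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.5)
    print("scores.shape  = " + str(scores.eval().shape))
    print("boxes.shape   = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))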

Non-max suppression

The thresholding above removes boxes with low class scores, but overlapping boxes may remain. Another filtering method used to obtain the correct boxes is called non-max suppression. As shown in the figure below, the model reports three cars, yet all three predictions correspond to the same car; running non-max suppression keeps only the box with the highest prediction probability.


Non-max suppression relies on an important function called Intersection over Union (IoU), illustrated in the figure below:


The basic idea of computing IoU is as follows:

  • First, a box is defined by 4 numbers giving the coordinates of its upper-left and lower-right corners, written (x_1, y_1, x_2, y_2).
  • The area of a box is its height (y_2 - y_1) times its width (x_2 - x_1).
  • We also need the coordinates (x_{i1}, y_{i1}, x_{i2}, y_{i2}) of the intersection of the two boxes, defined as follows:
    • x_{i1} is the maximum of the two boxes' x_1 coordinates;
    • y_{i1} is the maximum of the two boxes' y_1 coordinates;
    • x_{i2} is the minimum of the two boxes' x_2 coordinates;
    • y_{i2} is the minimum of the two boxes' y_2 coordinates.

Putting this together, the IoU function is implemented as follows:


def iou(box1, box2):
    """
    Computes the Intersection over Union (IoU) of two boxes.
    Arguments:
    box1 -- first box, a list of coordinates (x1, y1, x2, y2)
    box2 -- second box, a list of coordinates (x1, y1, x2, y2)
    """

    # Compute the area of the intersection of the two boxes.
    # If the boxes do not overlap, the intersection area is 0.
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    # Compute the area of the union of the two boxes: A + B - A∩B
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area

    # Compute the IoU
    iou = inter_area / union_area
    return iou
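A quick sanity check (my own example, not from the article): for box1 = (2, 1, 4, 3) and box2 = (1, 2, 3, 4), the intersection is a 1×1 square and each box has area 4, so the IoU should be 1 / (4 + 4 - 1) ≈ 0.143.

box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4)
print("iou = " + str(iou(box1, box2)))    # expected: 1/7 ≈ 0.142857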

Once IoU is implemented, we can use an IoU threshold to implement non-max suppression. The steps of the algorithm are:

  • Select the box with the highest score (the product of the class probability and the box confidence);
  • Compute the overlap of this box with all remaining boxes, and discard any box whose IoU with it exceeds the threshold;
  • Go back to the first step and iterate until no remaining box has a score lower than the currently selected one.

Putting this together, non-max suppression is implemented as follows:

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Implements non-max suppression with TensorFlow.
    Arguments:
    scores -- tensor, output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes()
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes
    iou_threshold -- real value, IoU threshold
    Returns:
    scores -- tensor of shape (None,), score of each kept box
    boxes -- tensor of shape (None, 4), coordinates of the kept boxes
    classes -- tensor of shape (None,), predicted class of each kept box
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))  # initialize max_boxes_tensor

    # Use a TensorFlow function to get the indices of the boxes to keep
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)

    # Use K.gather() to select only the boxes kept by non-max suppression
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)

    return scores, boxes, classes
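As before, a small smoke test with random tensors can verify the shapes (my own test, not part of the article):

with tf.Session() as test_sess:
    scores = tf.random_normal([54,], mean=1, stddev=4, seed=1)
    boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed=1)
    classes = tf.random_normal([54,], mean=1, stddev=4, seed=1)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
    print("scores.shape  = " + str(scores.eval().shape))
    print("boxes.shape   = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))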

Wrapping up the filtering

We can now combine the output of the deep CNN with the filtering steps implemented above. Specifically:

In YOLO implementations, a box can be represented in different ways: by its corners, or by its midpoint together with its height and width. Different steps of the pipeline may switch between these representations, and the conversion is done with the following provided function:
boxes = yolo_boxes_to_corners(box_xy, box_wh)
This function converts boxes given by their center coordinates and width/height into boxes given by their corner coordinates.
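As a rough idea of what this conversion does, here is a minimal NumPy sketch, assuming box_xy holds the box centers and box_wh their widths and heights (this is an illustration only; the actual yolo_boxes_to_corners in yad2k may use a different coordinate ordering):

import numpy as np

def boxes_to_corners_sketch(box_xy, box_wh):
    # center/size representation -> corner representation
    box_mins = box_xy - box_wh / 2.0       # upper-left corner
    box_maxes = box_xy + box_wh / 2.0      # lower-right corner
    return np.concatenate([box_mins, box_maxes], axis=-1)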

The YOLO network was trained on 608×608 images. If the test images have a different size (here 720×1280), the predicted boxes need to be rescaled so that they can be drawn on the original image; this is done with the following function:
boxes = scale_boxes(boxes, image_shape)

Putting everything together, the wrapper function is implemented as follows:

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Arguments:
    yolo_outputs -- output of the model (for an image_shape of (608, 608, 3)), a tuple of 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the size of the input image, e.g. (608., 608.)
    max_boxes -- integer, maximum number of predicted boxes
    score_threshold -- real value; if [highest class probability score < threshold], discard the box
    iou_threshold -- real value, IoU threshold used for non-max suppression

    Returns:
    scores -- tensor of shape (None,), score of each predicted box
    boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    classes -- tensor of shape (None,), predicted class of each box
    """

    # Unpack the outputs of the YOLO model
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert the boxes to the corner format expected by the filter
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Filter by score threshold
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Rescale the boxes back to the original image size
    boxes = scale_boxes(boxes, image_shape)

    # Apply non-max suppression with the IoU threshold
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    return scores, boxes, classes
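Again, a smoke test with random tensors can check that the whole pipeline runs end to end (my own test, not from the article):

with tf.Session() as test_sess:
    yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
                    tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1))
    scores, boxes, classes = yolo_eval(yolo_outputs)
    print("scores.shape  = " + str(scores.eval().shape))
    print("boxes.shape   = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))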

The input image has shape (608, 608, 3). After the CNN, the output has shape (19, 19, 5, 85); flattening the last two dimensions gives (19, 19, 425). Each of the 19×19 cells outputs 5 boxes, corresponding to the 5 anchor boxes. Of the 85 numbers per box, 5 are the confidence and box coordinates (p_c, b_x, b_y, b_h, b_w) and the remaining 80 are the class probabilities. Score thresholding selects boxes by class score, and non-max suppression removes overlapping boxes.

Testing the YOLO pre-trained model on images

In this part we test a pre-trained model on the vehicle detection dataset. First, create a session:

sess = K.get_session()

Defining classes, anchors and the image shape

The 80 class names and the 5 anchor boxes are stored in "coco_classes.txt" and "yolo_anchors.txt". The following code loads them:

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)    
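For reference, if yolo_utils is not available, minimal versions of these two helpers could look roughly like the sketch below (this assumes one class name per line in coco_classes.txt and a single comma-separated line of width/height values in yolo_anchors.txt, which is an assumption on my part):

def read_classes_sketch(classes_path):
    # one class name per line
    with open(classes_path) as f:
        return [line.strip() for line in f.readlines()]

def read_anchors_sketch(anchors_path):
    # a single comma-separated line of width,height values, grouped into pairs
    with open(anchors_path) as f:
        values = [float(x) for x in f.readline().split(',')]
    return np.array(values).reshape(-1, 2)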

Loading a pre-trained model

Training a YOLO model takes a long time and requires a large labeled dataset, so instead we load a pre-trained model stored in yolo.h5, as shown below:


yolo_model = load_model("model_data/yolo.h5")

The model summary can be printed with the following code:

yolo_model.summary()
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 608, 608, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 608, 608, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 608, 608, 32) 128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 608, 608, 32) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 304, 304, 32) 0           leaky_re_lu_1[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 304, 304, 64) 18432       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 304, 304, 64) 256         conv2d_2[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 304, 304, 64) 0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 152, 152, 64) 0           leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 152, 152, 128 73728       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 152, 152, 128 512         conv2d_3[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 152, 152, 64) 8192        leaky_re_lu_3[0][0]              
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 152, 152, 64) 256         conv2d_4[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 152, 152, 64) 0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 152, 152, 128 73728       leaky_re_lu_4[0][0]              
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 152, 152, 128 512         conv2d_5[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 76, 76, 128)  0           leaky_re_lu_5[0][0]              
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 76, 76, 256)  294912      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 76, 76, 256)  1024        conv2d_6[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_6[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 76, 76, 128)  32768       leaky_re_lu_6[0][0]              
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 76, 76, 128)  512         conv2d_7[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 76, 76, 128)  0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 76, 76, 256)  294912      leaky_re_lu_7[0][0]              
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 76, 76, 256)  1024        conv2d_8[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_8[0][0]      
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 38, 38, 256)  0           leaky_re_lu_8[0][0]              
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 38, 38, 512)  1179648     max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 38, 38, 512)  2048        conv2d_9[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 38, 38, 512)  0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_9[0][0]              
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 38, 38, 256)  1024        conv2d_10[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_10[0][0]             
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 38, 38, 512)  2048        conv2d_11[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_11[0][0]             
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 38, 38, 256)  1024        conv2d_12[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_12[0][0]     
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_12[0][0]             
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 38, 38, 512)  2048        conv2d_13[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 19, 19, 512)  0           leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 19, 19, 1024) 4718592     max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 19, 19, 1024) 4096        conv2d_14[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_14[0][0]             
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 19, 19, 512)  2048        conv2d_15[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_15[0][0]             
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 19, 19, 1024) 4096        conv2d_16[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_16[0][0]             
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 19, 19, 512)  2048        conv2d_17[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_17[0][0]             
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 19, 19, 1024) 4096        conv2d_18[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_18[0][0]             
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 19, 19, 1024) 4096        conv2d_19[0][0]                  
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 38, 38, 64)   32768       leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 38, 38, 64)   256         conv2d_21[0][0]                  
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_19[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)      (None, 38, 38, 64)   0           batch_normalization_21[0][0]     
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 19, 19, 1024) 4096        conv2d_20[0][0]                  
__________________________________________________________________________________________________
space_to_depth_x2 (Lambda)      (None, 19, 19, 256)  0           leaky_re_lu_21[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_20[0][0]     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 19, 19, 1280) 0           space_to_depth_x2[0][0]          
                                                                 leaky_re_lu_20[0][0]             
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 19, 19, 1024) 11796480    concatenate_1[0][0]              
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 19, 19, 1024) 4096        conv2d_22[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_22[0][0]     
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 19, 19, 425)  435625      leaky_re_lu_22[0][0]             
======================================================================================================
Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672

Converting the model output into usable bounding box tensors

The output of yolo_model is a (m, 19, 19, 5, 85) tensor, which has to be converted into the form expected by the box-filtering functions, as follows:


yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

Filtering boxes

Using the filtering function implemented above, the model output is selected and filtered as follows:

scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)

Running the model on a test image

Running the YOLO model on a test image requires running a TensorFlow session. The prediction function is implemented as follows:

def predict(sess, image_file):
    """
    Arguments:
    sess -- the tensorflow/Keras session containing the YOLO graph
    image_file -- filename of the test image
    Returns:
    out_scores -- tensor of shape (None,), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor, class indices of the predicted boxes
    """

    # Preprocess the image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

    # Run the session, feeding the placeholders through the feed dictionary
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict = {yolo_model.input: image_data, K.learning_phase(): 0})
    # Print the prediction info
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    # Generate colors for drawing the bounding boxes
    colors = generate_colors(class_names)
    # Draw the bounding boxes on the image
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    # Save the image with the predicted bounding boxes
    image.save(os.path.join("out", image_file), quality=90)
    # Display the result
    output_image = scipy.misc.imread(os.path.join("out", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes
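For example, assuming an image named "test.jpg" exists in the images/ folder (the filename here is my own placeholder), the function is called as follows:

out_scores, out_boxes, out_classes = predict(sess, "test.jpg")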

Finally, the output looks like this:


The assignment provides 120 images; with the YOLO model we can draw predictions for all of them, as shown below:

for i in range(1, 121):

    # Pad the index with leading zeros to get a 4-digit filename, e.g. "0001.jpg"
    filename = str(i).zfill(4) + ".jpg"
    print("Current file: " + str(filename))

    # Run the prediction on this image
    out_scores, out_boxes, out_classes = predict(sess, filename)


print("Done drawing all images!")

Finally, for easier viewing, these images can be combined into a GIF animation, as shown below:
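A minimal sketch of how the saved frames could be combined into a GIF with the imageio package (imageio is not imported in this article, so this is an assumption on my part):

import imageio

frames = []
for i in range(1, 121):
    filename = str(i).zfill(4) + ".jpg"
    frames.append(imageio.imread(os.path.join("out", filename)))

# write all frames into one animated GIF, roughly 10 frames per second
imageio.mimsave(os.path.join("out", "car_detection.gif"), frames, duration=0.1)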

