美文网首页
keras-Yolov3 源码调试

keras-Yolov3 源码调试

作者: yanghedada | 来源:发表于2019-02-06 20:02 被阅读6次

    源码地址 https://github.com/qqwweee/keras-yolo3

    春节期间仔细看了看yolov3的kears源码,这个源码毕竟不是作者写的,有点寒酸,可能大道至简也是这么个理。我在看源码的时候,参照了一些博客进行补充,主要是,作者公布的代码有点凌乱,和rcnn不是一个体系的缘故吧。。。。。

    看到大神的优秀博客,感觉自己的笔记有点炒冷饭的味道。。。😂

    1.目录结构:

    如下:这个就是直接从github上down下来的

    .
    ├── coco_annotation.py
    ├── convert.py
    ├── darknet53.cfg
    ├── font
    │   ├── FiraMono-Medium.otf
    │   └── SIL Open Font License.txt
    ├── .gitignore
    ├── kmeans.py
    ├── LICENSE
    ├── model_data
    │   ├── coco_classes.txt
    │   ├── tiny_yolo_anchors.txt
    │   ├── voc_classes.txt
    │   └── yolo_anchors.txt
    ├── README.md
    ├── train_bottleneck.py
    ├── train.py
    ├── voc_annotation.py
    ├── yolo3
    │   ├── __init__.py
    │   ├── model.py
    │   └── utils.py
    ├── yolo.py
    ├── yolov3.cfg
    ├── yolov3-tiny.cfg
    └── yolo_video.py
    
    
    1. font是字体目录
    2. model_data:
      是各个数据库对应的模型的文件:
    • coco_classes文件: 就是coco文件的类别文件
      如下:


    • yolo_anchors文件:就是yolo3所需要的anchors大小
      如下



      这里的两文件可以根据数据不同改变,改成你所需要的类别。而anchors可以通过k-means进行聚类直接获得。

    1. yolo3:
      这里有model.py和utils.py文件。
    • model.py 就是构建yolo3的主要模块文件,这里一共有14个函数/
      如下:
    • utils.py 是在模型训练时进行数据处理的工具文件,一共有3个函数:
    1. *_annoataion.py 对数据进行转换的文件,把原始的文件转换为txt文件。
    • coco_annoataion.py 把json文件转换为txt文件
    • voc_annoataion.py 把xml文件转换为txt
    1. convert.py 把原始权重转换为kares的能读取的原始h5文件
    2. kmeans.py 输入上面得到的txt文件,通过聚类得到数据最佳anchors。
    3. train.py 进行yolov3训练的文件
    4. yolo.py 构建以yolov3为底层构件的yolo检测模型,因为上面的yolov3还是分开的单个函数,功能并没有融合在一起,即使在训练的时候所有的yolov3组件还是分开的功能,并没有统一接口,供在模型训练完成之后,直接使用。通过yolo.py融合所有的组件。
    5. yolo_video.py 使用yolo.py文件中的yolo检测模型,并且对视频中的物体进行检测。
    6. yolov3.cfg 构建yolov3检测模型的整个超参文件。

    在阅读源码的时候主要参考:
    https://github.com/SpikeKing/keras-yolo3-detection的几篇博文,但是为了更好理解keras-yolo3的代码,这几篇博文的对应文件如下:

    kmens.py

    import numpy as np
    
    
    class YOLO_Kmeans:
    
        def __init__(self, cluster_number, filename):
            # 读取kmeans的中心数
            self.cluster_number = cluster_number
            # 标签文件的文件名
            self.filename = "2012_train.txt"
    
        def iou(self, boxes, clusters):  # 1 box -> k clusters
            # boxes : 所有的[width, height]
            # clusters : 9个随机的中心点[width, height]
            n = boxes.shape[0]
            k = self.cluster_number
            
            # 所有的boxes的面积
            box_area = boxes[:, 0] * boxes[:, 1]
            # 将box_area的每个元素重复k次 
            box_area = box_area.repeat(k)
            box_area = np.reshape(box_area, (n, k))
            # 计算9个中点的面积
            cluster_area = clusters[:, 0] * clusters[:, 1]
            # 对cluster_area进行复制n份
            cluster_area = np.tile(cluster_area, [1, n])
            cluster_area = np.reshape(cluster_area, (n, k))
            # 获取box和中心的的交叉w的宽
            box_w_matrix = np.reshape(boxes[:, 0].repeat(k), (n, k))
            cluster_w_matrix = np.reshape(np.tile(clusters[:, 0], (1, n)), (n, k))
            min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix)
            # 获取box和中心的的交叉w的高
            box_h_matrix = np.reshape(boxes[:, 1].repeat(k), (n, k))
            cluster_h_matrix = np.reshape(np.tile(clusters[:, 1], (1, n)), (n, k))
            min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix)
            # 交叉点的面积
            inter_area = np.multiply(min_w_matrix, min_h_matrix)
            # 9个交叉点和所有的boxes的iou值
            result = inter_area / (box_area + cluster_area - inter_area)
            return result
    
        def avg_iou(self, boxes, clusters):
            #  计算9个中点与所有的boxes总的iou,n个点的平均iou
            accuracy = np.mean([np.max(self.iou(boxes, clusters), axis=1)])
            return accuracy
    
        def kmeans(self, boxes, k, dist=np.median):
            # np.median 求众数
            # boxes = [宽, 高]C
            # k 中心点数
            box_number = boxes.shape[0]
            distances = np.empty((box_number, k))
            last_nearest = np.zeros((box_number,))
            np.random.seed()
            # 从所有的boxe中选区9个随机中心点
            clusters = boxes[np.random.choice(
                box_number, k, replace=False)]  # init k clusters
            while True:
                # 计算所有的boxes和clusters的值(n,k)
                distances = 1 - self.iou(boxes, clusters)
                # 选取iou值最小的点(n,)
                current_nearest = np.argmin(distances, axis=1)
                # 中心点未改变,跳出
                if (last_nearest == current_nearest).all():
                    break  # clusters won't change
                # 计算每个群组的中心或者众数
                for cluster in range(k):
                    clusters[cluster] = dist(  # update clusters
                        boxes[current_nearest == cluster], axis=0)
                # 改变中心点
                last_nearest = current_nearest
    
            return clusters
    
        def result2txt(self, data):
            # 把9个中心点,写入txt文件
            f = open("yolo_anchors.txt", 'w')
            row = np.shape(data)[0]
            for i in range(row):
                if i == 0:
                    x_y = "%d,%d" % (data[i][0], data[i][1])
                else:
                    x_y = ", %d,%d" % (data[i][0], data[i][1])
                f.write(x_y)
            f.close()
    
        def txt2boxes(self):
            # 打开文件
            f = open(self.filename, 'r')
            dataSet = []
            # 读取文件
            for line in f:
                infos = line.split(" ")
                length = len(infos)
                # infons[0] 为图片的名称
                for i in range(1, length):
                # 获取文件的宽和高
                    width = int(infos[i].split(",")[2]) - \
                        int(infos[i].split(",")[0])
                    height = int(infos[i].split(",")[3]) - \
                        int(infos[i].split(",")[1])
                    dataSet.append([width, height])
            result = np.array(dataSet)
            f.close()
            return result
    
        def txt2clusters(self):
            # 获取所有的文件目标的宽和高,width, height
            all_boxes = self.txt2boxes()
            # result 9个中心点
            result = self.kmeans(all_boxes, k=self.cluster_number)
            # 按最后一列顺序排序
            result = result[np.lexsort(result.T[0, None])]
            # 把结果写入txt文件
            self.result2txt(result)
            print("K anchors:\n {}".format(result))
            #  计算9个中点与所有的boxes总的iou,n个点的平均iou
            print("Accuracy: {:.2f}%".format(
                self.avg_iou(all_boxes, result) * 100))
    
    
    if __name__ == "__main__":
        cluster_number = 9
        filename = "2012_train.txt"
        kmeans = YOLO_Kmeans(cluster_number, filename)
        kmeans.txt2clusters()
    
    

    k-means拿到数据里所有的目标框,得到所有的宽和高,在这里面随机取得9个随即中心,之后以9个点为中心得到9个族,不断计算其他点到中点的距离调整每个点所归属的族和中心,直到9个中心不再变即可。这9个中心的x,y就是整个数据的9个合适的anchors==框的宽和高。

    train.py

    #!/usr/bin/env python
    # -- coding: utf-8 --
    """
    Copyright (c) 2018. All rights reserved.
    Created by C. L. Wang on 2018/7/4
    """
    import os
    import numpy as np
    import tensorflow as tf
    import keras.backend as K
    from keras.backend import mean
    from keras.layers import Input, Lambda
    from keras.models import Model
    from keras.optimizers import Adam
    from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
    from keras.utils import plot_model
    
    from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss
    from yolo3.utils import get_random_data
    
    
    def _main():
        import os
        os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
        from keras import backend as K
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        K.set_session(sess)
    
        annotation_path = 'dataset/WIDER_train.txt'  # 数据
        classes_path = 'configs/wider_classes.txt'  # 类别
    
        log_dir = 'logs/004/'  # 日志文件夹
    
        # pretrained_path = 'model_data/yolo_weights.h5'  # 预训练模型
        pretrained_path = 'logs/003/ep074-loss26.535-val_loss27.370.h5'  # 预训练模型
        anchors_path = 'configs/yolo_anchors.txt'  # anchors
    
        class_names = get_classes(classes_path)  # 类别列表
        num_classes = len(class_names)  # 类别数
        anchors = get_anchors(anchors_path)  # anchors列表
    
        input_shape = (416, 416)  # 32的倍数,输入图像
    
        # 创建需要训练的模型
        model = create_model(input_shape, anchors, num_classes,
                             freeze_body=2,
                             weights_path=pretrained_path)  # make sure you know what you freeze
    
        logging = TensorBoard(log_dir=log_dir)
        checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                                     monitor='val_loss', save_weights_only=True,
                                     save_best_only=True, period=3)  # 只存储weights,
        #reduce_lr:当评价指标不在提升时,减少学习率,每次减少10%,当验证损失值,持续3次未减少时,则终止训练。
    
        #early_stopping:当验证集损失值,连续增加小于0时,持续10个epoch,则终止训练。
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)  # 当评价指标不在提升时,减少学习率
        early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)  # 测试集准确率,下降前终止
    
        val_split = 0.1  # 训练和验证的比例
        with open(annotation_path) as f:
            lines = f.readlines()
        np.random.seed(47)
        np.random.shuffle(lines)
        np.random.seed(None)
        num_val = int(len(lines) * val_split)  # 验证集数量
        num_train = len(lines) - num_val  # 训练集数量
    
        """
        把目标当成一个输入,构成多输入模型,把loss写成一个层,作为最后的输出,搭建模型的时候,
        就只需要将模型的output定义为loss,而compile的时候,
        直接将loss设置为y_pred(因为模型的输出就是loss,所以y_pred就是loss),
        无视y_true,训练的时候,y_true随便扔一个符合形状的数组进去就行了。
        """
        if False:
            model.compile(optimizer=Adam(lr=1e-3), loss={
                # 使用定制的 yolo_loss Lambda层
                'yolo_loss': lambda y_true, y_pred: y_pred})  # 损失函数
    
            batch_size = 32  # batch尺寸
            print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
            model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
                                steps_per_epoch=max(1, num_train // batch_size),
                                validation_data=data_generator_wrapper(
                                    lines[num_train:], batch_size, input_shape, anchors, num_classes),
                                validation_steps=max(1, num_val // batch_size),
                                epochs=50,
                                initial_epoch=0,
                                callbacks=[logging, checkpoint])
            model.save_weights(log_dir + 'trained_weights_stage_1.h5')  # 存储最终的参数,再训练过程中,通过回调存储
    
        if True:  # 全部训练
            for i in range(len(model.layers)):
                model.layers[i].trainable = True
    
            model.compile(optimizer=Adam(lr=1e-4),
                          loss={'yolo_loss': lambda y_true, y_pred: y_pred})  # recompile to apply the change
            print('Unfreeze all of the layers.')
    
            batch_size = 16  # note that more GPU memory is required after unfreezing the body
            print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
    
            model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
                                steps_per_epoch=max(1, num_train // batch_size),
                                validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors,
                                                                       num_classes),
                                validation_steps=max(1, num_val // batch_size),
                                epochs=100,
                                initial_epoch=50,
                                callbacks=[logging, checkpoint, reduce_lr, early_stopping])
            model.save_weights(log_dir + 'trained_weights_final.h5')
    
    
    def get_classes(classes_path):
        # 输入类别文件,读取文件中所有的类别,生成list
        '''loads the classes'''
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names
    
    
    def get_anchors(anchors_path):
        # 获取所有的anchors的长和宽
        '''loads the anchors from a file'''
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return np.array(anchors).reshape(-1, 2)
    
    
    def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2,
                     weights_path='model_data/yolo_weights.h5'):
        K.clear_session()  # 清除session
        h, w = input_shape  # 尺寸
        image_input = Input(shape=(w, h, 3))  # 图片输入格式
        num_anchors = len(anchors)  # anchor数量
    
        # YOLO的三种尺度,每个尺度的anchor数,类别数+边框4个+置信度1
        y_true = [Input(shape=(h // {0: 32, 1: 16, 2: 8}[l], w // {0: 32, 1: 16, 2: 8}[l],
                               num_anchors // 3, num_classes + 5)) for l in range(3)]
    
        model_body = yolo_body(image_input, num_anchors // 3, num_classes)  # model
        print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))
    
        if load_pretrained:  # 加载预训练模型
            model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)  # 加载参数,跳过错误
            print('Load weights {}.'.format(weights_path))
            if freeze_body in [1, 2]:
                # Freeze darknet53 body or freeze all but 3 output layers.
                num = (185, len(model_body.layers) - 3)[freeze_body - 1]
                for i in range(num):
                    model_body.layers[i].trainable = False  # 将其他层的训练关闭
                print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))
        # 构建 yolo_loss
        # model_body: [(?, 13, 13, 18), (?, 26, 26, 18), (?, 52, 52, 18)]
        # y_true: [(?, 13, 13, 18), (?, 26, 26, 18), (?, 52, 52, 18)]
        model_loss = Lambda(yolo_loss,
                            output_shape=(1,), name='yolo_loss',
                            arguments={'anchors': anchors,
                                       'num_classes': num_classes,
                                       'ignore_thresh': 0.5}
                            )(model_body.output + y_true)
        model = Model(inputs=[model_body.input] + y_true, outputs=model_loss)  # 模型,inputs和outputs
        plot_model(model, to_file=os.path.join('model_data', 'model.png'), show_shapes=True, show_layer_names=True)
        model.summary()
        #
        return model
    
    
    def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
        '''data generator for fit_generator
        annotation_lines: 所有的图片名称
        batch_size:每批图片的大小
        input_shape: 图片的输入尺寸
        anchors: 大小
        num_classes: 类别数
        '''
    
        n = len(annotation_lines)
        i = 0
        while True:
            image_data = []
            box_data = []
            for b in range(batch_size):
                if i == 0:
                    # 随机排列图片顺序
                    np.random.shuffle(annotation_lines)
                # image_data: (16, 416, 416, 3)
                # box_data: (16, 20, 5) # 每个图片最多含有20个框
                image, box = get_random_data(annotation_lines[i], input_shape, random=True)  # 获取图片和盒子
                #获取真实的数据根据输入的尺寸对原始数据进行缩放处理得到input_shape大小的数据图片,
                # 随机进行图片的翻转,标记数据数据也根据比例改变
                image_data.append(image)  # 添加图片
                box_data.append(box)  # 添加盒子
                i = (i + 1) % n
            image_data = np.array(image_data)
            box_data = np.array(box_data)
            # y_true是3个预测特征的列表
            y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)  # 真值
            # y_true的第0和1位是中心点xy,范围是(0~13/26/52),第2和3位是宽高wh,范围是0~1,
            # 第4位是置信度1或0,第5~n位是类别为1其余为0。
            # [(16, 13, 13, 3, 6), (16, 26, 26, 3, 6), (16, 52, 52, 3, 6)]
            yield [image_data] + y_true, np.zeros(batch_size)
    
    
    def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes):
        """
        用于条件检查
        """
        n = len(annotation_lines)  # 标注图片的行数
        if n == 0 or batch_size <= 0: return None
        return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)
    
    
    if __name__ == '__main__':
        _main()
    
    

    在train.py中主要就是构建yolov3的训练模型,这里作者使用自定义loss的方式进行模型训练并没有,在loss时输入y_true, y_pred。具体参考
    Keras中自定义目标函数(损失函数)的简单方法

    Keras中自定义复杂的loss函数这两篇博客,主要时loss在loss函数里已经把y_true和y_pred计算完成了,所以之后的y_pred,在数据生成器(data_generator)有如下体现,np.zeros(batch_size)

            yield [image_data] + y_true, np.zeros(batch_size)
    

    🆗,这个train.py有两个train模式,一个是冻结模型,一个是微调模型。冻结那几个层手动调节,1是冻结DarkNet53的层,2是冻结全部,只保留最后3层。

    model.py

    #!/usr/bin/env python
    # -- coding: utf-8 --
    """
    Copyright (c) 2018. All rights reserved.
    Created by C. L. Wang on 2018/7/4
    """
    
    from functools import wraps
    
    import numpy as np
    import tensorflow as tf
    from keras import backend as K
    from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D
    from keras.layers.advanced_activations import LeakyReLU
    from keras.layers.normalization import BatchNormalization
    from keras.models import Model
    from keras.regularizers import l2
    
    from yolo3.utils import compose
    
    
    @wraps(Conv2D)
    def DarknetConv2D(*args, **kwargs):
        # 普通的卷积网络,带正则化,当步长为2时进行下采样
        """Wrapper to set Darknet parameters for Convolution2D."""
        darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
        darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
        darknet_conv_kwargs.update(kwargs)
        return Conv2D(*args, **darknet_conv_kwargs)
    
    
    def DarknetConv2D_BN_Leaky(*args, **kwargs):
        # 没有偏置,带正则项
        """Darknet Convolution2D followed by BatchNormalization and LeakyReLU."""
        no_bias_kwargs = {'use_bias': False}
        no_bias_kwargs.update(kwargs)
        return compose(
            DarknetConv2D(*args, **no_bias_kwargs),
            BatchNormalization(),
            LeakyReLU(alpha=0.1))
    
    
    def resblock_body(x, num_filters, num_blocks):
        # 使用残差块, 1 + 2 * num_filters 为总的卷积层数
        '''A series of resblocks starting with a downsampling Convolution2D'''
        # Darknet uses left and top padding instead of 'same' mode
        x = ZeroPadding2D(((1, 0), (1, 0)))(x)
        x = DarknetConv2D_BN_Leaky(num_filters, (3, 3), strides=(2, 2))(x)
        for i in range(num_blocks):
            y = compose(
                DarknetConv2D_BN_Leaky(num_filters // 2, (1, 1)),
                DarknetConv2D_BN_Leaky(num_filters, (3, 3)))(x)
            x = Add()([x, y])
        return x
    
    
    def darknet_body(x):
        # darknet的主体网络52层卷积网络
        '''Darknent body having 52 Convolution2D layers'''
        x = DarknetConv2D_BN_Leaky(32, (3, 3))(x)
        x = resblock_body(x, num_filters=64, num_blocks=1)
        x = resblock_body(x, num_filters=128, num_blocks=2)
        x = resblock_body(x, num_filters=256, num_blocks=8)
        x = resblock_body(x, num_filters=512, num_blocks=8)
        x = resblock_body(x, num_filters=1024, num_blocks=4)
        return x
    
    
    def make_last_layers(x, num_filters, out_filters):
        # 最后检测头部,无降采采样操作
        '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer'''
        x = compose(
            DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
            DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
            DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
            DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
            DarknetConv2D_BN_Leaky(num_filters, (1, 1)))(x)
        y = compose(
            DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
            DarknetConv2D(out_filters, (1, 1)))(x)
        return x, y
    
    
    def yolo_body(inputs, num_anchors, num_classes):
        """Create YOLO_V3 model CNN body in Keras."""
        darknet = Model(inputs, darknet_body(inputs))
        x, y1 = make_last_layers(darknet.output, 512, num_anchors * (num_classes + 5))
        # 上采样
        x = compose(
            DarknetConv2D_BN_Leaky(256, (1, 1)),
            UpSampling2D(2))(x)
        x = Concatenate()([x, darknet.layers[152].output])
        x, y2 = make_last_layers(x, 256, num_anchors * (num_classes + 5))
        # 上采样
        x = compose(
            DarknetConv2D_BN_Leaky(128, (1, 1)),
            UpSampling2D(2))(x)
        x = Concatenate()([x, darknet.layers[92].output])
        _, y3 = make_last_layers(x, 128, num_anchors * (num_classes + 5))
        # 上采样 y1, y2, y3
        # 13x13, 26x26, 52x52
        return Model(inputs, [y1, y2, y3])
    
    
    def tiny_yolo_body(inputs, num_anchors, num_classes):
        '''Create Tiny YOLO_v3 model CNN body in keras.'''
        x1 = compose(
            DarknetConv2D_BN_Leaky(16, (3, 3)),
            MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
            DarknetConv2D_BN_Leaky(32, (3, 3)),
            MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
            DarknetConv2D_BN_Leaky(64, (3, 3)),
            MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
            DarknetConv2D_BN_Leaky(128, (3, 3)),
            MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
            DarknetConv2D_BN_Leaky(256, (3, 3)))(inputs)
        x2 = compose(
            MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
            DarknetConv2D_BN_Leaky(512, (3, 3)),
            MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same'),
            DarknetConv2D_BN_Leaky(1024, (3, 3)),
            DarknetConv2D_BN_Leaky(256, (1, 1)))(x1)
        y1 = compose(
            DarknetConv2D_BN_Leaky(512, (3, 3)),
            DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))(x2)
    
        x2 = compose(
            DarknetConv2D_BN_Leaky(128, (1, 1)),
            UpSampling2D(2))(x2)
        y2 = compose(
            Concatenate(),
            DarknetConv2D_BN_Leaky(256, (3, 3)),
            DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))([x2, x1])
    
        return Model(inputs, [y1, y2])
    
    
    def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
        """
        feats的后处理函数,feats就是yolo_outputs,把输出转换到inputs的坐标系
        """
        num_anchors = len(anchors)
        # Reshape to batch, height, width, num_anchors, box_params.
        anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
    
        grid_shape = K.shape(feats)[1:3]  # height, width
        grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
                        [1, grid_shape[1], 1, 1])
        grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
                        [grid_shape[0], 1, 1, 1])
        grid = K.concatenate([grid_x, grid_y])
        grid = K.cast(grid, K.dtype(feats))
        # Reshape to batch, height, width, num_anchors, box_params.
        feats = K.reshape(
            feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
    
        # Adjust preditions to each spatial grid point and anchor size.
        # 把结果转化到每个格子中,坐标为图片的坐标系的0-1,类别预测也为0-1
        box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))  # xy是归一化的值
        box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))  # wh是归一化的值
        box_confidence = K.sigmoid(feats[..., 4:5])
        box_class_probs = K.sigmoid(feats[..., 5:])
    
        if calc_loss == True:
            return grid, feats, box_xy, box_wh
        return box_xy, box_wh, box_confidence, box_class_probs
    
    
    def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
        '''Get corrected boxes'''
        # 把预测的图片转换到原始图片的大小
        box_yx = box_xy[..., ::-1]
        box_hw = box_wh[..., ::-1]
        input_shape = K.cast(input_shape, K.dtype(box_yx))
        image_shape = K.cast(image_shape, K.dtype(box_yx))
        new_shape = K.round(image_shape * K.min(input_shape / image_shape))
        offset = (input_shape - new_shape) / 2. / input_shape
        scale = input_shape / new_shape
        box_yx = (box_yx - offset) * scale
        box_hw *= scale
    
        box_mins = box_yx - (box_hw / 2.)
        box_maxes = box_yx + (box_hw / 2.)
        boxes = K.concatenate([
            box_mins[..., 0:1],  # y_min
            box_mins[..., 1:2],  # x_min
            box_maxes[..., 0:1],  # y_max
            box_maxes[..., 1:2]  # x_max
        ])
    
        # Scale boxes back to original image shape.
        boxes *= K.concatenate([image_shape, image_shape])
        return boxes
    
    
    def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
        '''Process Conv layer output'''
        # 对yolo的输出进行后处理,输出合适原始图片的box和scores
        box_xy, box_wh, box_confidence, box_class_probs = yolo_head(
            feats, anchors, num_classes, input_shape)
        boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
        boxes = K.reshape(boxes, [-1, 4])
        box_scores = box_confidence * box_class_probs
        box_scores = K.reshape(box_scores, [-1, num_classes])
        return boxes, box_scores
    
    
    def yolo_eval(yolo_outputs, anchors, num_classes, image_shape,
                  max_boxes=20, score_threshold=.6, iou_threshold=.5):
        """Evaluate YOLO model on given input and return filtered boxes."""
        # 使用yolo模型进行图片的检测,进行坐标转化,和nms处理,和
    
        num_layers = len(yolo_outputs)
        anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]  # default setting
        input_shape = K.shape(yolo_outputs[0])[1:3] * 32
        boxes = []
        box_scores = []
        for l in range(num_layers):
            _boxes, _box_scores = yolo_boxes_and_scores(
                yolo_outputs[l], anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
            boxes.append(_boxes)
            box_scores.append(_box_scores)
        boxes = K.concatenate(boxes, axis=0)
        box_scores = K.concatenate(box_scores, axis=0)
    
        mask = box_scores >= score_threshold
        max_boxes_tensor = K.constant(max_boxes, dtype='int32')
        boxes_ = []
        scores_ = []
        classes_ = []
        for c in range(num_classes):
            # 进行nms处理
            # TODO: use keras backend instead of tf.
            class_boxes = tf.boolean_mask(boxes, mask[:, c])
            class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
            nms_index = tf.image.non_max_suppression(
                class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
            class_boxes = K.gather(class_boxes, nms_index)
            class_box_scores = K.gather(class_box_scores, nms_index)
            classes = K.ones_like(class_box_scores, 'int32') * c
            boxes_.append(class_boxes)
            scores_.append(class_box_scores)
            classes_.append(classes)
        boxes_ = K.concatenate(boxes_, axis=0)
        scores_ = K.concatenate(scores_, axis=0)
        classes_ = K.concatenate(classes_, axis=0)
    
        return boxes_, scores_, classes_
    
    
    def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
        '''Preprocess true boxes to training input format
        y_true的第0和1位是中心点xy,范围是(0~13/26/52),第2和3位是宽高wh,范围是0~1,
        第4位是置信度1或0,第5~n位是类别为1其余为0。
    
    
        Parameters
        ----------
        true_boxes: array, shape=(m, T, 5)
            Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
        input_shape: array-like, hw, multiples of 32
        anchors: array, shape=(N, 2), wh
        num_classes: integer
    
        Returns
        -------
        y_true: list of array, shape like yolo_outputs, xywh are reletive value
    
        '''
        assert (true_boxes[..., 4] < num_classes).all(), 'class id must be less than num_classes'
        num_layers = len(anchors) // 3  # default setting
        anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]
        # 所有的真实标注
        true_boxes = np.array(true_boxes, dtype='float32')
        # 模型的输入尺寸
        input_shape = np.array(input_shape, dtype='int32')
        # 获取目标输入x,y
        boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
        # 获取目标的w,h
        boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
        # 这时候坐标变为0~1
        true_boxes[..., 0:2] = boxes_xy / input_shape[::-1]
        true_boxes[..., 2:4] = boxes_wh / input_shape[::-1]
        # 获取所有的这是标记的数量
        m = true_boxes.shape[0]
        # 得到模型每一个输出y1-y3的下采样之后的特征图片大小
        grid_shapes = [input_shape // {0: 32, 1: 16, 2: 8}[l] for l in range(num_layers)]
        # 获取三组y_true
        # [(16, 13, 13, 3, 6), (16, 26, 26, 3, 6), (16, 52, 52, 3, 6)],
        y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5 + num_classes),
                           dtype='float32') for l in range(num_layers)]
    
        # Expand dim to apply broadcasting.
        # 9个anchors的值
        anchors = np.expand_dims(anchors, 0)
        anchor_maxes = anchors / 2.
        anchor_mins = -anchor_maxes
        valid_mask = boxes_wh[..., 0] > 0
    
        for b in range(m):
            # Discard zero rows.
            # 只第b个boxes选取wh大于0的anchors
            wh = boxes_wh[b, valid_mask[b]]
            if len(wh) == 0: continue
            # Expand dim to apply broadcasting.
            wh = np.expand_dims(wh, -2)
            box_maxes = wh / 2.
            box_mins = -box_maxes
            # 求目标的范围,和anchors的iou值,查看目标的标记值与9个anchors哪个iou最大
            intersect_mins = np.maximum(box_mins, anchor_mins)
            intersect_maxes = np.minimum(box_maxes, anchor_maxes)
            intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
            intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
            box_area = wh[..., 0] * wh[..., 1]
            anchor_area = anchors[..., 0] * anchors[..., 1]
            iou = intersect_area / (box_area + anchor_area - intersect_area)
            # 从每个iou值中,找到iou值最大的目标
            # Find best anchor for each true box
            # 得到9个anchos的一个值
            best_anchor = np.argmax(iou, axis=-1)
    
            for t, n in enumerate(best_anchor):
                # t,n为9个最大之中的某一个
                for l in range(num_layers):
                    # 三层y1
                    if n in anchor_mask[l]:
                        # i就是在对应的特征图上的实际尺寸的宽,就是高
                        i = np.floor(true_boxes[b, t, 0] * grid_shapes[l][1]).astype('int32')
                        j = np.floor(true_boxes[b, t, 1] * grid_shapes[l][0]).astype('int32')
                        k = anchor_mask[l].index(n)
                        c = true_boxes[b, t, 4].astype('int32')
                        # y_true的第0和1位是中心点xy,范围是(0~13/26/52),
                        # 第2和3位是宽高wh,范围是0~1,
                        # 第4位是置信度1或0,
                        # 第5~n位是类别为1其余为0。
                        y_true[l][b, j, i, k, 0:4] = true_boxes[b, t, 0:4]
                        y_true[l][b, j, i, k, 4] = 1
                        y_true[l][b, j, i, k, 5 + c] = 1
        # [(16, 13, 13, 3, 6), (16, 26, 26, 3, 6), (16, 52, 52, 3, 6)]
        return y_true
    
    
    def box_iou(b1, b2):
        '''Return iou tensor
    
        Parameters
        ----------
        b1: tensor, shape=(i1,...,iN, 4), xywh
        b2: tensor, shape=(j, 4), xywh
    
        Returns
        -------
        iou: tensor, shape=(i1,...,iN, j)
    
        '''
    
        # Expand dim to apply broadcasting.
        b1 = K.expand_dims(b1, -2)
        b1_xy = b1[..., :2]
        b1_wh = b1[..., 2:4]
        b1_wh_half = b1_wh / 2.
        b1_mins = b1_xy - b1_wh_half
        b1_maxes = b1_xy + b1_wh_half
    
        # Expand dim to apply broadcasting.
        b2 = K.expand_dims(b2, 0)
        b2_xy = b2[..., :2]
        b2_wh = b2[..., 2:4]
        b2_wh_half = b2_wh / 2.
        b2_mins = b2_xy - b2_wh_half
        b2_maxes = b2_xy + b2_wh_half
    
        intersect_mins = K.maximum(b1_mins, b2_mins)
        intersect_maxes = K.minimum(b1_maxes, b2_maxes)
        intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        b1_area = b1_wh[..., 0] * b1_wh[..., 1]
        b2_area = b2_wh[..., 0] * b2_wh[..., 1]
        iou = intersect_area / (b1_area + b2_area - intersect_area)
    
        return iou
    
    
    def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=True):
        '''Return yolo_loss tensor
    
        Parameters
        ----------
        yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body
        y_true: list of array, the output of preprocess_true_boxes
        anchors: array, shape=(N, 2), wh
        num_classes: integer
        ignore_thresh: float, the iou threshold whether to ignore object confidence loss
    
        Returns
        -------
        loss: tensor, shape=(1,)
    
        '''
        num_layers = len(anchors) // 3  # default setting
        # 获取模型的输出,获取输入真值
        yolo_outputs = args[:num_layers]
        y_true = args[num_layers:]
        anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]
        # input_shape是输出的尺寸*32, 就是原始的输入尺寸,[1:3]是尺寸的位置
        input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
        # 每个网格的尺寸,
        grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)]
        loss = 0
        m = K.shape(yolo_outputs[0])[0]  # batch size, tensor
        mf = K.cast(m, K.dtype(yolo_outputs[0]))
        # y_true的第0和1位是中心点xy,范围是(0~13/26/52),
        # 第2和3位是宽高wh,范围是0~1,
        # 第4位是置信度1或0,
        # 第5~n位是类别为1其余为0。
        for l in range(num_layers):
            object_mask = y_true[l][..., 4:5] # 1
            true_class_probs = y_true[l][..., 5:]
            # 这是yolo_outputs的后处理程序
            grid, raw_pred, pred_xy, pred_wh = \
                yolo_head(yolo_outputs[l], anchors[anchor_mask[l]],
                          num_classes, input_shape, calc_loss=True)
            pred_box = K.concatenate([pred_xy, pred_wh])
    
            # Darknet raw box to calculate loss.
            # bugfix grid_shapes重复相乘,另一个在preprocess_true_boxes中
            # 把真实的坐标转换到预测坐标系
            raw_true_xy = y_true[l][..., :2] * grid_shapes[l][::-1] - grid
            raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])  # 1
            raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh))  # avoid log(0)=-inf
            box_loss_scale = 2 - y_true[l][..., 2:3] * y_true[l][..., 3:4]  # 2-w*h
    
            # Find ignore mask, iterate over each of batch.
            ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
            object_mask_bool = K.cast(object_mask, 'bool')
    
            def loop_body(b, ignore_mask):
                true_box = tf.boolean_mask(y_true[l][b, ..., 0:4], object_mask_bool[b, ..., 0])
                iou = box_iou(pred_box[b], true_box)
                best_iou = K.max(iou, axis=-1)
                ignore_mask = ignore_mask.write(b, K.cast(best_iou < ignore_thresh, K.dtype(true_box)))
                return b + 1, ignore_mask
    
            _, ignore_mask = K.control_flow_ops.while_loop(lambda b, *args: b < m, loop_body, [0, ignore_mask])
            ignore_mask = ignore_mask.stack()
            ignore_mask = K.expand_dims(ignore_mask, -1)
    
            # K.binary_crossentropy is helpful to avoid exp overflow.
            xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[..., 0:2],
                                                                           from_logits=True)
            wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
            confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + \
                              (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5],
                                                                        from_logits=True) * ignore_mask
            class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)
    
            xy_loss = K.sum(xy_loss) / mf
            wh_loss = K.sum(wh_loss) / mf
            confidence_loss = K.sum(confidence_loss) / mf
            class_loss = K.sum(class_loss) / mf
            loss += xy_loss + wh_loss + confidence_loss + class_loss
            if print_loss:
                loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)],
                                message='loss: ')
        return loss
    
    

    在model.py中有几个函数需要进行讲解:

    1. DarknetConv2D(*args, **kwargs)普通的卷积网络,带正则化,当步长为2时进行下采
    2. DarknetConv2D_BN_Leaky(*args, **kwargs)没有偏置,带正则项
    3. resblock_body(x, num_filters, num_blocks)使用残差块, 1 + 2 * num_filters 为总的卷积层数
    4. darknet_body(x) darknet的主体网络52层卷积网络
    5. make_last_layers(x, num_filters, out_filters) yolo最后检测头部,无降采采样操作
    6. yolo_body(inputs, num_anchors, num_classes)yolov3的三个检测输出部分
    7. yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False) feats的后处理函数,feats就是yolo_outputs,把输出转换到inputs的坐标系
    8. yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape) 把预测的图片转换到原始图片的大小
    9. yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape) 对yolo的输出进行后处理,输出合适原始图片的box和scores
    10. yolo_eval(yolo_outputs, anchors, num_classes, image_shape,
      max_boxes=20, score_threshold=.6, iou_threshold=.5) 使用yolo模型进行图片的检测,进行坐标转化,和nms处理
    11. preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes) 对图片中标记的数据与anchors进行转换,转换到预测的坐标系
    12. yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=True) 构建loss

    utils.py

    #!/usr/bin/env python
    # -- coding: utf-8 --
    
    """Miscellaneous utility functions."""
    
    from functools import reduce
    
    from PIL import Image
    import numpy as np
    from matplotlib.colors import rgb_to_hsv, hsv_to_rgb
    
    
    def compose(*funcs):
        """
        https://blog.csdn.net/jmu201521121021/article/details/86626976
        # 参数为多个函数名,按照reduce的功能执行,把前一个函数的结果作为下一个函数的输入,知道最后执行完毕
        Compose arbitrarily many functions, evaluated left to right.
    
        Reference: https://mathieularose.com/function-composition-in-python/
        """
        # return lambda x: reduce(lambda v, f: f(v), funcs, x)
        if funcs:
            return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
        else:
            raise ValueError('Composition of empty sequence not supported.')
    
    
    def letterbox_image(image, size):
        '''resize image with unchanged aspect ratio using padding'''
        iw, ih = image.size  # 原始图像是1200x1800
        w, h = size  # 转换为416x416
        scale = min(float(w) / float(iw), float(h) / float(ih))  # 转换比例
        nw = int(iw * scale)  # 新图像的宽,保证新图像是等比下降的
        nh = int(ih * scale)  # 新图像的高
    
        image = image.resize((nw, nh), Image.BICUBIC)  # 缩小图像
        new_image = Image.new('RGB', size, (128, 128, 128))  # 生成灰色图像
        new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))  # 将图像填充为中间图像,两侧为灰色的样式
        return new_image
    
    
    def rand(a=0., b=1.):
        return np.random.rand() * (b - a) + a
    
    
    def get_random_data(
            annotation_line, input_shape, random=True,
            max_boxes=20, jitter=.3, hue=.1, sat=1.5,
            val=1.5, proc_img=True):
        '''random preprocessing for real-time data augmentation:
        获取真实的数据根据输入的尺寸对原始数据进行缩放处理得到input_shape大小的数据图片,
        随机进行图片的翻转,标记数据数据也根据比例改变
            annotation_line: 单条图片的信息的列表
            input_shape:输入的尺寸
        '''
        # 处理图片
        line = annotation_line.split()
        # 读取图片图片
        image = Image.open(line[0])
        # 原始图片的比例
        iw, ih = image.size
        # 获取模型的输入图片的大小
        h, w = input_shape
        box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])
    
        if not random:
            # resize image
            # 获取原始图片和模型输入图片的比例
            scale = min(float(w) / float(iw), float(h) / float(ih))
            nw = int(iw * scale)
            nh = int(ih * scale)
            dx = (w - nw) // 2
            dy = (h - nh) // 2
            image_data = 0
            if proc_img:
                image = image.resize((nw, nh), Image.BICUBIC)
                # 首先创建一张灰色的图片
                new_image = Image.new('RGB', (w, h), (128, 128, 128))
                # 把原始的图片粘贴到灰色图片上
                new_image.paste(image, (dx, dy))
                image_data = np.array(new_image) / 255.
    
            # correct boxes
            box_data = np.zeros((max_boxes, 5))
            # 对所有的图片中的目标进行缩放
            if len(box) > 0:
                np.random.shuffle(box)
                if len(box) > max_boxes: box = box[:max_boxes]  # 最多只取20个
                box[:, [0, 2]] = box[:, [0, 2]] * scale + dx
                box[:, [1, 3]] = box[:, [1, 3]] * scale + dy
                box_data[:len(box)] = box
    
            return image_data, box_data
    
        # resize image
        # 随机的图片比例变换
        new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter)
        scale = rand(.25, 2.)
        # 计算新的图片尺寸
        if new_ar < 1:
            nh = int(scale * h)
            nw = int(nh * new_ar)
        else:
            nw = int(scale * w)
            nh = int(nw / new_ar)
        # 改变图片尺寸
        image = image.resize((nw, nh), Image.BICUBIC)
    
        # place image
        # 随机把图片摆放在灰度图片上
        dx = int(rand(0, w - nw))
        dy = int(rand(0, h - nh))
        new_image = Image.new('RGB', (w, h), (128, 128, 128))
        new_image.paste(image, (dx, dy))
        image = new_image
    
        # flip image or not
        # 是否反转图片
        flip = rand() < .5
        if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)
    
        # distort image
        # 在HSV坐标域中,改变图片的颜色范围,hue值相加,sat和vat相乘,
        # 先由RGB转为HSV,再由HSV转为RGB,添加若干错误判断,避免范围过大。
        hue = rand(-hue, hue)
        sat = rand(1, sat) if rand() < .5 else 1 / rand(1, sat)
        val = rand(1, val) if rand() < .5 else 1 / rand(1, val)
        x = rgb_to_hsv(np.array(image) / 255.)
        x[..., 0] += hue
        x[..., 0][x[..., 0] > 1] -= 1
        x[..., 0][x[..., 0] < 0] += 1
        x[..., 1] *= sat
        x[..., 2] *= val
        x[x > 1] = 1
        x[x < 0] = 0
        image_data = hsv_to_rgb(x)  # numpy array, 0 to 1
    
        # correct boxes
        # 将所有的图片变换,增加至检测框中,并且包含若干异常处理,
        # 避免变换之后的值过大或过小,去除异常的box。
        box_data = np.zeros((max_boxes, 5))
        if len(box) > 0:
            np.random.shuffle(box)
            # 变换所有目标的尺寸
            box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx
            box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy
            # 如果已经翻转了需要进行坐标变换,并且把坐标限制在图片内
            if flip: box[:, [0, 2]] = w - box[:, [2, 0]]
            box[:, 0:2][box[:, 0:2] < 0] = 0
            box[:, 2][box[:, 2] > w] = w
            box[:, 3][box[:, 3] > h] = h
            box_w = box[:, 2] - box[:, 0]
            box_h = box[:, 3] - box[:, 1]
            box = box[np.logical_and(box_w > 1, box_h > 1)]  # discard invalid box
            # 最大的目标数不能超过超参数
            if len(box) > max_boxes: box = box[:max_boxes]
            box_data[:len(box)] = box
    
        return image_data, box_data
    
    

    这里的get_random_data就是对原始数据进行处理的函数,获取真实的数据根据输入的尺寸对原始数据进行缩放处理得到input_shape大小的数据图片,
    随机进行图片的翻转,标记数据数据也根据比例改变

    yolo.py

    #!/usr/bin/env python
    # -- coding: utf-8 --
    """
    Copyright (c) 2018. All rights reserved.
    Created by C. L. Wang on 2018/7/4
    """
    
    """
    Run a YOLO_v3 style detection model on test images.
    """
    
    import colorsys
    import os
    from timeit import default_timer as timer
    
    import numpy as np
    from PIL import Image, ImageFont, ImageDraw
    from keras import backend as K
    from keras.layers import Input
    from yolo3.model import yolo_eval, yolo_body
    from yolo3.utils import letterbox_image
    
    
    class YOLO(object):
        def __init__(self):
            self.anchors_path = 'configs/yolo_anchors.txt'  # Anchors
            self.model_path = 'model_data/yolo_weights.h5'  # 模型文件
            self.classes_path = 'configs/coco_classes_ch.txt'  # 类别文件
    
            # self.model_path = 'model_data/ep074-loss26.535-val_loss27.370.h5'  # 模型文件
            # self.classes_path = 'configs/wider_classes.txt'  # 类别文件
    
            self.score = 0.60
            self.iou = 0.45
            # self.iou = 0.01
            self.class_names = self._get_class()  # 获取类别
            self.anchors = self._get_anchors()  # 获取anchor
            self.sess = K.get_session()
            self.model_image_size = (416, 416)  # fixed size or (None, None), hw
    
            self.colors = self.__get_colors(self.class_names)
            self.boxes, self.scores, self.classes = self.generate()
    
        def _get_class(self):
            # 获取检测类别
            classes_path = os.path.expanduser(self.classes_path)
            with open(classes_path, encoding='utf8') as f:
                class_names = f.readlines()
            class_names = [c.strip() for c in class_names]
            return class_names
    
        def _get_anchors(self):
            # 获取检测的anchors
            anchors_path = os.path.expanduser(self.anchors_path)
            with open(anchors_path) as f:
                anchors = f.readline()
            anchors = [float(x) for x in anchors.split(',')]
            return np.array(anchors).reshape(-1, 2)
    
        @staticmethod
        def __get_colors(names):
            # 不同的框,不同的颜色
            hsv_tuples = [(float(x) / len(names), 1., 1.)
                          for x in range(len(names))]  # 不同颜色
            colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
            colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))  # RGB
            np.random.seed(10101)
            np.random.shuffle(colors)
            np.random.seed(None)
    
            return colors
    
        def generate(self):#
            # 构建检测模型,下载模型数据
            model_path = os.path.expanduser(self.model_path)  # 转换~
            assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'
    
            num_anchors = len(self.anchors)  # anchors的数量
            num_classes = len(self.class_names)  # 类别数
    
            self.yolo_model = yolo_body(Input(shape=(416, 416, 3)), 3, num_classes)
            self.yolo_model.load_weights(model_path)  # 加载模型参数
    
            print('{} model, {} anchors, and {} classes loaded.'.format(model_path, num_anchors, num_classes))
    
            # 根据检测参数,过滤框
            self.input_image_shape = K.placeholder(shape=(2,))
            boxes, scores, classes = yolo_eval(
                self.yolo_model.output, self.anchors, len(self.class_names),
                self.input_image_shape, score_threshold=self.score, iou_threshold=self.iou)
            return boxes, scores, classes
    
        def detect_image(self, image):
            # 检测图片,返回图片
            start = timer()  # 起始时间
    
            if self.model_image_size != (None, None):  # 416x416, 416=32*13,必须为32的倍数,最小尺度是除以32
                assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
                assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
                boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))  # 填充图像
            else:
                new_image_size = (image.width - (image.width % 32), image.height - (image.height % 32))
                boxed_image = letterbox_image(image, new_image_size)
            image_data = np.array(boxed_image, dtype='float32')
            print('detector size {}'.format(image_data.shape))
            image_data /= 255.  # 转换0~1
            image_data = np.expand_dims(image_data, 0)  # 添加批次维度,将图片增加1维
    
            # 参数盒子、得分、类别;输入图像0~1,4维;原始图像的尺寸
            # 通过调用yolo_eval: self.boxes, self.scores, self.classes
            out_boxes, out_scores, out_classes = self.sess.run(
                [self.boxes, self.scores, self.classes],
                feed_dict={
                    self.yolo_model.input: image_data,
                    self.input_image_shape: [image.size[1], image.size[0]],
                    K.learning_phase(): 0
                })
    
            print('Found {} boxes for {}'.format(len(out_boxes), 'img'))  # 检测出的框
    
            font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                                      size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))  # 字体
            thickness = (image.size[0] + image.size[1]) // 512  # 厚度
            for i, c in reversed(list(enumerate(out_classes))):
                predicted_class = self.class_names[c]  # 类别
                box = out_boxes[i]  # 框
                score = out_scores[i]  # 执行度
    
                label = '{} {:.2f}'.format(predicted_class, score)  # 标签
                draw = ImageDraw.Draw(image)  # 画图
                label_size = draw.textsize(label, font)  # 标签文字
    
                top, left, bottom, right = box
                top = max(0, np.floor(top + 0.5).astype('int32'))
                left = max(0, np.floor(left + 0.5).astype('int32'))
                bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
                right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
                print(label, (left, top), (right, bottom))  # 边框
    
                if top - label_size[1] >= 0:  # 标签文字
                    text_origin = np.array([left, top - label_size[1]])
                else:
                    text_origin = np.array([left, top + 1])
    
                # My kingdom for a good redistributable image drawing library.
                for i in range(thickness):  # 画框
                    draw.rectangle(
                        [left + i, top + i, right - i, bottom - i],
                        outline=self.colors[c])
                draw.rectangle(  # 文字背景
                    [tuple(text_origin), tuple(text_origin + label_size)],
                    fill=self.colors[c])
                draw.text(text_origin, label, fill=(0, 0, 0), font=font)  # 文案
                del draw
    
            end = timer()
            print(end - start)  # 检测执行时间
            return image
    
        def detect_objects_of_image(self, img_path):
            # 检测图片返回,box的各个值
            image = Image.open(img_path)
            assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
            assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
            boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))  # 填充图像
    
            image_data = np.array(boxed_image, dtype='float32')
            image_data /= 255.  # 转换0~1
            image_data = np.expand_dims(image_data, 0)  # 添加批次维度,将图片增加1维
            # print('detector size {}'.format(image_data.shape))
    
            out_boxes, out_scores, out_classes = self.sess.run(
                [self.boxes, self.scores, self.classes],
                feed_dict={
                    self.yolo_model.input: image_data,
                    self.input_image_shape: [image.size[1], image.size[0]],
                    K.learning_phase(): 0
                })
    
            # print('out_boxes: {}'.format(out_boxes))
            # print('out_scores: {}'.format(out_scores))
            # print('out_classes: {}'.format(out_classes))
    
            img_size = image.size[0] * image.size[1]
            # 过滤较小的图片
            objects_line = self._filter_boxes(out_boxes, out_scores, out_classes, img_size)
            return objects_line
    
        def _filter_boxes(self, boxes, scores, classes, img_size):
            # 过滤较小的图片
            res_items = []
            for box, score, clazz in zip(boxes, scores, classes):
                top, left, bottom, right = box
                box_size = (bottom - top) * (right - left)
                rate = float(box_size) / float(img_size)
                clz_name = self.class_names[clazz]
                if rate > 0.05:
                    res_items.append('{}-{:0.2f}'.format(clz_name, rate))
            res_line = ','.join(res_items)
            return res_line
    
        def close_session(self):
            self.sess.close()
    
    
    def detect_img_for_test():
        yolo = YOLO()
        img_path = './dataset/vDaPl5QHdoqb2wOaVql4FoJWNGglYk.jpg'
        image = Image.open(img_path)
        r_image = yolo.detect_image(image)
        yolo.close_session()
        r_image.save('xxx.png')
    
    
    def test_of_detect_objects_of_image():
        yolo = YOLO()
        img_path = './dataset/vDaPl5QHdoqb2wOaVql4FoJWNGglYk.jpg'
        objects_line = yolo.detect_objects_of_image(img_path)
        print(objects_line)
    
    
    if __name__ == '__main__':
        # detect_img_for_test()
        test_of_detect_objects_of_image()
    
    

    这是yolo的检测文件,yolo检测模型对外提供两个接口:

    1. detect_image(self, image) 检测图片,返回图片
    2. detect_objects_of_image(self, img_path) 检测图片返回,box的各个值

    这里的yolo模型使用的就是model.py文件yolo_eval()函数,这是在模型训练完成之后,最重要的函数,这个文件只导入了yolo_eval和 yolo_body两个函数

    from yolo3.model import yolo_eval, yolo_body
    

    参考:

    https://github.com/SpikeKing/keras-yolo3-detection

    Keras中自定义目标函数(损失函数)的简单方法

    Keras中自定义复杂的loss函数

    相关文章

      网友评论

          本文标题:keras-Yolov3 源码调试

          本文链接:https://www.haomeiwen.com/subject/jyzbsqtx.html