YOLOv3

Author: 永远学习中 | Published: 2018-12-24 17:50

    GitHub code: https://github.com/BobLiu20/YOLOv3_PyTorch
    Reference post: "yolo系列之yolo v3" (YOLO series: YOLO v3)
    These notes are based mainly on that post and on the code above.


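    YOLO v1 and YOLO v2 are not covered here; see the links in the reference post above.
    The core ideas of the network are unchanged: a backbone, batch normalization (BN), leaky ReLU, logistic regression, multi-scale prediction, end-to-end training, and "divide and conquer".
    The Faster R-CNN family first extracts candidate regions with a region-proposal step and then refines those candidates.
    The YOLO family instead assumes that every grid cell may contain an object and runs detection on every cell, much like the back-end network of Faster R-CNN.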

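    YOLOv3 network structure

    Figure 1. YOLOv3 architecture diagram
    The figure above is the work of the author of the reference post; please get that author's permission before reusing it.
    Model visualization tool recommended in that post: Netron
    Network modules
    DBL: as shown in the lower-left corner of Figure 1, DBL is conv + BN + leaky ReLU.
    resn: n is a number (res1, ..., res8) giving how many res_units the res_block contains. The detailed structure of this module is described later. The five res blocks contain 1, 2, 8, 8 and 4 units respectively.
    concat: tensor concatenation along the channel dimension. Only the channel count changes; batch size, width and height stay the same.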

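    1. Darknet-53

    Figure 2. Darknet-53 structure diagram

    YOLOv3 has no pooling layers and no fully connected layers; tensor sizes change only through the stride of the convolutions. A stride of 2 halves both the width and the height. Darknet-53 downsamples five times, so the final feature map is 1/32 of the input size.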
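The 1/32 figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each downsampling step is a 3×3 convolution with stride 2 and padding 1 and that the input is 416×416; the function name `darknet53_feature_sizes` is made up for illustration and is not part of the repository.

```python
def darknet53_feature_sizes(input_size, num_downsamples=5):
    """Return the side length of the feature map after each stride-2 conv."""
    sizes = []
    size = input_size
    for _ in range(num_downsamples):
        # Standard conv output formula with kernel 3, stride 2, padding 1 (assumed).
        size = (size + 2 * 1 - 3) // 2 + 1
        sizes.append(size)
    return sizes


sizes = darknet53_feature_sizes(416)
print(sizes)              # [208, 104, 52, 26, 13]
print(416 // sizes[-1])   # 32
```

The last three sizes (52, 26, 13) are the resolutions at which the three prediction heads operate for a 416×416 input.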

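    2. Output

    YOLOv3 outputs feature maps at three different scales, shown as y1, y2 and y3 in Figure 3. This is what the paper calls "predictions across scales".

    Figure 3. Network outputs
    y1, y2 and y3 each have 255 channels. COCO has 80 classes, and every box carries one probability per class. Each grid cell predicts three boxes, and each box needs five basic parameters (x, y, w, h, confidence), so there are 3 × (5 + 80) = 255 values per cell. YOLOv3 predicts positions relatively: the center of a bounding box is predicted as an offset from the top-left corner of the grid cell that contains it. confidence is the probability that the box contains an object (foreground).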
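The channel count and the three output shapes follow directly from the numbers above. A small sketch, assuming a 416×416 input and strides of 32, 16 and 8 for y1, y2 and y3; both helper names are hypothetical.

```python
def yolo_output_channels(num_classes, boxes_per_cell=3):
    # Each box stores x, y, w, h, confidence plus one score per class.
    return boxes_per_cell * (5 + num_classes)


def yolo_output_shapes(input_size, num_classes, strides=(32, 16, 8)):
    # One (channels, height, width) tuple per prediction scale.
    channels = yolo_output_channels(num_classes)
    return [(channels, input_size // s, input_size // s) for s in strides]


print(yolo_output_channels(80))     # 255
print(yolo_output_shapes(416, 80))  # [(255, 13, 13), (255, 26, 26), (255, 52, 52)]
```

With a custom dataset of 20 classes the same formula gives 3 × (5 + 20) = 75 channels, which is why the final conv layer has to be resized when the class count changes.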
    Position prediction formula (figure)
    Box illustration (figure)
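The formula image is not available here. The decoding code in the loss section (`x + grid_x`, `exp(w) * anchor_w`) matches the usual YOLOv3 box parameterization, which can be written as below. The symbols are the paper's notation rather than names from this post: (t_x, t_y, t_w, t_h) are the raw network outputs, (c_x, c_y) is the top-left corner of the grid cell, and (p_w, p_h) is the anchor size.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \, e^{t_w} \\
b_h &= p_h \, e^{t_h}
\end{aligned}
```

Because the sigmoid keeps \sigma(t_x) and \sigma(t_y) between 0 and 1, the predicted center always stays inside the cell that is responsible for it.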

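    3. Loss function

    YOLOv3 uses logistic regression (a sigmoid per output) to score what is inside each box. Because the class scores are independent rather than mutually exclusive, one object can carry several labels at once: it can be a "person" and also a "woman".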
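The difference between independent logistic scores and a softmax is easiest to see on a tiny example. The labels and logits below are invented for illustration only; they do not come from the model or the repository.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    # Subtract the max for numerical stability.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["person", "woman", "car"]   # hypothetical label set
logits = [3.0, 2.5, -4.0]             # hypothetical raw class scores for one box

independent = [sigmoid(z) for z in logits]   # each class scored on its own
exclusive = softmax(logits)                  # classes compete for probability mass

multi_label = [name for name, p in zip(labels, independent) if p > 0.5]
print(multi_label)   # ['person', 'woman']
```

With the softmax, the scores must sum to 1, so "person" and "woman" cannot both exceed 0.5; with per-class sigmoids they can.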

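    Now for the enjoyable part: the code.

    1. Network model

    Training script: training/training.py

    # Build the network from the config
    net = ModelMain(config, is_training=is_training)
    # Put the network into training mode
    net.train(is_training)
    

    Model definition: nets/model_main.py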

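    • 1. DBL module
        # Keeps the spatial size of the feature map and changes only the channel count C.
        # The stride is always 1, so this is simply a conv layer that adds non-linearity.
        # _in:  number of input channels
        # _out: number of output channels
        # ks:   kernel size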
        def _make_cbl(self, _in, _out, ks):
            ''' cbl = conv + batch_norm + leaky_relu
            '''
            pad = (ks - 1) // 2 if ks else 0
            return nn.Sequential(OrderedDict([
                ("conv", nn.Conv2d(_in, _out, kernel_size=ks, stride=1, padding=pad, bias=False)),
                ("bn", nn.BatchNorm2d(_out)),
                ("relu", nn.LeakyReLU(0.1)),
            ]))
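The padding rule `pad = (ks - 1) // 2` is what keeps the feature-map size unchanged at stride 1. A quick check using the standard convolution output-size formula; `conv_out_size` is a helper written for this note, not a function from the repository.

```python
def conv_out_size(size, ks, stride=1):
    """Output side length of a conv layer that pads the way _make_cbl does."""
    pad = (ks - 1) // 2 if ks else 0
    return (size + 2 * pad - ks) // stride + 1


for ks in (1, 3):
    print(ks, conv_out_size(13, ks))   # 1 13, then 3 13
```

The rule preserves the size only for odd kernel sizes, which is all the model uses (1×1 and 3×3); an even kernel such as `ks=2` would shrink a 13×13 map to 12×12.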
    
    • 2. DBL×5 module
        # Takes a tensor with in_filters channels, alternates between the two widths in
        # filters_list, and outputs a tensor with out_filter channels
        def _make_embedding(self, filters_list, in_filters, out_filter):
            m = nn.ModuleList([
                self._make_cbl(in_filters, filters_list[0], 1),
                self._make_cbl(filters_list[0], filters_list[1], 3),
                self._make_cbl(filters_list[1], filters_list[0], 1),
                self._make_cbl(filters_list[0], filters_list[1], 3),
                self._make_cbl(filters_list[1], filters_list[0], 1),
                self._make_cbl(filters_list[0], filters_list[1], 3)])
            m.add_module("conv_out", nn.Conv2d(filters_list[1], out_filter, kernel_size=1,
                                               stride=1, padding=0, bias=True))
            return m
    
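    • 3. Forward pass
    
        def forward(self, x):
            #  Run a stack of DBLs. out_branch is the intermediate output that is later
            #  concatenated with the feature map from the shallower backbone stage.
            #  The first five DBLs add non-linearity; the final DBL + conv produce the prediction.
            def _branch(_embedding, _in):
                for i, e in enumerate(_embedding):
                    _in = e(_in)
                    if i == 4:
                        out_branch = _in
                return _in, out_branch
            #  backbone: the three outputs of the dashed box in Figure 1.
            #  x0 is the rightmost output, x1 the right-hand downward output, x2 the left-hand downward output.
            x2, x1, x0 = self.backbone(x)
            #  y1: DBL×5, then DBL + conv
            out0, out0_branch = _branch(self.embedding0, x0)
            #  y2: DBL + upsample, concatenate with x1, then DBL×5 and DBL + conv
            x1_in = self.embedding1_cbl(out0_branch)
            x1_in = self.embedding1_upsample(x1_in)
            x1_in = torch.cat([x1_in, x1], 1)
            out1, out1_branch = _branch(self.embedding1, x1_in)
            #  y3: DBL + upsample, concatenate with x2, then DBL×5 and DBL + conv
            x2_in = self.embedding2_cbl(out1_branch)
            x2_in = self.embedding2_upsample(x2_in)
            x2_in = torch.cat([x2_in, x2], 1)
            out2, out2_branch = _branch(self.embedding2, x2_in)
            return out0, out1, out2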
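The inner `_branch` helper returns two things: the final output of the whole stack and the output tapped after the fifth layer (index 4). The sketch below mimics that control flow with plain Python callables standing in for the layers; `branch` and the stand-in layers are illustrative only.

```python
def branch(layers, x):
    """Run x through every layer and also keep the output of the fifth one."""
    tap = None
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == 4:
            tap = x
    return x, tap


# Seven stand-in "layers" (six DBLs plus the output conv); each records its index.
layers = [lambda trace, i=i: trace + [i] for i in range(7)]

out, tap = branch(layers, [])
print(out)   # [0, 1, 2, 3, 4, 5, 6]
print(tap)   # [0, 1, 2, 3, 4]
```

The tapped tensor is the one that gets a 1×1 DBL, is upsampled and is then concatenated with the next backbone feature map, while the full output becomes the prediction for the current scale.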
    
    • 4. Network initialization
        def __init__(self, config, is_training=True):
            super(ModelMain, self).__init__()
            self.config = config
            self.training = is_training
            self.model_params = config["model_params"]
            #  Darknet-53 backbone
            _backbone_fn = backbone_fn[self.model_params["backbone_name"]]
            self.backbone = _backbone_fn(self.model_params["backbone_pretrained"])
            _out_filters = self.backbone.layers_out_filters
            #  y1 head: DBL×5, then DBL + conv. Produces y1 and the branch feature.
            final_out_filter0 = len(config["yolo"]["anchors"][0]) * (5 + config["yolo"]["classes"])
            self.embedding0 = self._make_embedding([512, 1024], _out_filters[-1], final_out_filter0)
            #  y2 head: DBL, upsample, DBL×5, then DBL + conv. Produces y2 and the branch feature.
            final_out_filter1 = len(config["yolo"]["anchors"][1]) * (5 + config["yolo"]["classes"])
            self.embedding1_cbl = self._make_cbl(512, 256, 1)
            self.embedding1_upsample = nn.Upsample(scale_factor=2, mode='nearest')
            self.embedding1 = self._make_embedding([256, 512], _out_filters[-2] + 256, final_out_filter1)
            #  y3 head: DBL, upsample, DBL×5, then DBL + conv. Produces y3 and the branch feature.
            final_out_filter2 = len(config["yolo"]["anchors"][2]) * (5 + config["yolo"]["classes"])
            self.embedding2_cbl = self._make_cbl(256, 128, 1)
            self.embedding2_upsample = nn.Upsample(scale_factor=2, mode='nearest')
            self.embedding2 = self._make_embedding([128, 256], _out_filters[-3] + 128, final_out_filter2)
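The channel arithmetic in `__init__` (for example `_out_filters[-2] + 256`) is easier to follow when written out. The sketch below only tracks channel counts; it assumes the backbone reports `[64, 128, 256, 512, 1024]` as its stage widths and that there are 3 anchors per scale and 80 classes. Those values, and the helper `embedding_layers`, are assumptions for illustration.

```python
backbone_out_filters = [64, 128, 256, 512, 1024]   # assumed Darknet-53 stage widths
final_out = 3 * (5 + 80)                           # assumed 3 anchors, 80 classes


def embedding_layers(filters_list, in_filters, out_filter):
    """List (in_channels, out_channels, kernel) for each layer of one head."""
    f0, f1 = filters_list
    layers = [(in_filters, f0, 1)]
    for _ in range(2):
        layers += [(f0, f1, 3), (f1, f0, 1)]
    layers += [(f0, f1, 3), (f1, out_filter, 1)]
    return layers


e0 = embedding_layers([512, 1024], backbone_out_filters[-1], final_out)
branch0 = e0[4][1]                                 # channels tapped after layer index 4

e1_in = backbone_out_filters[-2] + 256             # upsampled branch (256) + backbone x1
e1 = embedding_layers([256, 512], e1_in, final_out)

e2_in = backbone_out_filters[-3] + 128             # upsampled branch (128) + backbone x2
e2 = embedding_layers([128, 256], e2_in, final_out)

print(branch0, e1_in, e2_in)   # 512 768 384
```

This also shows why `embedding1_cbl` takes 512 input channels: it consumes the tapped output of the first head, which has the narrower width of `[512, 1024]`.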
    
    • 5. Building the ground-truth targets
        def get_target(self, target, anchors, in_w, in_h, ignore_threshold):
            bs = target.size(0)
            # Every tensor below has shape (bs, num_anchors, in_h, in_w); tcls has an extra num_classes axis
            mask = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
            noobj_mask = torch.ones(bs, self.num_anchors, in_h, in_w, requires_grad=False)
            tx = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
            ty = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
            tw = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
            th = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
            tconf = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
            tcls = torch.zeros(bs, self.num_anchors, in_h, in_w, self.num_classes, requires_grad=False)
            for b in range(bs):
                for t in range(target.shape[1]):
                    if target[b, t].sum() == 0:
                        continue
                    # Scale the normalized box to grid units to find which cell it falls in
                    gx = target[b, t, 1] * in_w
                    gy = target[b, t, 2] * in_h
                    gw = target[b, t, 3] * in_w
                    gh = target[b, t, 4] * in_h
                    # Indices of the grid cell that contains the box center
                    gi = int(gx)
                    gj = int(gy)
                    # Ground-truth box as (0, 0, w, h): only width and height matter here
                    gt_box = torch.FloatTensor(np.array([0, 0, gw, gh])).unsqueeze(0)
                    # Anchors in the same (0, 0, w, h) form
                    anchor_shapes = torch.FloatTensor(np.concatenate((np.zeros((self.num_anchors, 2)),
                                                                      np.array(anchors)), 1))
                    # IoU between the ground-truth box and every anchor
                    anch_ious = bbox_iou(gt_box, anchor_shapes)
                    # Anchors whose IoU exceeds the threshold are set to 0 in noobj_mask,
                    # so they are ignored by the no-object loss
                    noobj_mask[b, anch_ious > ignore_threshold, gj, gi] = 0
                    # Pick the anchor that matches best
                    best_n = np.argmax(anch_ious)
                    # Mark the best-matching anchor as responsible for this object
                    mask[b, best_n, gj, gi] = 1
                    # x, y offsets of the box center inside its cell
                    tx[b, best_n, gj, gi] = gx - gi
                    ty[b, best_n, gj, gi] = gy - gj
                    # Width and height targets: log ratio to the anchor size
                    tw[b, best_n, gj, gi] = math.log(gw/anchors[best_n][0] + 1e-16)
                    th[b, best_n, gj, gi] = math.log(gh/anchors[best_n][1] + 1e-16)
                    # Objectness target
                    tconf[b, best_n, gj, gi] = 1
                    # One-hot encoding of the class label
                    tcls[b, best_n, gj, gi, int(target[b, t, 0])] = 1
            return mask, noobj_mask, tx, ty, tw, th, tconf, tcls
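To see what `get_target` computes for a single box, here is a stripped-down version in plain Python. It is a sketch under stated assumptions: `wh_iou` is a simplified width/height IoU standing in for the repository's `bbox_iou` (whose exact formula is not shown in this post), and the anchors are the three large YOLOv3 COCO anchors divided by a stride of 32.

```python
import math


def wh_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same top-left corner (simplified stand-in)."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def encode_target(box, anchors, in_w, in_h):
    """box = (cx, cy, w, h) normalized to [0, 1]; anchors are (w, h) in grid units."""
    gx, gy = box[0] * in_w, box[1] * in_h
    gw, gh = box[2] * in_w, box[3] * in_h
    gi, gj = int(gx), int(gy)                       # responsible grid cell
    ious = [wh_iou(gw, gh, aw, ah) for aw, ah in anchors]
    best_n = max(range(len(anchors)), key=lambda n: ious[n])
    tx, ty = gx - gi, gy - gj                       # offsets inside the cell
    tw = math.log(gw / anchors[best_n][0] + 1e-16)  # log ratio to the anchor
    th = math.log(gh / anchors[best_n][1] + 1e-16)
    return (gi, gj), best_n, (tx, ty, tw, th)


stride = 32
anchors = [(116 / stride, 90 / stride), (156 / stride, 198 / stride), (373 / stride, 326 / stride)]
cell, best_n, (tx, ty, tw, th) = encode_target((0.5, 0.5, 0.4, 0.5), anchors, 13, 13)
print(cell, best_n, tx, ty)   # (6, 6) 1 0.5 0.5
```

The box is 5.2 × 6.5 grid units, which is closest to the second anchor (4.875 × 6.1875), so that anchor in cell (6, 6) becomes responsible for it and `tw`, `th` are small positive numbers.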
    
    • 6. Loss computation
        def __init__(self, anchors, num_classes, img_size):
            super(YOLOLoss, self).__init__()
            self.anchors = anchors
            self.num_anchors = len(anchors)
            self.num_classes = num_classes
            self.bbox_attrs = 5 + num_classes
            self.img_size = img_size
    
            self.ignore_threshold = 0.5
            self.lambda_xy = 2.5
            self.lambda_wh = 2.5
            self.lambda_conf = 1.0
            self.lambda_cls = 1.0
    
            self.mse_loss = nn.MSELoss()
            self.bce_loss = nn.BCELoss()
    
        def forward(self, input, targets=None):
            bs = input.size(0)
            in_h = input.size(2)
            in_w = input.size(3)
            stride_h = self.img_size[1] / in_h
            stride_w = self.img_size[0] / in_w
            scaled_anchors = [(a_w / stride_w, a_h / stride_h) for a_w, a_h in self.anchors]
    
            prediction = input.view(bs,  self.num_anchors,
                                    self.bbox_attrs, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous()
    
            # Get outputs
            x = torch.sigmoid(prediction[..., 0])          # Center x
            y = torch.sigmoid(prediction[..., 1])          # Center y
            w = prediction[..., 2]                         # Width
            h = prediction[..., 3]                         # Height
            conf = torch.sigmoid(prediction[..., 4])       # Conf
            pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.
    
            if targets is not None:
                #  build target
                mask, noobj_mask, tx, ty, tw, th, tconf, tcls = self.get_target(targets, scaled_anchors,
                                                                               in_w, in_h,
                                                                               self.ignore_threshold)
                mask, noobj_mask = mask.cuda(), noobj_mask.cuda()
                tx, ty, tw, th = tx.cuda(), ty.cuda(), tw.cuda(), th.cuda()
                tconf, tcls = tconf.cuda(), tcls.cuda()
                #  Individual loss terms: BCE for x, y, confidence and class; MSE for w, h
                loss_x = self.bce_loss(x * mask, tx * mask)
                loss_y = self.bce_loss(y * mask, ty * mask)
                loss_w = self.mse_loss(w * mask, tw * mask)
                loss_h = self.mse_loss(h * mask, th * mask)
                loss_conf = self.bce_loss(conf * mask, mask) + \
                    0.5 * self.bce_loss(conf * noobj_mask, noobj_mask * 0.0)
                loss_cls = self.bce_loss(pred_cls[mask == 1], tcls[mask == 1])
                #  total loss = losses * weight
                loss = loss_x * self.lambda_xy + loss_y * self.lambda_xy + \
                    loss_w * self.lambda_wh + loss_h * self.lambda_wh + \
                    loss_conf * self.lambda_conf + loss_cls * self.lambda_cls
    
                return loss, loss_x.item(), loss_y.item(), loss_w.item(),\
                    loss_h.item(), loss_conf.item(), loss_cls.item()
            else:
                FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
                LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
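                # Calculate offsets for each grid cell.
                # grid_x repeats the column indices over in_h rows and grid_y repeats the row
                # indices over in_w columns, so both are (in_h, in_w) even for non-square inputs.
                grid_x = torch.linspace(0, in_w-1, in_w).repeat(in_h, 1).repeat(
                    bs * self.num_anchors, 1, 1).view(x.shape).type(FloatTensor)
                grid_y = torch.linspace(0, in_h-1, in_h).repeat(in_w, 1).t().repeat(
                    bs * self.num_anchors, 1, 1).view(y.shape).type(FloatTensor)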
                # Calculate anchor w, h
                anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
                anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
                anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(w.shape)
                anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(h.shape)
                # Add offset and scale with anchors
                pred_boxes = FloatTensor(prediction[..., :4].shape)
                pred_boxes[..., 0] = x.data + grid_x
                pred_boxes[..., 1] = y.data + grid_y
                pred_boxes[..., 2] = torch.exp(w.data) * anchor_w
                pred_boxes[..., 3] = torch.exp(h.data) * anchor_h
                # Results
                _scale = torch.Tensor([stride_w, stride_h] * 2).type(FloatTensor)
                output = torch.cat((pred_boxes.view(bs, -1, 4) * _scale,
                                    conf.view(bs, -1, 1), pred_cls.view(bs, -1, self.num_classes)), -1)
                return output.data
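The inference branch above turns raw outputs into pixel-space boxes: sigmoid offsets are added to the cell index, the anchor is scaled by `exp`, and everything is multiplied by the stride. A single-box sketch of the same arithmetic in plain Python; the function `decode_box` and the sample values are illustrative, not taken from the repository.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(t, cell, anchor_px, stride):
    """Decode raw (tx, ty, tw, th) for one cell and one anchor into pixels."""
    tx, ty, tw, th = t
    cx, cy = cell
    # Anchors are first converted to grid units, as scaled_anchors does.
    aw, ah = anchor_px[0] / stride, anchor_px[1] / stride
    bx = (sigmoid(tx) + cx) * stride     # center x in pixels
    by = (sigmoid(ty) + cy) * stride     # center y in pixels
    bw = math.exp(tw) * aw * stride      # width in pixels
    bh = math.exp(th) * ah * stride      # height in pixels
    return bx, by, bw, bh


# All-zero raw outputs: the center sits in the middle of the cell, the size equals the anchor.
print(decode_box((0.0, 0.0, 0.0, 0.0), (6, 6), (156, 198), 32))
# (208.0, 208.0, 156.0, 198.0)
```

Note that the returned boxes are in center/width/height form; a later step such as non-maximum suppression typically converts them to corner coordinates.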
    
