Adaptive Training Sample Selecti

Author: _从前从前_ | Published: 2021-01-13 14:36

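    I. Main contributions

    Using RetinaNet and FCOS as examples, the authors analyze why anchor-based and anchor-free detectors differ in performance:

    • 1. Number of anchors per location. RetinaNet tiles several anchor boxes at each location, while FCOS uses a single anchor point per location.
    • 2. Definition of positive and negative samples. RetinaNet uses two IoU thresholds; FCOS uses spatial and scale constraints.
    • 3. Starting state of regression. RetinaNet refines a preset anchor box; FCOS regresses from an anchor point.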
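
    The two sample-definition rules in item 2 can be sketched as follows. This is a minimal illustration, not code from either detector: the function names are hypothetical, the IoU thresholds (0.5 / 0.4) and the scale range are assumed example values, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def label_by_iou(anchor, gt, pos_thr=0.5, neg_thr=0.4):
    """IoU rule with two thresholds (assumed values) and an ignored band."""
    v = iou(anchor, gt)
    if v >= pos_thr:
        return "positive"
    if v < neg_thr:
        return "negative"
    return "ignored"


def label_by_point(point, gt, scale_range):
    """Point rule: the point must lie inside the gt box (spatial constraint)
    and its largest distance to a gt side must fall in the level's
    scale range (scale constraint)."""
    x, y = point
    sides = (x - gt[0], y - gt[1], gt[2] - x, gt[3] - y)
    if min(sides) <= 0:
        return "negative"
    low, high = scale_range
    return "positive" if low <= max(sides) <= high else "negative"


gt_box = (0, 0, 100, 100)
print(label_by_iou((0, 0, 100, 45), gt_box))        # IoU 0.45 falls between the thresholds
print(label_by_point((50, 50), gt_box, (0, 64)))    # inside the box and inside the scale range
print(label_by_point((50, 50), gt_box, (64, 128)))  # inside the box, outside the scale range
```

    The same ground-truth box can therefore produce different positive sets depending on which rule is applied, which is the difference the paper isolates.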

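    Main contributions of the ATSS paper:

    • 1. It shows that the essential difference between anchor-based and anchor-free detectors is how positive and negative samples are defined.
    • 2. It proposes a method that assigns positive and negative samples adaptively during training, based on statistical characteristics of each object.
    • 3. It shows that tiling multiple anchors per location to detect objects is inefficient.
    • 4. It reaches state-of-the-art results on COCO without introducing any extra overhead.

    This raises a core question in object detection, namely label assignment: how should positive and negative samples be assigned?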

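    II. Analyzing the gap between anchor-free and anchor-based methods

    To compare the two approaches fairly, the authors use the same training settings and tricks for both and set the number of RetinaNet anchors per location to 1. A gap of 0.8% (in AP) still remains.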

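    The authors then examine what causes the remaining gap:
    • 1. The definition of positive and negative samples.
    • 2. The starting state of regression, i.e. whether the box is regressed from an anchor box or from a center point.

    Their experiments lead to the conclusion that the definition of positive and negative samples is the core cause.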

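    III. Adaptive Training Sample Selection (ATSS)

    During training, positive and negative samples are divided automatically according to statistical characteristics of each object. The procedure:

    • 1. For each ground-truth box $g$, select on every pyramid level the $k$ anchors whose centers are closest (L2 distance) to the center of $g$. With $\mathcal{L}$ feature pyramid levels this gives $k \times \mathcal{L}$ candidate positive samples.

    • 2. Compute the IoU between each candidate and $g$, then the mean $m_g$ and the standard deviation $v_g$ of these IoUs.

    • 3. Use these two statistics to set the threshold $t_g = m_g + v_g$.

    • 4. Mark a candidate as positive if its IoU is greater than or equal to $t_g$ and its center lies inside the ground-truth box.

    • 5. If an anchor box is assigned to multiple ground-truth boxes, keep only the one with the highest IoU.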
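
    The five steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the reference implementation: `atss_assign`, `center` and `iou` are hypothetical helper names, boxes are (x1, y1, x2, y2) tuples, and the population standard deviation (`statistics.pstdev`) is used, whereas a tensor implementation that calls `std()` with PyTorch defaults computes the sample standard deviation.

```python
import math
from statistics import mean, pstdev


def center(box):
    """Center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def atss_assign(levels, gts, k=9):
    """levels: one list of anchor boxes per pyramid level; gts: list of gt boxes.
    Returns {(level, anchor_index): gt_index} for the positive anchors."""
    best = {}  # (level, anchor_index) -> (iou, gt_index)
    for g_idx, g in enumerate(gts):
        g_center = center(g)
        # Step 1: on every level, take the k anchors closest to the gt center.
        candidates = []
        for lvl, anchors in enumerate(levels):
            order = sorted(
                range(len(anchors)),
                key=lambda i: math.dist(center(anchors[i]), g_center))
            candidates += [(lvl, i) for i in order[:k]]
        # Steps 2-3: threshold = mean + standard deviation of candidate IoUs.
        ious = [iou(levels[lvl][i], g) for lvl, i in candidates]
        thr = mean(ious) + pstdev(ious)
        for key, v in zip(candidates, ious):
            cx, cy = center(levels[key[0]][key[1]])
            inside = g[0] < cx < g[2] and g[1] < cy < g[3]
            # Step 4: IoU at or above the threshold and center inside the gt.
            if v >= thr and inside:
                # Step 5: an anchor claimed by several gts keeps the highest IoU.
                if key not in best or v > best[key][0]:
                    best[key] = (v, g_idx)
    return {key: g_idx for key, (v, g_idx) in best.items()}
```

    Anchors that do not appear in the returned dictionary are treated as negatives. The sketch loops over ground-truth boxes for readability; a batched tensor version avoids the Python loops.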

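    • 1. Why are candidates selected by the Euclidean distance between centers?
      For both RetinaNet and FCOS, anchors closer to the center of the ground-truth box give better predictions.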

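    • 2. Why use the mean plus the standard deviation as the IoU threshold?
      The threshold for separating positives from negatives then adapts to each object. A high standard deviation usually means that one FPN level has much higher IoUs than the others, i.e. that level suits the object particularly well, so the final positives all come from that level. A low standard deviation means that several FPN levels suit the object, so positives are selected from multiple levels.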

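    • 3. Why must the center of the anchor box lie inside the ground-truth box?
      Anchor boxes whose centers fall outside the ground-truth box tend to be poor candidates, because they would predict the object from features outside the ground-truth box.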

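    • 4. Is this label assignment an effective way to divide positives and negatives?
      From a statistical point of view, although the IoUs do not follow a standard normal distribution, about 16% of the candidates still end up as positives, so every ground-truth box receives roughly $0.2 \times k \times \mathcal{L}$ positive samples regardless of its scale, aspect ratio, or location. Under the assignment strategies of RetinaNet and FCOS, by contrast, larger objects receive more positive samples, which is not a fair treatment.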

    • 5. How should the hyperparameter $k$ be chosen?
      The method is not sensitive to the choice of $k$.

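
    The answers to questions 2 and 4 can be checked numerically. The sketch below is illustrative only: the IoU values are made up, `positive_levels` and `tail_beyond_one_sigma` are hypothetical names, and k = 9 with 5 pyramid levels are assumed example settings. It shows how the threshold mean + std reacts to high and low spread, and where the figure of about 16% comes from for a normal distribution.

```python
import math
from statistics import mean, pstdev


def positive_levels(ious_per_level):
    """Return (threshold, indices of levels holding at least one positive)."""
    flat = [v for level in ious_per_level for v in level]
    thr = mean(flat) + pstdev(flat)
    levels = sorted({i for i, level in enumerate(ious_per_level)
                     for v in level if v >= thr})
    return thr, levels


def tail_beyond_one_sigma():
    """P(X > mean + std) for a normally distributed X."""
    return 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0)))


# Made-up candidate IoUs: three candidates on each of three levels.
one_level_fits = [[0.10, 0.10, 0.10], [0.80, 0.75, 0.70], [0.10, 0.10, 0.10]]
several_levels_fit = [[0.50, 0.40, 0.30], [0.50, 0.40, 0.30], [0.50, 0.40, 0.30]]

print(positive_levels(one_level_fits))      # high spread: one level supplies the positives
print(positive_levels(several_levels_fit))  # low spread: every level supplies a positive
print(tail_beyond_one_sigma())              # fraction above mean + std for a normal distribution
print(0.2 * 9 * 5)                          # rough positives per gt for the assumed k and level count
```

    With the first set of IoUs the threshold rises well above the weak levels, so only the middle level contributes positives; with the second set the threshold stays low enough for every level to contribute.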

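    IV. Results

    1. With ATSS, there is no significant gap between RetinaNet and FCOS.
    2. Results are robust across anchor boxes of different scales and aspect ratios.
    3. With ATSS, the number of anchors per location has no clear effect on the results.
    4. ATSS performance (results figure)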

    V. Source code

    The source code below follows the mmdetection implementation:

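    # imports as used in mmdet/core/bbox/assigners/atss_assigner.py (mmdetection 2.x)
    import torch

    from ..builder import BBOX_ASSIGNERS
    from ..iou_calculators import build_iou_calculator
    from .assign_result import AssignResult
    from .base_assigner import BaseAssigner


    @BBOX_ASSIGNERS.register_module()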
    class ATSSAssigner(BaseAssigner):
        """Assign a corresponding gt bbox or background to each bbox.
    
        Each proposal will be assigned with `0` or a positive integer
        indicating the ground truth index.
    
        - 0: negative sample, no assigned gt
        - positive integer: positive sample, index (1-based) of assigned gt
    
        Args:
            topk (float): number of bbox selected in each level
        """
    
        def __init__(self,
                     topk,
                     iou_calculator=dict(type='BboxOverlaps2D'),
                     ignore_iof_thr=-1):
            self.topk = topk
            self.iou_calculator = build_iou_calculator(iou_calculator)
            self.ignore_iof_thr = ignore_iof_thr
    
        # https://github.com/sfzhang15/ATSS/blob/master/atss_core/modeling/rpn/atss/loss.py
    
        def assign(self,
                   bboxes,
                   num_level_bboxes,
                   gt_bboxes,
                   gt_bboxes_ignore=None,
                   gt_labels=None):
            """Assign gt to bboxes.
    
            The assignment is done in the following steps:
    
            1. compute iou between all bbox (bbox of all pyramid levels) and gt
            2. compute center distance between all bbox and gt
            3. on each pyramid level, for each gt, select k bbox whose centers
               are closest to the gt center, so we select k*l bbox in total as
               candidates for each gt
            4. get the corresponding iou for these candidates, compute the
               mean and std, and set mean + std as the iou threshold
            5. select the candidates whose iou is greater than or equal to
               the threshold as positive
            6. limit the positive sample's center in gt
    
    
            Args:
                bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
                num_level_bboxes (List): num of bboxes in each level
                gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
                gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                    labelled as `ignored`, e.g., crowd boxes in COCO.
                gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).
    
            Returns:
                :obj:`AssignResult`: The assign result.
            """
            INF = 100000000
            bboxes = bboxes[:, :4]
            num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0)
    
            # compute iou between all bbox and gt
            overlaps = self.iou_calculator(bboxes, gt_bboxes)
    
            # assign 0 by default
            assigned_gt_inds = overlaps.new_full((num_bboxes, ),
                                                 0,
                                                 dtype=torch.long)
    
            if num_gt == 0 or num_bboxes == 0:
                # No ground truth or boxes, return empty assignment
                max_overlaps = overlaps.new_zeros((num_bboxes, ))
                if num_gt == 0:
                    # No truth, assign everything to background
                    assigned_gt_inds[:] = 0
                if gt_labels is None:
                    assigned_labels = None
                else:
                    assigned_labels = overlaps.new_full((num_bboxes, ),
                                                        -1,
                                                        dtype=torch.long)
                return AssignResult(
                    num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)
    
            # compute center distance between all bbox and gt
            gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
            gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
            gt_points = torch.stack((gt_cx, gt_cy), dim=1)
    
            bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
            bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
            bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1)
    
            distances = (bboxes_points[:, None, :] -
                         gt_points[None, :, :]).pow(2).sum(-1).sqrt()
    
            if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None
                    and gt_bboxes_ignore.numel() > 0 and bboxes.numel() > 0):
                ignore_overlaps = self.iou_calculator(
                    bboxes, gt_bboxes_ignore, mode='iof')
                ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
                ignore_idxs = ignore_max_overlaps > self.ignore_iof_thr
                distances[ignore_idxs, :] = INF
                assigned_gt_inds[ignore_idxs] = -1
    
            # Selecting candidates based on the center distance
            candidate_idxs = []
            start_idx = 0
            for level, bboxes_per_level in enumerate(num_level_bboxes):
                # on each pyramid level, for each gt,
                # select k bbox whose center are closest to the gt center
                end_idx = start_idx + bboxes_per_level
                distances_per_level = distances[start_idx:end_idx, :]
                selectable_k = min(self.topk, bboxes_per_level)
                _, topk_idxs_per_level = distances_per_level.topk(
                    selectable_k, dim=0, largest=False)
                candidate_idxs.append(topk_idxs_per_level + start_idx)
                start_idx = end_idx
            candidate_idxs = torch.cat(candidate_idxs, dim=0)
    
            # get the corresponding iou for these candidates, compute the
            # mean and std, and set mean + std as the iou threshold
            candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)]
            overlaps_mean_per_gt = candidate_overlaps.mean(0)
            overlaps_std_per_gt = candidate_overlaps.std(0)
            overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt
    
            is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]
    
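            # limit the positive sample's center in gt
            # offset each gt's candidate indices by gt_idx * num_bboxes so they
            # index into the flattened (num_gt * num_bboxes) arrays built below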
            for gt_idx in range(num_gt):
                candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
            ep_bboxes_cx = bboxes_cx.view(1, -1).expand(
                num_gt, num_bboxes).contiguous().view(-1)
            ep_bboxes_cy = bboxes_cy.view(1, -1).expand(
                num_gt, num_bboxes).contiguous().view(-1)
            candidate_idxs = candidate_idxs.view(-1)
    
            # calculate the left, top, right, bottom distance between positive
            # bbox center and gt side
            l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0]
            t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1]
            r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt)
            b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt)
            is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01
            is_pos = is_pos & is_in_gts
    
            # if an anchor box is assigned to multiple gts,
            # the one with the highest IoU will be selected.
            overlaps_inf = torch.full_like(overlaps,
                                           -INF).t().contiguous().view(-1)
            index = candidate_idxs.view(-1)[is_pos.view(-1)]
            overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index]
            overlaps_inf = overlaps_inf.view(num_gt, -1).t()
    
            max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1)
            assigned_gt_inds[
                max_overlaps != -INF] = argmax_overlaps[max_overlaps != -INF] + 1
    
            if gt_labels is not None:
                assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
                pos_inds = torch.nonzero(
                    assigned_gt_inds > 0, as_tuple=False).squeeze()
                if pos_inds.numel() > 0:
                    assigned_labels[pos_inds] = gt_labels[
                        assigned_gt_inds[pos_inds] - 1]
            else:
                assigned_labels = None
            return AssignResult(
                num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)
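
    The flattened indexing used in `assign` (adding `gt_idx * num_bboxes` to the candidate indices, then indexing `ep_bboxes_cx` and the transposed, flattened `overlaps`) can be easier to follow on plain lists. The sketch below is an illustration with made-up numbers; `flat_index` is a hypothetical helper and the list comprehensions only mimic what the tensor operations do.

```python
def flat_index(bbox_idx, gt_idx, num_bboxes):
    """Position of (bbox_idx, gt_idx) in a row-major (num_gt, num_bboxes) array."""
    return gt_idx * num_bboxes + bbox_idx


num_bboxes, num_gt = 4, 2

# overlaps[b][g]: IoU between bbox b and gt g, shape (num_bboxes, num_gt).
overlaps = [[0.1, 0.5], [0.2, 0.6], [0.3, 0.7], [0.4, 0.8]]

# Mimics overlaps.t().contiguous().view(-1): all bboxes for gt 0, then gt 1.
flat_overlaps = [overlaps[b][g] for g in range(num_gt) for b in range(num_bboxes)]

# Mimics ep_bboxes_cx: the bbox center x-coordinates repeated once per gt.
bboxes_cx = [10.0, 20.0, 30.0, 40.0]
ep_bboxes_cx = bboxes_cx * num_gt

idx = flat_index(bbox_idx=2, gt_idx=1, num_bboxes=num_bboxes)
print(idx, flat_overlaps[idx], ep_bboxes_cx[idx])
```

    One flat index thus retrieves both the IoU of a (bbox, gt) pair and the center of that bbox, which is why the code can filter all candidates for all ground-truth boxes without a nested loop.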
    

          Article link: https://www.haomeiwen.com/subject/qdcuaktx.html