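I. Main contributions
Using RetinaNet and FCOS as examples, the authors analyze why anchor-based and anchor-free detectors differ in performance:
- 1. The number of anchors per location differs. RetinaNet tiles several anchor boxes at each location, while FCOS has a single anchor point per location.
- 2. Positive and negative samples are defined differently. RetinaNet uses two IoU thresholds; FCOS uses spatial and scale constraints.
- 3. The starting state of regression differs. RetinaNet refines a preset anchor box; FCOS regresses from an anchor point.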
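Main contributions of the ATSS paper:
- 1. It shows that the essential difference between anchor-based and anchor-free detectors is how positive and negative samples are defined.
- 2. It proposes a method that assigns positive and negative samples adaptively during training, based on statistical characteristics of each object.
- 3. It demonstrates that tiling multiple anchors at one location to detect objects is inefficient.
- 4. It achieves state-of-the-art performance on COCO without introducing any extra cost.
This raises a core question in object detection, label assignment: how should positive and negative samples be assigned?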
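II. Analyzing the gap between anchor-free and anchor-based methods
To compare the two fairly, the authors use the same training setup and tricks for both and set the number of anchors per location in RetinaNet to 1. A gap of 0.8% still remains.
The authors then examine the possible causes of this gap:
- 1. The definition of positive and negative samples.
- 2. The starting state of regression, that is, regressing from an anchor box versus regressing from a center point.
The experiments in the paper lead to the conclusion that the definition of positive and negative samples is the core cause.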
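The two sample definitions compared above can be sketched as follows. This is a minimal illustration, not code from either detector: the function names, the `(x1, y1, x2, y2)` box layout, and the default thresholds 0.5 and 0.4 are assumptions made for the example, and `iou` is taken to be an anchor's best IoU with any ground-truth box.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def retinanet_label(iou: float, pos_thr: float = 0.5, neg_thr: float = 0.4) -> int:
    """IoU dual-threshold rule: 1 = positive, 0 = negative, -1 = ignored."""
    if iou >= pos_thr:
        return 1
    if iou < neg_thr:
        return 0
    return -1  # between the two thresholds: ignored during training


def fcos_is_positive(point: Tuple[float, float], gt: Box,
                     scale_range: Tuple[float, float]) -> bool:
    """Spatial + scale rule: the point must lie inside the gt box, and its
    largest regression distance must fall in this level's scale range."""
    x, y = point
    left, top = x - gt[0], y - gt[1]
    right, bottom = gt[2] - x, gt[3] - y
    inside = min(left, top, right, bottom) > 0       # spatial constraint
    low, high = scale_range
    in_scale = low <= max(left, top, right, bottom) <= high  # scale constraint
    return inside and in_scale
```

For example, `retinanet_label(0.45)` falls between the two thresholds and is ignored, while `fcos_is_positive((50, 50), (0, 0, 100, 100), (0, 64))` is accepted because the point is inside the box and its largest regression distance, 50, lies in the range.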
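III. Adaptive Training Sample Selection (ATSS)
During training, positive and negative samples are divided automatically according to statistical characteristics of each object. The procedure:
- 1. For each ground-truth g, on every pyramid level select the k anchors whose centers are closest to the center of g. With L feature pyramid levels, this gives k × L candidate positive samples.
- 2. Compute the IoU between each selected candidate and g, then compute the mean m_g and standard deviation v_g of these IoUs.
- 3. From these two statistics, obtain the threshold t_g = m_g + v_g.
- 4. Mark a candidate as positive if its IoU is at least t_g and its center lies inside the ground-truth.
- 5. If an anchor box is assigned to several ground-truth boxes, keep only the one with the highest IoU.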
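The five steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: `atss_assign`, `iou`, and `center` are hypothetical helper names, boxes are assumed to be `(x1, y1, x2, y2)` tuples, anchors are passed as one list per pyramid level, and the sample standard deviation is used.

```python
import math
from statistics import mean, stdev


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def atss_assign(anchors_per_level, gts, k=9):
    """Return {(level, anchor_idx): gt_idx} for all positive anchors."""
    best = {}  # (level, anchor_idx) -> (iou, gt_idx)
    for g_idx, g in enumerate(gts):
        gcx, gcy = center(g)
        # Step 1: per level, keep the k anchors closest to the gt center.
        candidates = []
        for lvl, anchors in enumerate(anchors_per_level):
            def dist(i):
                cx, cy = center(anchors[i])
                return math.hypot(cx - gcx, cy - gcy)
            order = sorted(range(len(anchors)), key=dist)
            candidates += [(lvl, i) for i in order[:k]]
        # Steps 2-3: IoU statistics give the per-gt threshold.
        ious = [iou(anchors_per_level[l][i], g) for l, i in candidates]
        thr = mean(ious) + (stdev(ious) if len(ious) > 1 else 0.0)
        for (l, i), v in zip(candidates, ious):
            cx, cy = center(anchors_per_level[l][i])
            inside = g[0] < cx < g[2] and g[1] < cy < g[3]
            # Step 4: IoU above threshold and center inside the gt.
            if v >= thr and inside:
                # Step 5: on conflict keep the gt with the highest IoU.
                if (l, i) not in best or v > best[(l, i)][0]:
                    best[(l, i)] = (v, g_idx)
    return {key: g_idx for key, (_, g_idx) in best.items()}
```

Anchors missing from the returned dictionary are negatives. With three side-by-side 10 × 10 anchors on one level and a ground-truth covering mostly the middle one, only the middle anchor passes both the threshold and the center test.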
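- 1. Why are candidates selected by the Euclidean distance between centers?
For both RetinaNet and FCOS, the closer an anchor is to the center of the ground-truth, the better the prediction.
- 2. Why use the sum of mean and standard deviation as the IoU threshold?
It adjusts the positive/negative threshold automatically. A high standard deviation usually means that one FPN level has candidates with much higher IoU than the others, so that level is particularly suitable for this object and the final positives all come from it. A low standard deviation means that several FPN levels are suitable for this object, so positives are drawn from several levels.
- 3. Why must the center of the anchor box lie inside the ground-truth?
Anchor boxes whose centers fall outside the ground-truth are usually poor candidates, because they would use features from outside the ground-truth to predict it.
- 4. Is this label assignment an effective way to split positive and negative samples?
Statistically, even though the candidate IoUs do not follow a standard normal distribution, roughly 16% of the candidates are still labeled positive, so each ground-truth receives about 0.2 × k × L positive samples regardless of its scale, aspect ratio, or location. By contrast, under the assignment strategies of RetinaNet and FCOS, larger objects get more positive samples, which is not a fair treatment.
- 5. How should the hyperparameter be chosen?
The results are not sensitive to the choice of k.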
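The behavior of the mean + std threshold described in questions 2 and 4 can be checked on toy numbers. The IoU values below are invented for illustration (three candidates on each of three pyramid levels), and the helper names are hypothetical; the sample standard deviation is assumed.

```python
import math
from statistics import mean, stdev


def atss_threshold(ious):
    """Threshold t_g = mean + std of the candidate IoUs."""
    return mean(ious) + stdev(ious)


def positive_indices(ious):
    """Indices of candidates whose IoU reaches the threshold."""
    thr = atss_threshold(ious)
    return [i for i, v in enumerate(ious) if v >= thr]


def normal_tail_above_one_sigma():
    """P(X >= mu + sigma) for a normal distribution, via the error function."""
    return 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0)))


# Candidates are ordered level by level: indices 0-2 are level 0, 3-5 level 1, ...
high_variance = [0.8, 0.75, 0.7, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05]
low_variance = [0.5, 0.4, 0.3, 0.5, 0.4, 0.3, 0.5, 0.4, 0.3]

for name, ious in (("high variance", high_variance), ("low variance", low_variance)):
    levels = sorted({i // 3 for i in positive_indices(ious)})
    print(name, "threshold=%.3f" % atss_threshold(ious), "positive levels:", levels)
```

In the high-variance case the threshold is pushed up and all positives come from level 0; in the low-variance case the threshold stays near the mean and each level contributes a positive. `normal_tail_above_one_sigma()` evaluates to about 0.159, which is where the "about 16%" figure comes from under a normality assumption.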
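IV. Results
1. With ATSS, there is no noticeable gap between RetinaNet and FCOS.
2. The results are robust to anchor boxes of different scales and aspect ratios.
3. With ATSS, the number of anchors per location has no noticeable effect on the results.
4. Performance of ATSS
[Figure: ATSS performance comparison]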
V. Source code
The code below follows the implementation in mmdetection:
# Imports as used inside mmdetection 2.x (mmdet/core/bbox/assigners/);
# the module paths differ in other versions.
import torch

from ..builder import BBOX_ASSIGNERS
from ..iou_calculators import build_iou_calculator
from .assign_result import AssignResult
from .base_assigner import BaseAssigner


@BBOX_ASSIGNERS.register_module()
class ATSSAssigner(BaseAssigner):
    """Assign a corresponding gt bbox or background to each bbox.

    Each proposal will be assigned with `0` or a positive integer
    indicating the ground truth index.

    - 0: negative sample, no assigned gt
    - positive integer: positive sample, index (1-based) of assigned gt

    Args:
        topk (int): number of bbox selected in each level
    """

    def __init__(self,
                 topk,
                 iou_calculator=dict(type='BboxOverlaps2D'),
                 ignore_iof_thr=-1):
        self.topk = topk
        self.iou_calculator = build_iou_calculator(iou_calculator)
        self.ignore_iof_thr = ignore_iof_thr

    # https://github.com/sfzhang15/ATSS/blob/master/atss_core/modeling/rpn/atss/loss.py
    def assign(self,
               bboxes,
               num_level_bboxes,
               gt_bboxes,
               gt_bboxes_ignore=None,
               gt_labels=None):
        """Assign gt to bboxes.

        The assignment is done in following steps

        1. compute iou between all bbox (bbox of all pyramid levels) and gt
        2. compute center distance between all bbox and gt
        3. on each pyramid level, for each gt, select k bbox whose center
           are closest to the gt center, so we total select k*l bbox as
           candidates for each gt
        4. get corresponding iou for these candidates, and compute the
           mean and std, set mean + std as the iou threshold
        5. select these candidates whose iou are greater than or equal to
           the threshold as positive
        6. limit the positive sample's center in gt

        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            num_level_bboxes (List): num of bboxes in each level
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        INF = 100000000
        bboxes = bboxes[:, :4]
        num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0)

        # compute iou between all bbox and gt
        overlaps = self.iou_calculator(bboxes, gt_bboxes)

        # assign 0 by default
        assigned_gt_inds = overlaps.new_full((num_bboxes, ),
                                             0,
                                             dtype=torch.long)

        if num_gt == 0 or num_bboxes == 0:
            # No ground truth or boxes, return empty assignment
            max_overlaps = overlaps.new_zeros((num_bboxes, ))
            if num_gt == 0:
                # No truth, assign everything to background
                assigned_gt_inds[:] = 0
            if gt_labels is None:
                assigned_labels = None
            else:
                assigned_labels = overlaps.new_full((num_bboxes, ),
                                                    -1,
                                                    dtype=torch.long)
            return AssignResult(
                num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)

        # compute center distance between all bbox and gt
        gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
        gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
        gt_points = torch.stack((gt_cx, gt_cy), dim=1)

        bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
        bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
        bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1)

        distances = (bboxes_points[:, None, :] -
                     gt_points[None, :, :]).pow(2).sum(-1).sqrt()

        if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None
                and gt_bboxes_ignore.numel() > 0 and bboxes.numel() > 0):
            ignore_overlaps = self.iou_calculator(
                bboxes, gt_bboxes_ignore, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            ignore_idxs = ignore_max_overlaps > self.ignore_iof_thr
            distances[ignore_idxs, :] = INF
            assigned_gt_inds[ignore_idxs] = -1

        # Selecting candidates based on the center distance
        candidate_idxs = []
        start_idx = 0
        for level, bboxes_per_level in enumerate(num_level_bboxes):
            # on each pyramid level, for each gt,
            # select k bbox whose center are closest to the gt center
            end_idx = start_idx + bboxes_per_level
            distances_per_level = distances[start_idx:end_idx, :]
            selectable_k = min(self.topk, bboxes_per_level)
            _, topk_idxs_per_level = distances_per_level.topk(
                selectable_k, dim=0, largest=False)
            candidate_idxs.append(topk_idxs_per_level + start_idx)
            start_idx = end_idx
        candidate_idxs = torch.cat(candidate_idxs, dim=0)

        # get corresponding iou for these candidates, and compute the
        # mean and std, set mean + std as the iou threshold
        candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)]
        overlaps_mean_per_gt = candidate_overlaps.mean(0)
        overlaps_std_per_gt = candidate_overlaps.std(0)
        overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt

        is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]

        # limit the positive sample's center in gt
        for gt_idx in range(num_gt):
            candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
        ep_bboxes_cx = bboxes_cx.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        ep_bboxes_cy = bboxes_cy.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        candidate_idxs = candidate_idxs.view(-1)

        # calculate the left, top, right, bottom distance between positive
        # bbox center and gt side
        l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0]
        t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1]
        r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt)
        b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt)
        is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01
        is_pos = is_pos & is_in_gts

        # if an anchor box is assigned to multiple gts,
        # the one with the highest IoU will be selected.
        overlaps_inf = torch.full_like(overlaps,
                                       -INF).t().contiguous().view(-1)
        index = candidate_idxs.view(-1)[is_pos.view(-1)]
        overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index]
        overlaps_inf = overlaps_inf.view(num_gt, -1).t()

        max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1)
        assigned_gt_inds[
            max_overlaps != -INF] = argmax_overlaps[max_overlaps != -INF] + 1

        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
            pos_inds = torch.nonzero(
                assigned_gt_inds > 0, as_tuple=False).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None
        return AssignResult(
            num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)
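The least obvious part of the listing is the flat indexing used to resolve conflicts: each candidate index is offset by `gt_idx * num_bboxes` so that it addresses a flattened (num_gt, num_bboxes) table. The pure-Python sketch below mirrors that idea on nested lists; `flat_index` and `resolve_conflicts` are hypothetical names introduced only for this illustration, and a non-empty `overlaps` table is assumed.

```python
NEG_INF = float("-inf")


def flat_index(gt_idx, bbox_idx, num_bboxes):
    """Position of (gt_idx, bbox_idx) in a row-major (num_gt, num_bboxes) table."""
    return gt_idx * num_bboxes + bbox_idx


def resolve_conflicts(overlaps, positive_pairs):
    """overlaps[b][g] is the IoU of bbox b with gt g.

    positive_pairs holds the (bbox_idx, gt_idx) pairs that passed the
    threshold and center tests. Returns the 1-based gt index per bbox,
    with 0 meaning background.
    """
    num_bboxes, num_gt = len(overlaps), len(overlaps[0])
    # Start from -inf everywhere, then copy in IoUs of positive pairs only.
    flat = [NEG_INF] * (num_gt * num_bboxes)
    for b, g in positive_pairs:
        flat[flat_index(g, b, num_bboxes)] = overlaps[b][g]
    assigned = [0] * num_bboxes
    for b in range(num_bboxes):
        per_gt = [flat[flat_index(g, b, num_bboxes)] for g in range(num_gt)]
        best = max(per_gt)
        if best != NEG_INF:  # at least one gt claimed this bbox
            assigned[b] = per_gt.index(best) + 1
    return assigned


overlaps = [[0.6, 0.7], [0.2, 0.1], [0.9, 0.3]]
pairs = [(0, 0), (0, 1), (2, 0)]
print(resolve_conflicts(overlaps, pairs))
```

Here bbox 0 is claimed by both ground-truths and goes to the second one (IoU 0.7), bbox 1 is claimed by none and stays background, and bbox 2 goes to the first ground-truth.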