RepPoints

Author: 小松qxs | Published 2019-10-29 21:47

RepPoints: a point set outlining the object, generated by deformable convolution

title	RepPoints: Point Set Representation for Object Detection
url	https://arxiv.org/pdf/1904.11490.pdf
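Motivation	Most detectors rely on rectangular boxes to represent objects at every stage. Boxes are easy to annotate and convenient for feature extraction (RoI), but they provide only coarse localization (no geometric information such as shape or pose), and the features extracted from them are coarse as well, since the grid points do not necessarily lie on the foreground.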
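Content	RepPoints (anchor-free): a finer-grained localization that also helps classification. The points distribute themselves adaptively over the semantically important local regions of an object and indicate its spatial extent, giving a more detailed geometric description; the same points are used to extract image features useful for recognition. No anchors need to be sampled to obtain bounding boxes. Compared with existing work: most non-box methods are bottom-up, first detecting keypoints (corners or extreme points), then grouping them by clustering, and they require ground-truth mask supervision. RepPoints is top-down, can be trained end to end, and needs no extra supervision.
Bounding boxes for the object detection problem:
1. Rectangular boxes are easy to annotate and rarely ambiguous, and most existing feature extraction methods are built on them.
2. RepPoints is irregular in form but still convenient for feature extraction. Combined with deformable convolution, the sample points aggregate information from the input, and a rectangular pseudo box is easy to derive from them.
3. Earlier deformable convolution was hard to interpret, because its offsets were learned freely.
Bounding boxes in modern object detectors:
1. Anchors are replaced by center points. The bounding box proposals and final localization targets are replaced by the RepPoints proposals and final targets.
2. Anchor-based methods are hard to tune and suffer from positive/negative sample imbalance; representing the initial object by a center point is more convenient.
Deformation modeling in object recognition: deformable convolution and deformable RoI pooling were both used only to improve feature extraction. RepPoints also uses the adaptive sample points for finer localization.
The RepPoints Representation:
Bounding Box Representation: drawbacks: box regression performs well when the required refinement is small and poorly when it is large; in addition, the loss weights of ∆x, ∆y and of ∆w, ∆h have to be tuned against each other to reach the best performance.
RepPoints: a rectangular box cannot capture shape, pose, or the semantically important local regions that would allow better localization and feature extraction.
RepPoints refinement: refinement involves no scale information, because all offsets in the RepPoints refinement process are at the same scale.
Converting RepPoints to bounding box: a pseudo box is generated from the RepPoints so that box annotations can be used in training and boxes can be evaluated at test time. There are three conversion functions:
1. Min-max function: take the minimum and maximum over all points to get the box enclosing all of them.
2. Partial min-max function: apply the same operation to a subset of the points.
3. Moment-based function: compute the box center and scale from the mean and standard deviation of all points; the scale is multiplied by globally shared learnable multipliers λx and λy.
Any of the three can be plugged into an object detection system, with little difference in results.
Learning RepPoints: the learning of RepPoints is driven jointly by the object localization loss and the object recognition loss; the points automatically learn extreme points and semantic keypoints (Fig. 4 of the paper). To compute the localization loss, the RepPoints are first converted to a pseudo box, and the difference between the pseudo box and the ground-truth box is then measured (smooth L1 on the top-left and bottom-right points).
RPDet: an Anchor Free Detector: RPDet is a two-stage recognition pipeline built on deformable convolution. Deformable convolution pairs well with RepPoints: the convolution is computed on an irregularly distributed set of sample points, and the recognition feedback can guide the training of the point locations.
Center point based initial object representation:
1. Initialization is based on center points, with every location of the feature map acting as a center point. The search space of anchor-based methods is 4-d (it additionally introduces many ratios and scales) and is redundant; the 2-d search space of the anchor-free approach is already sufficient.
2. Target ambiguity:
(1) The multiple scales of FPN separate objects that share a location but differ in scale.
(2) The high-resolution FPN levels have large feature maps, which reduces the probability that two objects fall on the same location.
(3) With FPN, only 1.1% of targets remain ambiguous; these are assigned at random.
Utilization of RepPoints: the first set of RepPoints is obtained by regressing offsets from the center point. Its learning is driven by two signals: (1) the top-left and bottom-right loss between the pseudo box and the ground-truth box; (2) the recognition loss of the subsequent stage. The second set of RepPoints represents the final localization and is refined from the first set. It is driven only by the points distance loss, so the second set learns a more precise localization.
Relation to deformable RoI pooling: RepPoints and deformable RoI pooling play different roles in detection. RepPoints is a geometric representation of the object and yields more accurate semantic localization, whereas deformable RoI pooling learns appearance features of the object and cannot provide sample points that represent precise localization.
Backbone and head architectures: FPN levels 3 to 7. The head consists of two non-shared subnets:
1. Localization: three 256-d 3×3 convolutions, followed by two consecutive small networks that compute the offsets of the two sets of RepPoints.
2. Classification: three 256-d 3×3 convolutions, followed by a 256-d 3×3 deformable convolution whose input offsets are shared with the localization branch.
In both subnets, each of the first three 256-d 3×3 conv layers is followed by a normalization layer. Even with two localization stages, the detector is more efficient than RetinaNet (210.9 vs. 234.5 GFLOPS with ResNet-50). The additional localization stage adds almost no overhead because of layer sharing, and the anchor-free design lightens the classification layer, so the computation is slightly lower.
Localization/class target assignment:
1. Localization: in the first stage, a location is positive when the size of the ground-truth box matches the corresponding pyramid level and the center of the ground-truth box falls on that location. In the second stage, points whose pseudo box has IoU greater than 0.5 with the ground truth are positive.
2. Classification: computed only on the first stage; IoU greater than 0.5 is positive, less than 0.4 is negative.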
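The pseudo-box conversion functions, the corner-based smooth L1 loss, and the 0.5 / 0.4 IoU thresholds described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. All function names are hypothetical, and several details are assumptions not stated in the notes: the partial min-max variant simply takes the first k points, the moment-based variant uses the standard deviation times λ as the half-extent of the box, the smooth L1 transition point is beta = 1.0, and samples falling between the two IoU thresholds are marked as ignored (-1).

```python
import math

def minmax_box(points):
    """Min-max function: tightest axis-aligned box (x1, y1, x2, y2) around all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def partial_minmax_box(points, k=4):
    """Partial min-max: same operation on a subset (assumption: the first k points)."""
    return minmax_box(points[:k])

def moment_box(points, lambda_x=1.0, lambda_y=1.0):
    """Moment-based: mean gives the center, std times lambda the half-extent (assumption)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sx = math.sqrt(sum((x - cx) ** 2 for x, _ in points) / n)
    sy = math.sqrt(sum((y - cy) ** 2 for _, y in points) / n)
    hw, hh = lambda_x * sx, lambda_y * sy  # lambdas would be learned in training
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def smooth_l1(d, beta=1.0):
    """Smooth L1 on one coordinate difference (beta = 1.0 is an assumption)."""
    d = abs(d)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def corner_loss(pseudo_box, gt_box):
    """Sum of smooth L1 over the top-left and bottom-right corner coordinates."""
    return sum(smooth_l1(p - g) for p, g in zip(pseudo_box, gt_box))

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def classification_label(pseudo_box, gt_box, pos_thr=0.5, neg_thr=0.4):
    """1 = positive, 0 = negative, -1 = ignored (the in-between case is an assumption)."""
    v = iou(pseudo_box, gt_box)
    if v > pos_thr:
        return 1
    if v < neg_thr:
        return 0
    return -1

# Nine illustrative points (made-up coordinates, not from the paper).
points = [(2, 3), (4, 1), (6, 3), (4, 5), (4, 3), (3, 2), (5, 2), (3, 4), (5, 4)]
print(minmax_box(points))  # (2, 1, 6, 5)
print(corner_loss(minmax_box(points), (2.5, 1, 6, 7)))
print(classification_label(minmax_box(points), (2, 1, 6, 5)))
```

In a real detector these operations would act on batched tensors and stay differentiable, so that the localization loss can push the points themselves; the scalar version here only shows the arithmetic.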
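Experiments	Experimental Settings: ResNet-50, SGD (2 images per GPU), NMS (σ = 0.5)
Ablation Study:
RepPoints vs. bounding box:
Supervision source for RepPoints learning: the object recognition loss drives the RepPoints to semantically meaningful positions on the object, which gives fine-grained localization and improves the accuracy of feature extraction in the recognition stage.
Anchor-free vs. anchor-based:
Converting RepPoints to pseudo box:
RepPoints act complementary to deformable RoI pooling: adding deformable RoI pooling improves performance for both the bounding box and the RepPoints representation, which shows that deformable RoI pooling and RepPoints are complementary.
State-of-the-art Comparison: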
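Thoughts	Nine RepPoints are used to recover one box, although in theory two points are enough. The reason for choosing nine: a box computed from nine points is more accurate. The points receive no direct supervision, yet the nine points often land on extreme points or on locations that help the semantic representation.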