    Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation


    Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, and Stephen Lin


    Microsoft Research Asia
    Peking University


    To appear in ECCV 2020




    A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person. While this center-point regression is simple and efficient, we argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. This point set is arranged to reflect a good initialization for the given task, such as modes in the training data for pose estimation, which lie closer to the ground truth than the central point and provide more informative features for regression. As the utility of a point set depends on how well its scale, aspect ratio and rotation match the target, we adopt the anchor box technique of sampling these transformations to generate additional point-set candidates. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation. Our results show that this general-purpose approach can achieve performance competitive with state-of-the-art methods for each of these tasks.
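To make the idea of sampling transformations concrete, here is a minimal sketch of how point-set candidates might be generated from one template. It is an illustration under stated assumptions, not the authors' implementation: the three-point `template`, the function names, the parameter values, and the convention that the aspect ratio is width/height with area preserved are all hypothetical.

```python
import math
from itertools import product


def transform_point_set(points, scale=1.0, aspect_ratio=1.0, rotation_deg=0.0):
    """Scale, stretch and rotate a point set that is centred on the origin.

    Assumption: aspect_ratio is width/height and the area is kept roughly
    constant, as is common for box anchors.
    """
    sx = scale * math.sqrt(aspect_ratio)
    sy = scale / math.sqrt(aspect_ratio)
    theta = math.radians(rotation_deg)
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        x, y = x * sx, y * sy                          # scale and aspect ratio
        out.append((c * x - s * y, s * x + c * y))     # rotation
    return out


def generate_point_set_anchors(template, center, scales, aspect_ratios, rotations):
    """Return one candidate point set per (scale, aspect ratio, rotation) combination."""
    cx, cy = center
    anchors = []
    for sc, ar, rot in product(scales, aspect_ratios, rotations):
        pts = transform_point_set(template, sc, ar, rot)
        anchors.append([(cx + x, cy + y) for x, y in pts])  # move to the image location
    return anchors


# Hypothetical 3-keypoint template (e.g. a mode of normalised training poses).
template = [(0.0, -1.0), (-0.5, 1.0), (0.5, 1.0)]
anchors = generate_point_set_anchors(
    template,
    center=(100.0, 50.0),
    scales=[16.0, 32.0],
    aspect_ratios=[1.0],
    rotations=[-30.0, 0.0, 30.0],
)
print(len(anchors))  # 6 candidates: 2 scales x 1 aspect ratio x 3 rotations
```

Each candidate is placed at the same image location, so a regressor can start from whichever candidate already lies closest to the target shape.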


    1. A new object representation named Point-Set Anchors, which can be seen as a generalization and extension of classical box anchors. Point-set anchors can further provide informative features and better task-specific initializations for shape regression.
    2. A network based on point-set anchors called PointSetNet, which is a modification of RetinaNet [23] that simply replaces the anchor boxes with the proposed point-set anchors and also attaches a parallel regression branch. Variants of this network are applied to object detection, human pose estimation, and also instance segmentation, for which the problem of defining specific regression targets is addressed.
    3. It is shown that the proposed general-purpose approach achieves performance competitive with state-of-the-art methods on object detection, instance segmentation and pose estimation.
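The first two contributions can be illustrated with a small sketch: a classical box anchor written as an ordered point set, and a shape obtained by adding per-point offsets to it. The notes above only state that point-set anchors generalize box anchors and that a regression branch is attached; the uniform boundary sampling, the plain additive offsets, and the names `box_to_point_set` and `decode_shape` are assumptions made here for illustration.

```python
def box_to_point_set(cx, cy, w, h, points_per_side=2):
    """Express a box anchor as ordered points sampled along its boundary.

    Points start at the top-left corner and run clockwise in image
    coordinates (y pointing down). Uniform sampling is an assumption.
    """
    x0, y0, x1, y1 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    points = []
    for i in range(4):
        (ax, ay), (bx, by) = corners[i], corners[(i + 1) % 4]
        for k in range(points_per_side):
            t = k / points_per_side
            points.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return points


def decode_shape(anchor_points, offsets):
    """Add one predicted (dx, dy) offset to each anchor point."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(anchor_points, offsets)]


anchor = box_to_point_set(cx=10.0, cy=10.0, w=4.0, h=2.0, points_per_side=2)
# Hypothetical offsets, standing in for the output of a regression branch.
offsets = [(0.5, -0.5)] * len(anchor)
shape = decode_shape(anchor, offsets)
```

With `points_per_side=1` the point set reduces to the four box corners, which is one way to see a box anchor as a special case of a point-set anchor.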



    Figure: Three matching strategies between a point-set anchor and the ground-truth mask contour for instance segmentation.
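The caption does not name the three strategies, so the sketch below shows only one plausible option, assumed here purely for illustration: each anchor point is matched to its nearest ground-truth contour point, and the displacement becomes that point's regression target. The function name and the toy coordinates are hypothetical.

```python
import math


def nearest_point_matching(anchor_points, contour_points):
    """Match each anchor point to its nearest contour point.

    Returns one (dx, dy) regression target per anchor point. This is an
    assumed strategy for illustration, not necessarily one from the paper.
    """
    targets = []
    for ax, ay in anchor_points:
        nx, ny = min(
            contour_points,
            key=lambda p: math.hypot(p[0] - ax, p[1] - ay),
        )
        targets.append((nx - ax, ny - ay))
    return targets


anchor_points = [(0.0, 0.0), (10.0, 0.0)]
contour_points = [(1.0, 1.0), (9.0, -1.0), (5.0, 5.0)]
targets = nearest_point_matching(anchor_points, contour_points)
```

Note that under this scheme several anchor points may map to the same contour point, which is one reason alternative matching rules are worth comparing.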









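    1. This paper proposes Point-Set Anchors, a generalization and extension of classical bounding boxes that can serve as a general framework for object detection, instance segmentation, and pose estimation.
    2. This paper proposes PointSetNet, a network built on RetinaNet that replaces the anchor boxes with point-set anchors and attaches a parallel branch for keypoint regression.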


