py-faster-rcnn: studying the proposal layer

Author: 寒夏凉秋 | Published: 2019-01-21 20:32

    Purpose:

    The proposal layer combines the regressed offsets [dx(A), dy(A), dw(A), dh(A)] with the foreground anchors to compute refined proposals, which are then passed to the subsequent RoI Pooling layer.

    Layer definition:

    layer {
      name: 'proposal'
      type: 'Python'
      bottom: 'rpn_cls_prob_reshape'
      bottom: 'rpn_bbox_pred'
      bottom: 'im_info'
      top: 'rpn_rois'
    #  top: 'rpn_scores'
      python_param {
        module: 'rpn.proposal_layer'
        layer: 'ProposalLayer'
        param_str: "'feat_stride': 16"
      }
    }
    

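    Inputs:

    • rpn_cls_prob_reshape: the output of the fg/bg anchor classifier;
    • rpn_bbox_pred: the corresponding bbox regression offsets [dx(A), dy(A), dw(A), dh(A)];
    • im_info: image information (height, width, scale);
    • feat_stride=16: the feature stride, passed through param_str rather than as a bottom blob.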
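
    As a rough illustration of the shapes involved, the sketch below builds toy arrays in the layout the forward code expects and slices out the foreground scores. The feature-map size and the random values are made up for illustration; A = 9 anchors per cell is assumed from the 4 * 9 = 36 regression outputs in the network definition.

```python
import numpy as np

A = 9            # anchors per feature-map cell (assumed)
H, W = 2, 3      # toy feature-map size (hypothetical)

# rpn_cls_prob_reshape: (1, 2 * A, H, W); first A channels are bg, last A are fg
cls_prob = np.random.rand(1, 2 * A, H, W).astype(np.float32)
# rpn_bbox_pred: (1, 4 * A, H, W), four offsets per anchor
bbox_pred = np.zeros((1, 4 * A, H, W), dtype=np.float32)
# im_info: one row holding (height, width, scale) of the network input
im_info = np.array([[600.0, 800.0, 1.6]], dtype=np.float32)

# Keep only the foreground probabilities, as the forward function does
fg_scores = cls_prob[:, A:, :, :]
print(fg_scores.shape)
```

    Only the second half of the classification channels is used, because the layer ranks anchors by how likely they are to contain an object.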

    (I) How bounding box regression works

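    [Figure: a green ground-truth box and a red foreground-anchor box]

    In the figure:
    the green box is the ground truth (GT);
    the red box is the extracted foreground anchor;
    the red box needs to be corrected toward the green one.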

    Each box is described by (x, y, w, h): its centre point (x, y) and its width and height (w, h).

    The red box A = (A_{x},A_{y},A_{w},A_{h}) is the original foreground anchor;
    the green box G = (G_{x},G_{y},G_{w},G_{h}) is the GT of the target.
    We look for a mapping f such that f(A_{x},A_{y},A_{w},A_{h})=(G_{x}^{'},G_{y}^{'},G_{w}^{'},G_{h}^{'}), where (G_{x}^{'},G_{y}^{'},G_{w}^{'},G_{h}^{'})\approx(G_{x},G_{y},G_{w},G_{h}).

    (1) First translate:

    G_{x}^{'}=A_{w} \cdot d_{x}(A) + A_{x} \\ G_{y}^{'} =A_{h} \cdot d_{y}(A)+A_{y}

    (2) Then scale:

    G_{w}^{'} = A_{w} \cdot \exp(d_{w}(A))\\ G_{h}^{'}=A_{h} \cdot \exp(d_{h}(A))

    So what has to be learned are the four transforms d_{x}(A),d_{y}(A),d_{w}(A),d_{h}(A).

    In the original Faster R-CNN paper, the translation targets (t_{x},t_{y}) and the scale targets (t_{w},t_{h}) are defined as:

    t_{x}=\frac{G_{x}-A_{x}}{A_{w}}\\ t_{y} = \frac{G_{y}-A_{y}}{A_{h}}\\ t_{w}=\ln\left(\frac{G_{w}}{A_{w}}\right)\\ t_{h}=\ln\left(\frac{G_{h}}{A_{h}}\right)
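
    To make the relation between the targets and the translate/scale steps concrete, here is a minimal sketch in centre form (cx, cy, w, h). It assumes the paper's definitions, with the translation divided by the anchor size and the scale taken as the log of the size ratio; the function names encode and decode and the box values are hypothetical.

```python
import numpy as np

def encode(anchor, gt):
    """Regression targets (t_x, t_y, t_w, t_h) for boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return np.array([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)])

def decode(anchor, d):
    """Apply offsets (d_x, d_y, d_w, d_h) to an anchor: translate, then scale."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = d
    return np.array([aw * dx + ax, ah * dy + ay,
                     aw * np.exp(dw), ah * np.exp(dh)])

anchor = np.array([50.0, 50.0, 20.0, 40.0])
gt = np.array([55.0, 46.0, 30.0, 40.0])

t = encode(anchor, gt)          # t_x = 0.25, t_y = -0.1, t_w = ln(1.5), t_h = 0
recovered = decode(anchor, t)   # decoding the exact targets gives back the GT
print(t, recovered)
```

    If the network predicted d equal to t exactly, the decoded box would coincide with the GT, which is why t serves as the regression target.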

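    The remaining question is how to obtain dx(A), dy(A), dw(A), dh(A) by linear regression. Linear regression takes an input feature vector X and learns a set of parameters W so that the regressed value WX is very close to the true value Y, i.e. Y ≈ WX. For this problem the input X is the convolutional feature map that feeds the regression layer, denoted Φ; training additionally supplies the regression targets (tx, ty, tw, th) computed from the GT. The outputs are the four transforms dx(A), dy(A), dw(A), dh(A).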
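    The prediction can then be written as:

    d_{*}(A)=W_{*}^{T} \cdot \Phi(A)
    where \Phi(A) is the feature vector formed from the feature map at the corresponding anchor, W_{*} is the parameter to be learned, and d_{*}(A) is the predicted value (* stands for one of x, y, w, h).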
    So if we use an MSE loss:

    Loss=\sum_{i}^{N}(t_{*}^{i}-\widehat{W}_{*}^{T} \cdot \Phi(A^{i}))^{2}

    After adding regularization, the optimization objective becomes:

    W_{*} = \arg\min_{\widehat{W}_{*}}\sum_{i}^{N}(t_{*}^{i}-\widehat{W}_{*}^{T} \cdot \Phi(A^{i}))^{2} + \lambda ||\widehat{W}_{*}||^{2}
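
    For intuition, a squared-error objective with an L2 penalty on the weights has a closed-form minimiser. The sketch below solves it on synthetic data; the feature dimension, sample count and λ are arbitrary assumptions. In the actual network these weights are learned by gradient descent inside the conv layer, so this is only an illustration of the objective, not of how py-faster-rcnn trains.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 8                      # N anchors, D-dimensional features (toy sizes)
Phi = rng.normal(size=(N, D))      # row i plays the role of Phi(A^i)
w_true = rng.normal(size=D)
t = Phi @ w_true + 0.01 * rng.normal(size=N)   # noisy targets t_*^i

lam = 1.0
# Closed-form minimiser of sum_i (t_i - w^T Phi_i)^2 + lam * ||w||^2
w_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ t)

# Gradient of the objective at w_hat; it should vanish up to rounding error
grad = -2 * Phi.T @ (t - Phi @ w_hat) + 2 * lam * w_hat
print(np.abs(grad).max())
```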

    In Caffe, a single conv layer is used to learn (dx, dy, dw, dh):

    layer {
      name: "rpn_bbox_pred"
      type: "Convolution"
      bottom: "rpn/output"
      top: "rpn_bbox_pred"
      convolution_param {
        num_output: 36   # 4 * 9(anchors)
        kernel_size: 1 pad: 0 stride: 1
      }
    }
    
    

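    The resulting Caffe blob has shape [1, 4x9, Q, P] (Q and P being the height and width of the feature map), just as the fg/bg anchor scores are stored as [1, 18, Q, P].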

    In py-faster-rcnn, the code that applies the box regression is as follows:

    # bbox_transform.py
    import numpy as np

    def bbox_transform_inv(boxes, deltas):
        if boxes.shape[0] == 0:
            return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
    
        boxes = boxes.astype(deltas.dtype, copy=False)
    
        widths = boxes[:, 2] - boxes[:, 0] + 1.0
        heights = boxes[:, 3] - boxes[:, 1] + 1.0
        ctr_x = boxes[:, 0] + 0.5 * widths
        ctr_y = boxes[:, 1] + 0.5 * heights
    
        dx = deltas[:, 0::4]
        dy = deltas[:, 1::4]
        dw = deltas[:, 2::4]
        dh = deltas[:, 3::4]
    
        # Apply the formulas to get the regressed box
        # (1) translate the box centre
        pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
        pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
        # (2) scale the box width and height
        pred_w = np.exp(dw) * widths[:, np.newaxis]
        pred_h = np.exp(dh) * heights[:, np.newaxis]
    
        pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
    
        # Convert back to [x1, y1, x2, y2] form
        # x1
        pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
        # y1
        pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
        # x2
        pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
        # y2
        pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
    
        return pred_boxes
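
    A small worked example may help when reading the function above. The helper below is a condensed restatement for plain (N, 4) inputs, written for this illustration (decode_boxes is a hypothetical name, not part of py-faster-rcnn); it keeps the same "+1" pixel convention for widths and heights.

```python
import numpy as np

def decode_boxes(boxes, deltas):
    """Condensed restatement of bbox_transform_inv for (N, 4) boxes and deltas."""
    w = boxes[:, 2] - boxes[:, 0] + 1.0
    h = boxes[:, 3] - boxes[:, 1] + 1.0
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    pcx = deltas[:, 0] * w + cx          # translate the centre
    pcy = deltas[:, 1] * h + cy
    pw = np.exp(deltas[:, 2]) * w        # scale width and height
    ph = np.exp(deltas[:, 3]) * h
    return np.stack([pcx - 0.5 * pw, pcy - 0.5 * ph,
                     pcx + 0.5 * pw, pcy + 0.5 * ph], axis=1)

anchor = np.array([[0.0, 0.0, 15.0, 15.0]])          # a 16 x 16 box
deltas = np.array([[0.5, 0.0, np.log(2.0), 0.0]])    # shift right by w/2, double the width
pred = decode_boxes(anchor, deltas)
print(pred)   # approximately [[0, 0, 32, 16]]
```

    Here the width is 16 and the centre is at 8, so dx = 0.5 moves the centre to 16 and dw = ln 2 doubles the width to 32, giving x1 = 0 and x2 = 32.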
    

    (II) Walking through the proposal layer's forward function

    1. First, read the configuration parameters needed for NMS:
        def forward(self, bottom, top):
            # Algorithm:
            #
            # for each (H, W) location i
            #   generate A anchor boxes centered on cell i
            #   apply predicted bbox deltas at cell i to each of the A anchors
            # clip predicted boxes to image
            # remove predicted boxes with either height or width < threshold
            # sort all (proposal, score) pairs by score from highest to lowest
            # take top pre_nms_topN proposals before NMS
            # apply NMS with threshold 0.7 to remaining proposals
            # take after_nms_topN proposals after NMS
            # return the top proposals (-> RoIs top, scores top)
    
            assert bottom[0].data.shape[0] == 1, \
                'Only single item batches are supported'
    
            cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
    
            pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N
            post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
            nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH
            min_size      = cfg[cfg_key].RPN_MIN_SIZE
    
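    2. Generate the anchors again so that the boxes can be regressed precisely (the AnchorTarget layer already generated the anchors once, but that was to compute IoU against the targets and decide which anchors are fg/bg):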
    
            # Enumerate all shifts  
            # generate the anchors again
            shift_x = np.arange(0, width) * self._feat_stride
            shift_y = np.arange(0, height) * self._feat_stride
            shift_x, shift_y = np.meshgrid(shift_x, shift_y)
            shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                                shift_x.ravel(), shift_y.ravel())).transpose()
    
            # Enumerate all shifted anchors:
            #
            # add A anchors (1, A, 4) to
            # cell K shifts (K, 1, 4) to get
            # shift anchors (K, A, 4)
            # reshape to (K*A, 4) shifted anchors
            A = self._num_anchors
            K = shifts.shape[0]
            anchors = self._anchors.reshape((1, A, 4)) + \
                      shifts.reshape((1, K, 4)).transpose((1, 0, 2))
            anchors = anchors.reshape((K * A, 4))
    
            # Transpose and reshape predicted bbox transformations to get them
            # into the same order as the anchors:
            #
            # bbox deltas will be (1, 4 * A, H, W) format
            # transpose to (1, H, W, 4 * A)
            # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
            # in slowest to fastest order
            bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))
    
            # Same story for the scores:
            #
            # scores are (1, A, H, W) format
            # transpose to (1, H, W, A)
            # reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
            #
            scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))
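
    The point of the transposes is that row r of the reshaped deltas must belong to row r of the anchors array. The sketch below checks this on a toy grid. The base anchors are placeholders invented for the example, not the output of the real generate_anchors(), and the grid size is arbitrary.

```python
import numpy as np

feat_stride, H, W, A = 16, 2, 3, 9

# Placeholder base anchors (hypothetical values, one distinct box per anchor index)
base_anchors = np.tile(np.array([0, 0, 15, 15], dtype=np.float32), (A, 1))
base_anchors[:, 2:] += np.arange(A, dtype=np.float32)[:, None]

# One (x, y, x, y) shift per feature-map cell, enumerated row by row
shift_x, shift_y = np.meshgrid(np.arange(W) * feat_stride,
                               np.arange(H) * feat_stride)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()
K = shifts.shape[0]                                   # K = H * W
anchors = (base_anchors.reshape((1, A, 4)) +
           shifts.reshape((K, 1, 4))).reshape((K * A, 4))

# Fake deltas whose value encodes where they came from: a * 100 + h * 10 + w
deltas = np.zeros((1, 4 * A, H, W), dtype=np.float32)
for a in range(A):
    for h in range(H):
        for w in range(W):
            deltas[0, 4 * a:4 * a + 4, h, w] = a * 100 + h * 10 + w
rows = deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

# Row index of anchor a at cell (h, w): h is slowest, a is fastest
h, w, a = 1, 2, 4
row = (h * W + w) * A + a
print(anchors[row], rows[row])
```

    Both arrays use the same (h, w, a) ordering, so bbox_transform_inv can pair them row by row without any further indexing.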
    
    3. Apply the box regression to the anchors to get more accurate boxes, then clip the regressed boxes so that no part lies outside the image:
            # Convert anchors into proposals via bbox transformations
            # apply bounding box regression to the anchors
    
            proposals = bbox_transform_inv(anchors, bbox_deltas)
    
            # 2. clip predicted boxes to image
            # clamp each coordinate into the image (im_info[:2] is height, width)
            proposals = clip_boxes(proposals, im_info[:2])
    
            # 3. remove predicted boxes with either height or width < threshold
            # (NOTE: convert min_size to input image scale stored in im_info[2])
            keep = _filter_boxes(proposals, min_size * im_info[2])
            proposals = proposals[keep, :]
            scores = scores[keep]
    

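    The clip_boxes function is simple: it compares x1, y1, x2, y2 with the image boundary and clamps each coordinate into the image. No box is discarded at this step.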

    def clip_boxes(boxes, im_shape):
        """
        Clip boxes to image boundaries.
        """
        
        # x1 >= 0
        boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
        # y1 >= 0
        boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
        # x2 < im_shape[1]
        boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
        # y2 < im_shape[0]
        boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
        return boxes
    
    

    The _filter_boxes function keeps only the boxes whose width and height are both at least min_size:

    def _filter_boxes(boxes, min_size):
        """Remove all boxes with any side smaller than min_size."""
        ws = boxes[:, 2] - boxes[:, 0] + 1
        hs = boxes[:, 3] - boxes[:, 1] + 1
        keep = np.where((ws >= min_size) & (hs >= min_size))[0]
        return keep
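
    The two helpers can be tried together on a few hand-made boxes. This is a sketch under assumptions: the image size, the boxes and min_size=16 are invented for illustration, and clip_boxes is rewritten here with np.clip on a copy for (N, 4) inputs, whereas the original modifies its argument in place.

```python
import numpy as np

def clip_boxes(boxes, im_shape):
    """Clamp (N, 4) boxes to the image; im_shape is (height, width)."""
    boxes = boxes.copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, im_shape[1] - 1)  # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, im_shape[0] - 1)  # y1, y2
    return boxes

def filter_boxes(boxes, min_size):
    """Indices of boxes whose width and height are both >= min_size."""
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    return np.where((ws >= min_size) & (hs >= min_size))[0]

im_shape = (100, 200)   # (height, width)
boxes = np.array([[-10.0, -5.0, 50.0, 60.0],     # sticks out at the top-left
                  [150.0, 80.0, 260.0, 130.0],   # sticks out at the bottom-right
                  [195.0, 20.0, 230.0, 90.0]])   # almost entirely outside
clipped = clip_boxes(boxes, im_shape)
keep = filter_boxes(clipped, min_size=16)
print(clipped)
print(keep)
```

    The third box survives clipping but is only 5 pixels wide afterwards, so it is the size filter, not the clipping, that removes it.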
    
    
    4. Finally, run NMS and keep the N highest-scoring boxes:
    
            # 6. apply nms (e.g. threshold = 0.7)
            # 7. take after_nms_topN (e.g. 300)
            # 8. return the top proposals (-> RoIs top)
            keep = nms(np.hstack((proposals, scores)), nms_thresh)
            if post_nms_topN > 0:
                keep = keep[:post_nms_topN]
            proposals = proposals[keep, :]
            scores = scores[keep]
    
            # Output rois blob
            # Our RPN implementation only supports a single input image, so all
            # batch inds are 0
            batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
            blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
            top[0].reshape(*(blob.shape))
            top[0].data[...] = blob
    
            # [Optional] output scores blob
            if len(top) > 1:
                top[1].reshape(*(scores.shape))
                top[1].data[...] = scores
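
    Before NMS, the complete forward function also sorts the proposals by score and keeps the first pre_nms_topN. The sketch below shows that sorting step and the layout of the output RoI blob on toy data; the boxes, scores and pre_nms_topN = 3 are made-up values.

```python
import numpy as np

proposals = np.array([[0, 0, 10, 10],
                      [5, 5, 20, 20],
                      [30, 30, 50, 50],
                      [2, 2, 12, 12]], dtype=np.float32)
scores = np.array([[0.2], [0.9], [0.6], [0.4]], dtype=np.float32)
pre_nms_topN = 3   # toy value for illustration

# Indices from highest to lowest score, truncated to the top N
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
    order = order[:pre_nms_topN]
proposals = proposals[order, :]
scores = scores[order]

# RoI blob layout: one row per proposal, (batch_index, x1, y1, x2, y2)
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals))
print(order, blob.shape)
```

    The leading column is always 0 because the layer only supports a single image per batch.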
    

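    NMS in Python:

    The idea is to sort the boxes by confidence and pick the most confident box A as the reference. With a threshold set, any other box B whose overlap (IoU) with A exceeds the threshold is discarded. The most confident of the remaining boxes is then picked, and the procedure repeats.

    For speed, the author's code uses compiled NMS implementations (Cython and CUDA);
    a pure-Python NMS baseline is also provided, which is easier to follow: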

    import numpy as np
    
    def py_cpu_nms(dets, thresh):
        """Pure Python NMS baseline."""
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = dets[:, 2]
        y2 = dets[:, 3]
        scores = dets[:, 4]
    
        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]
    
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
    
            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
    
            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]
    
        return keep
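
    To see the suppression in action, the sketch below runs a condensed greedy NMS, written to mirror py_cpu_nms, on three hand-made boxes. The function name nms_sketch and the box values are assumptions for illustration only.

```python
import numpy as np

def nms_sketch(dets, thresh):
    """Greedy NMS over rows (x1, y1, x2, y2, score); mirrors py_cpu_nms."""
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of box i with every remaining box
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]         # drop boxes that overlap box i too much
    return keep

dets = np.array([[0, 0, 99, 99, 0.9],        # reference box
                 [5, 5, 104, 104, 0.8],      # IoU with the first box is about 0.82
                 [200, 200, 299, 299, 0.7]]) # disjoint from both
print(nms_sketch(dets, 0.7))   # [0, 2]
print(nms_sketch(dets, 0.9))   # [0, 1, 2]
```

    The second box shares 9025 of 10975 union pixels with the first, so it is suppressed at a threshold of 0.7 but kept at 0.9. The returned indices are already in descending score order.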
    
    

    Complete forward code:

        def forward(self, bottom, top):
            # Algorithm:
            #
            # for each (H, W) location i
            #   generate A anchor boxes centered on cell i
            #   apply predicted bbox deltas at cell i to each of the A anchors
            # clip predicted boxes to image
            # remove predicted boxes with either height or width < threshold
            # sort all (proposal, score) pairs by score from highest to lowest
            # take top pre_nms_topN proposals before NMS
            # apply NMS with threshold 0.7 to remaining proposals
            # take after_nms_topN proposals after NMS
            # return the top proposals (-> RoIs top, scores top)
    
            assert bottom[0].data.shape[0] == 1, \
                'Only single item batches are supported'
    
            cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
    
            pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N
            post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
            nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH
            min_size      = cfg[cfg_key].RPN_MIN_SIZE
    
            # the first set of _num_anchors channels are bg probs
            # the second set are the fg probs, which we want
            # take the foreground score of each anchor from the softmax output
            scores = bottom[0].data[:, self._num_anchors:, :, :]
            bbox_deltas = bottom[1].data
            im_info = bottom[2].data[0, :]
    
            if DEBUG:
                print('im_size: ({}, {})'.format(im_info[0], im_info[1]))
                print('scale: {}'.format(im_info[2]))
    
            # 1. Generate proposals from bbox deltas and shifted anchors
            height, width = scores.shape[-2:]
    
            if DEBUG:
                print('score map size: {}'.format(scores.shape))
    
            # Enumerate all shifts  
            # generate the anchors again
            shift_x = np.arange(0, width) * self._feat_stride
            shift_y = np.arange(0, height) * self._feat_stride
            shift_x, shift_y = np.meshgrid(shift_x, shift_y)
            shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                                shift_x.ravel(), shift_y.ravel())).transpose()
    
            # Enumerate all shifted anchors:
            #
            # add A anchors (1, A, 4) to
            # cell K shifts (K, 1, 4) to get
            # shift anchors (K, A, 4)
            # reshape to (K*A, 4) shifted anchors
            A = self._num_anchors
            K = shifts.shape[0]
            anchors = self._anchors.reshape((1, A, 4)) + \
                      shifts.reshape((1, K, 4)).transpose((1, 0, 2))
            anchors = anchors.reshape((K * A, 4))
    
            # Transpose and reshape predicted bbox transformations to get them
            # into the same order as the anchors:
            #
            # bbox deltas will be (1, 4 * A, H, W) format
            # transpose to (1, H, W, 4 * A)
            # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
            # in slowest to fastest order
            bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))
    
            # Same story for the scores:
            #
            # scores are (1, A, H, W) format
            # transpose to (1, H, W, A)
            # reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
            #
            scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))
    
            # Convert anchors into proposals via bbox transformations
            # apply bounding box regression to the anchors
    
            proposals = bbox_transform_inv(anchors, bbox_deltas)
    
            # 2. clip predicted boxes to image
            # clamp each coordinate into the image (im_info[:2] is height, width)
            proposals = clip_boxes(proposals, im_info[:2])
    
            # 3. remove predicted boxes with either height or width < threshold
            # (NOTE: convert min_size to input image scale stored in im_info[2])
            keep = _filter_boxes(proposals, min_size * im_info[2])
            proposals = proposals[keep, :]
            scores = scores[keep]
    
            # 4. sort all (proposal, score) pairs by score from highest to lowest
            # 5. take top pre_nms_topN (e.g. 6000)
            order = scores.ravel().argsort()[::-1]
            if pre_nms_topN > 0:
                order = order[:pre_nms_topN]
            proposals = proposals[order, :]
            scores = scores[order]
    
            # 6. apply nms (e.g. threshold = 0.7)
            # 7. take after_nms_topN (e.g. 300)
            # 8. return the top proposals (-> RoIs top)
            keep = nms(np.hstack((proposals, scores)), nms_thresh)
            if post_nms_topN > 0:
                keep = keep[:post_nms_topN]
            proposals = proposals[keep, :]
            scores = scores[keep]
    
            # Output rois blob
            # Our RPN implementation only supports a single input image, so all
            # batch inds are 0
            batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
            blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
            top[0].reshape(*(blob.shape))
            top[0].data[...] = blob
    
            # [Optional] output scores blob
            if len(top) > 1:
                top[1].reshape(*(scores.shape))
                top[1].data[...] = scores
    

    Summary:

    The forward pass of the proposal layer performs the following operations in order:

    • Generate the anchors again and apply bbox regression to all of them;
    • Clip the regressed boxes to the image boundary using im_info, then remove boxes whose width or height is below min_size (scaled by im_info[2]);
    • Sort the remaining boxes by their foreground softmax scores from high to low and keep the first pre_nms_topN (e.g. 6000);
    • Run NMS (non-maximum suppression);
    • From the boxes kept by NMS, take the first post_nms_topN (e.g. 300) and output them as proposals.

    Article link: https://www.haomeiwen.com/subject/fflhjqtx.html