SSD Detection Algorithm (the encode/decode process)

Author: huim | Published 2019-08-10 16:42

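    During training, the groundtruth boxes are given as (y1, x1, y2, x2), while the network outputs offsets of the form (dx, dy, dw, dh).
    Because the two representations differ, the groundtruth has to be combined with the anchors to produce, for every position of each anchor layer, a classification label and a coordinate offset loc in the same form as the network output. This is the encode process.

    During prediction, the network again outputs (dx, dy, dw, dh), while the actual boxes are expressed as (y1, x1, y2, x2), so the network output is likewise combined with the anchors and decoded back into the actual box form.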

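    I. The encode process

    Given: anchors (coordinate form: y, x, h, w, i.e. center and size, which is the order the code unpacks them in)
    Given: gt_labels and gt_bboxes (coordinate form: y1, x1, y2, x2)

    The goal is to combine the groundtruth with the anchors to obtain the offset at every position, so that a loss can be computed against the network output.
    (Note: the anchor positions never change, so the offset is simply the groundtruth box expressed relative to its anchor.)

    1. Get the (top-left, bottom-right) coordinates of each anchor

    Because groundtruth_bboxes are expressed as (y1, x1, y2, x2), this step converts the anchors to the same form.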

    yref, xref, href, wref = anchors_layer
    
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
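
As a minimal NumPy sketch of this conversion: the 2x2 grid, the two anchor sizes and the helper name `anchors_to_corners` are made up for illustration, and coordinates are assumed to be normalized to [0, 1]. It also shows how the center arrays and the size arrays broadcast to the (feat_size, feat_size, num_anchors) shape used in the next step.

```python
import numpy as np

def anchors_to_corners(yref, xref, href, wref):
    """Convert (center y, center x, height, width) anchors to corner form."""
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    return ymin, xmin, ymax, xmax

# Hypothetical 2x2 feature map with 2 anchors per cell.
# Centers have shape (2, 2, 1) and sizes have shape (2,), so the
# results broadcast to (feat_size, feat_size, num_anchors) = (2, 2, 2).
y, x = np.mgrid[0:2, 0:2]
yref = ((y + 0.5) / 2.)[..., np.newaxis]
xref = ((x + 0.5) / 2.)[..., np.newaxis]
href = np.array([0.2, 0.4])
wref = np.array([0.2, 0.1])

ymin, xmin, ymax, xmax = anchors_to_corners(yref, xref, href, wref)
print(ymin.shape)              # (2, 2, 2)
print(ymin[0, 0], ymax[0, 0])  # corners of the two anchors in cell (0, 0)
```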
    
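    2. Get the class label and matched box of each anchor

    Compute the IoU between the groundtruth_bboxes and every anchor and use it as the scores. Each anchor takes the groundtruth_bbox with which it has the largest IoU as the reference for computing its offset.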

        vol_anchors = (xmax - xmin) * (ymax - ymin)  # area of each anchor
    
        # shape = (feat_size, feat_size, num_anchors)
        shape = (yref.shape[0], yref.shape[1], href.size)
        # Labels and scores start at 0; the matched box starts as (0, 0, 1, 1).
        feat_labels = tf.zeros(shape, dtype=tf.int64)
        feat_scores = tf.zeros(shape, dtype=dtype)
    
        feat_ymin = tf.zeros(shape, dtype=dtype)
        feat_xmin = tf.zeros(shape, dtype=dtype)
        feat_ymax = tf.ones(shape, dtype=dtype)
        feat_xmax = tf.ones(shape, dtype=dtype)
    
        # IoU between all anchors and a single groundtruth_bbox
        def jaccard_with_anchors(bbox):
            """Compute jaccard score between a box and the anchors.
            """
            int_ymin = tf.maximum(ymin, bbox[0])
            int_xmin = tf.maximum(xmin, bbox[1])
            int_ymax = tf.minimum(ymax, bbox[2])
            int_xmax = tf.minimum(xmax, bbox[3])
            h = tf.maximum(int_ymax - int_ymin, 0.)
            w = tf.maximum(int_xmax - int_xmin, 0.)
            # Volumes.
            inter_vol = h * w
            union_vol = vol_anchors - inter_vol \
                + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
            jaccard = tf.div(inter_vol, union_vol)
            return jaccard
    
        def intersection_with_anchors(bbox):
            """Compute intersection score between a box and the anchors.
            """
            int_ymin = tf.maximum(ymin, bbox[0])
            int_xmin = tf.maximum(xmin, bbox[1])
            int_ymax = tf.minimum(ymax, bbox[2])
            int_xmax = tf.minimum(xmax, bbox[3])
            h = tf.maximum(int_ymax - int_ymin, 0.)
            w = tf.maximum(int_xmax - int_xmin, 0.)
            inter_vol = h * w
            scores = tf.div(inter_vol, vol_anchors)
            return scores
    
        def condition(i, feat_labels, feat_scores,
                      feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Condition: check label index.
            """
            r = tf.less(i, tf.shape(labels))
            return r[0]
    
        def body(i, feat_labels, feat_scores,
                 feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Body: update feature labels, scores and bboxes.
            Follow the original SSD paper for that purpose:
              - assign values when jaccard > 0.5;
              - only update if beat the score of other bboxes.
            """
            # Jaccard score.
            label = labels[i]
            bbox = bboxes[i]
            jaccard = jaccard_with_anchors(bbox)
            # Mask: check threshold + scores + no annotations + num_classes.
            mask = tf.greater(jaccard, feat_scores)
            # mask = tf.logical_and(mask, tf.greater(jaccard, ignore_threshold))
            mask = tf.logical_and(mask, feat_scores > -0.5)
            mask = tf.logical_and(mask, label < num_classes)
            imask = tf.cast(mask, tf.int64)
            fmask = tf.cast(mask, dtype)
            # Update values using mask.
            feat_labels = imask * label + (1 - imask) * feat_labels
            feat_scores = tf.where(mask, jaccard, feat_scores)
    
            feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
            feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
            feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
            feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
    
            # Check no annotation label: ignore these anchors...
            # interscts = intersection_with_anchors(bbox)
            # mask = tf.logical_and(interscts > ignore_threshold,
            #                       label == no_annotation_label)
            # # Replace scores by -1.
            # feat_scores = tf.where(mask, -tf.cast(mask, dtype), feat_scores)
    
            return [i+1, feat_labels, feat_scores,
                    feat_ymin, feat_xmin, feat_ymax, feat_xmax]
        # Main loop definition.
        i = 0
        [i, feat_labels, feat_scores,
         feat_ymin, feat_xmin,
         feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                               [i, feat_labels, feat_scores,
                                                feat_ymin, feat_xmin,
                                                feat_ymax, feat_xmax])
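
The matching loop above can be sketched in plain NumPy, which is easier to step through than `tf.while_loop`. This is a simplified illustration, not the original code: the function name `match_anchors` and the three toy anchors are made up, and the `label < num_classes` and `feat_scores > -0.5` checks are left out.

```python
import numpy as np

def jaccard_with_anchors(bbox, ymin, xmin, ymax, xmax):
    """IoU between one (ymin, xmin, ymax, xmax) box and every anchor."""
    int_h = np.maximum(np.minimum(ymax, bbox[2]) - np.maximum(ymin, bbox[0]), 0.)
    int_w = np.maximum(np.minimum(xmax, bbox[3]) - np.maximum(xmin, bbox[1]), 0.)
    inter = int_h * int_w
    vol_anchors = (ymax - ymin) * (xmax - xmin)
    vol_bbox = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    return inter / (vol_anchors + vol_bbox - inter)

def match_anchors(labels, bboxes, ymin, xmin, ymax, xmax):
    """Give each anchor the label and box of its highest-IoU groundtruth."""
    feat_labels = np.zeros(ymin.shape, dtype=np.int64)
    feat_scores = np.zeros(ymin.shape, dtype=np.float64)
    # Unmatched anchors keep the default box (0, 0, 1, 1), as in the listing.
    feat_boxes = np.tile(np.array([0., 0., 1., 1.]), ymin.shape + (1,))
    for label, bbox in zip(labels, bboxes):
        jaccard = jaccard_with_anchors(bbox, ymin, xmin, ymax, xmax)
        mask = jaccard > feat_scores  # only update on a strictly better match
        feat_labels = np.where(mask, label, feat_labels)
        feat_scores = np.where(mask, jaccard, feat_scores)
        feat_boxes[mask] = bbox
    return feat_labels, feat_scores, feat_boxes

# Three toy anchors in corner form (flattened to 1D for brevity).
ymin = np.array([0.0, 0.0, 0.5])
xmin = np.array([0.0, 0.5, 0.5])
ymax = np.array([0.5, 0.5, 1.0])
xmax = np.array([0.5, 1.0, 1.0])
labels = np.array([3, 7])
bboxes = np.array([[0.0, 0.00, 0.5, 0.50],   # identical to anchor 0
                   [0.0, 0.25, 0.5, 0.75]])  # straddles anchors 0 and 1

feat_labels, feat_scores, feat_boxes = match_anchors(
    labels, bboxes, ymin, xmin, ymax, xmax)
print(feat_labels)  # anchor 0 keeps label 3, anchor 1 gets 7, anchor 2 stays 0
print(feat_scores)
```

Anchor 0 overlaps both boxes but keeps the first one because its IoU of 1 beats the second box's IoU of 1/3. Anchor 2 overlaps nothing, so it keeps label 0 and score 0.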
    
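    3. Convert the coordinates to offset form

    The result is compared with the network output pred_locs to compute the loss, so it uses the same form (dx, dy, dw, dh).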

        # Transform to center / size.
        feat_cy = (feat_ymax + feat_ymin) / 2.
        feat_cx = (feat_xmax + feat_xmin) / 2.
        feat_h = feat_ymax - feat_ymin
        feat_w = feat_xmax - feat_xmin
        # Encode features.
        feat_cy = (feat_cy - yref) / href / prior_scaling[0]
        feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
        feat_h = tf.log(feat_h / href) / prior_scaling[2]
        feat_w = tf.log(feat_w / wref) / prior_scaling[3]
        # Use SSD ordering: x / y / w / h instead of ours.
        feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
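
A worked single-box example may make the encoding easier to follow. The sketch below applies the same arithmetic in NumPy to one anchor and one groundtruth box. The helper name `encode_box` and the numbers are illustrative assumptions.

```python
import numpy as np

def encode_box(bbox, anchor, prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    """Encode one (ymin, xmin, ymax, xmax) box against one (cy, cx, h, w) anchor."""
    ymin, xmin, ymax, xmax = bbox
    yref, xref, href, wref = anchor
    # Corner form -> center / size.
    cy = (ymax + ymin) / 2.
    cx = (xmax + xmin) / 2.
    h = ymax - ymin
    w = xmax - xmin
    # Center shift in units of the anchor size, log-ratio for the size.
    dcy = (cy - yref) / href / prior_scaling[0]
    dcx = (cx - xref) / wref / prior_scaling[1]
    dh = np.log(h / href) / prior_scaling[2]
    dw = np.log(w / wref) / prior_scaling[3]
    # Same output ordering as the listing: x / y / w / h.
    return np.array([dcx, dcy, dw, dh])

anchor = (0.5, 0.5, 0.2, 0.2)   # cy, cx, h, w
bbox = (0.4, 0.45, 0.7, 0.65)   # ymin, xmin, ymax, xmax
offsets = encode_box(bbox, anchor)
print(offsets)  # approximately [2.5, 2.5, 0.0, 2.027]
```

The box center lies 0.05 away from the anchor center, a quarter of the anchor size; dividing by the prior scaling of 0.1 turns that 0.25 into 2.5. The box has the same width as the anchor, so dw is 0, and is 1.5 times as tall, so dh is log(1.5) / 0.2.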
    

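    Complete code for the encode process

    import tensorflow as tf  # written against the TensorFlow 1.x API (tf.div, tf.log)

    def tf_ssd_bboxes_encode_layer(labels,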
                                   bboxes,
                                   anchors_layer,
                                   num_classes,
                                   no_annotation_label,
                                   ignore_threshold=0.5,
                                   prior_scaling=[0.1, 0.1, 0.2, 0.2],
                                   dtype=tf.float32):
        """Encode groundtruth labels and bounding boxes using SSD anchors from
        one layer.
    
        Arguments:
          labels: 1D Tensor(int64) containing groundtruth labels;
          bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
          anchors_layer: Numpy array with layer anchors;
          ignore_threshold: IoU threshold; only referenced by the commented-out lines in body();
          prior_scaling: Scaling of encoded coordinates.
    
        Return:
          (target_labels, target_localizations, target_scores): Target Tensors.
        """
        # Anchors coordinates and volume.
        yref, xref, href, wref = anchors_layer
        ymin = yref - href / 2.
        xmin = xref - wref / 2.
        ymax = yref + href / 2.
        xmax = xref + wref / 2.
    
        vol_anchors = (xmax - xmin) * (ymax - ymin)
    
        # Initialize tensors...
        shape = (yref.shape[0], yref.shape[1], href.size)
        feat_labels = tf.zeros(shape, dtype=tf.int64)
        feat_scores = tf.zeros(shape, dtype=dtype)
    
        feat_ymin = tf.zeros(shape, dtype=dtype)
        feat_xmin = tf.zeros(shape, dtype=dtype)
        feat_ymax = tf.ones(shape, dtype=dtype)
        feat_xmax = tf.ones(shape, dtype=dtype)
    
        def jaccard_with_anchors(bbox):
            """Compute jaccard score between a box and the anchors.
            """
            int_ymin = tf.maximum(ymin, bbox[0])
            int_xmin = tf.maximum(xmin, bbox[1])
            int_ymax = tf.minimum(ymax, bbox[2])
            int_xmax = tf.minimum(xmax, bbox[3])
            h = tf.maximum(int_ymax - int_ymin, 0.)
            w = tf.maximum(int_xmax - int_xmin, 0.)
            # Volumes.
            inter_vol = h * w
            union_vol = vol_anchors - inter_vol \
                + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
            jaccard = tf.div(inter_vol, union_vol)
            return jaccard
    
        def intersection_with_anchors(bbox):
            """Compute intersection score between a box and the anchors.
            """
            int_ymin = tf.maximum(ymin, bbox[0])
            int_xmin = tf.maximum(xmin, bbox[1])
            int_ymax = tf.minimum(ymax, bbox[2])
            int_xmax = tf.minimum(xmax, bbox[3])
            h = tf.maximum(int_ymax - int_ymin, 0.)
            w = tf.maximum(int_xmax - int_xmin, 0.)
            inter_vol = h * w
            scores = tf.div(inter_vol, vol_anchors)
            return scores
    
        def condition(i, feat_labels, feat_scores,
                      feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Condition: check label index.
            """
            r = tf.less(i, tf.shape(labels))
            return r[0]
    
        def body(i, feat_labels, feat_scores,
                 feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Body: update feature labels, scores and bboxes.
            Follow the original SSD paper for that purpose:
              - assign values when jaccard > 0.5;
              - only update if beat the score of other bboxes.
            """
            # Jaccard score.
            label = labels[i]
            bbox = bboxes[i]
            jaccard = jaccard_with_anchors(bbox)
            # Mask: check threshold + scores + no annotations + num_classes.
            mask = tf.greater(jaccard, feat_scores)
            # mask = tf.logical_and(mask, tf.greater(jaccard, ignore_threshold))
            mask = tf.logical_and(mask, feat_scores > -0.5)
            mask = tf.logical_and(mask, label < num_classes)
            imask = tf.cast(mask, tf.int64)
            fmask = tf.cast(mask, dtype)
            # Update values using mask.
            feat_labels = imask * label + (1 - imask) * feat_labels
            feat_scores = tf.where(mask, jaccard, feat_scores)
    
            feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
            feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
            feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
            feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
    
            # Check no annotation label: ignore these anchors...
            # interscts = intersection_with_anchors(bbox)
            # mask = tf.logical_and(interscts > ignore_threshold,
            #                       label == no_annotation_label)
            # # Replace scores by -1.
            # feat_scores = tf.where(mask, -tf.cast(mask, dtype), feat_scores)
    
            return [i+1, feat_labels, feat_scores,
                    feat_ymin, feat_xmin, feat_ymax, feat_xmax]
        # Main loop definition.
        i = 0
        [i, feat_labels, feat_scores,
         feat_ymin, feat_xmin,
         feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                               [i, feat_labels, feat_scores,
                                                feat_ymin, feat_xmin,
                                                feat_ymax, feat_xmax])
        # Transform to center / size.
        feat_cy = (feat_ymax + feat_ymin) / 2.
        feat_cx = (feat_xmax + feat_xmin) / 2.
        feat_h = feat_ymax - feat_ymin
        feat_w = feat_xmax - feat_xmin
        # Encode features.
        feat_cy = (feat_cy - yref) / href / prior_scaling[0]
        feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
        feat_h = tf.log(feat_h / href) / prior_scaling[2]
        feat_w = tf.log(feat_w / wref) / prior_scaling[3]
        # Use SSD ordering: x / y / w / h instead of ours.
        feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
        return feat_labels, feat_localizations, feat_scores
    

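    II. The decode process

    Given: anchors (coordinate form: y, x, h, w, i.e. center and size)
    Given: the network output pred_locs (coordinate form: dx, dy, dw, dh)

    The goal is to combine the network output pred_locs with the anchors to obtain the actual boxes.
    (Note again: the anchor positions never change, so applying the predicted offsets pred_locs to the anchors yields the actual boxes.)

    1. Get the coordinates of each anchor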
    yref, xref, href, wref = anchor_bboxes
    
    2. Compute the center, width and height of each predicted box
        # Compute center, height and width
        cx = feat_localizations[:, :, 0] * wref * prior_scaling[0] + xref
        cy = feat_localizations[:, :, 1] * href * prior_scaling[1] + yref
        w = wref * np.exp(feat_localizations[:, :, 2] * prior_scaling[2])
        h = href * np.exp(feat_localizations[:, :, 3] * prior_scaling[3])
    
    3. Convert to the actual box coordinate form
        # bboxes: ymin, xmin, ymax, xmax.
        bboxes = np.zeros_like(feat_localizations)
        bboxes[:, :, 0] = cy - h / 2.
        bboxes[:, :, 1] = cx - w / 2.
        bboxes[:, :, 2] = cy + h / 2.
        bboxes[:, :, 3] = cx + w / 2.
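
The three decode steps can be traced on a single anchor with the sketch below. The helper name `decode_box` and the numbers are illustrative assumptions; the arithmetic mirrors the snippets above.

```python
import numpy as np

def decode_box(loc, anchor, prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    """Decode one (dx, dy, dw, dh) offset against one (cy, cx, h, w) anchor."""
    yref, xref, href, wref = anchor
    # Undo the scaling, then shift / stretch the anchor.
    cx = loc[0] * wref * prior_scaling[0] + xref
    cy = loc[1] * href * prior_scaling[1] + yref
    w = wref * np.exp(loc[2] * prior_scaling[2])
    h = href * np.exp(loc[3] * prior_scaling[3])
    # Center / size -> ymin, xmin, ymax, xmax.
    return np.array([cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.])

anchor = (0.5, 0.5, 0.2, 0.2)          # cy, cx, h, w
loc = np.array([2.5, 2.5, 0.0, 2.0])   # dx, dy, dw, dh
box = decode_box(loc, anchor)
print(box)

# A zero offset returns the anchor itself in corner form.
print(decode_box(np.zeros(4), anchor))  # approximately [0.4, 0.4, 0.6, 0.6]
```

A useful sanity check when debugging is the last line: an all-zero prediction should decode to the anchor box unchanged.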
    
    

    Complete code for the decode process

    import numpy as np

    def ssd_bboxes_decode(feat_localizations,
                          anchor_bboxes,
                          prior_scaling=[0.1, 0.1, 0.2, 0.2]):
        """Compute the relative bounding boxes from the layer features and
        reference anchor bounding boxes.
    
        Return:
          numpy array Nx4: ymin, xmin, ymax, xmax
        """
        # Reshape for easier broadcasting.
        l_shape = feat_localizations.shape
        feat_localizations = np.reshape(feat_localizations,
                                        (-1, l_shape[-2], l_shape[-1]))
        yref, xref, href, wref = anchor_bboxes
        xref = np.reshape(xref, [-1, 1])
        yref = np.reshape(yref, [-1, 1])
    
        # Compute center, height and width
        cx = feat_localizations[:, :, 0] * wref * prior_scaling[0] + xref
        cy = feat_localizations[:, :, 1] * href * prior_scaling[1] + yref
        w = wref * np.exp(feat_localizations[:, :, 2] * prior_scaling[2])
        h = href * np.exp(feat_localizations[:, :, 3] * prior_scaling[3])
        # bboxes: ymin, xmin, ymax, xmax.
        bboxes = np.zeros_like(feat_localizations)
        bboxes[:, :, 0] = cy - h / 2.
        bboxes[:, :, 1] = cx - w / 2.
        bboxes[:, :, 2] = cy + h / 2.
        bboxes[:, :, 3] = cx + w / 2.
        # Back to original shape.
        bboxes = np.reshape(bboxes, l_shape)
        return bboxes
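
Since decode is meant to invert encode, a round-trip check is a cheap way to validate both. The sketch below is a flattened NumPy version with one anchor per box; the function names `encode` and `decode` and the random test data are assumptions made for illustration, not the original TensorFlow code.

```python
import numpy as np

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)

def encode(bboxes, anchors, s=PRIOR_SCALING):
    """bboxes: (N, 4) as ymin, xmin, ymax, xmax; anchors: (N, 4) as cy, cx, h, w."""
    ymin, xmin, ymax, xmax = bboxes.T
    yref, xref, href, wref = anchors.T
    dcy = ((ymax + ymin) / 2. - yref) / href / s[0]
    dcx = ((xmax + xmin) / 2. - xref) / wref / s[1]
    dh = np.log((ymax - ymin) / href) / s[2]
    dw = np.log((xmax - xmin) / wref) / s[3]
    return np.stack([dcx, dcy, dw, dh], axis=-1)

def decode(locs, anchors, s=PRIOR_SCALING):
    """locs: (N, 4) as dx, dy, dw, dh; returns (N, 4) as ymin, xmin, ymax, xmax."""
    dcx, dcy, dw, dh = locs.T
    yref, xref, href, wref = anchors.T
    cy = dcy * href * s[0] + yref
    cx = dcx * wref * s[1] + xref
    h = href * np.exp(dh * s[2])
    w = wref * np.exp(dw * s[3])
    return np.stack([cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.], axis=-1)

# Random anchors (cy, cx, h, w) and random valid boxes (min < max).
rng = np.random.default_rng(0)
anchors = rng.uniform(0.2, 0.8, size=(5, 4))
corners = np.sort(rng.uniform(0., 1., size=(5, 2, 2)), axis=1)
bboxes = np.concatenate([corners[:, 0], corners[:, 1]], axis=1)

restored = decode(encode(bboxes, anchors), anchors)
print(np.allclose(restored, bboxes))
```

This sketch uses each prior_scaling entry for the same quantity in both directions (index 0 for y, 1 for x, 2 for h, 3 for w, following the encode listing). The decode listing above applies index 0 to x, 1 to y, 2 to w and 3 to h; the two agree for the default [0.1, 0.1, 0.2, 0.2] because the entries are equal in pairs, but they would differ if, say, the x and y scales were set to different values.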
    

    Article link: https://www.haomeiwen.com/subject/rwprjctx.html