PointRend Implementation Details

Author: Cat丹 | Source: published 2020-03-18 16:01

    How are the sample points chosen?

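        1. Over-generation: draw kN candidate points at random from a uniform distribution (k ≥ 1).
        2. Importance sampling: from the kN candidates, keep the βN points with the highest uncertainty.
        3. Coverage: draw the remaining (1 - β)N points from a uniform distribution.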
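The three steps can be sketched in pure Python for a single box. This is a simplified, hypothetical illustration (the names `sample_training_points` and `toy_uncertainty`, the toy boundary at x = 0.5, and the example values k = 3, β = 0.75 are assumptions), not detectron2's code. N corresponds to `num_points`, k to `oversample_ratio`, and β (the importance-sampling ratio) to `importance_sample_ratio` in the detectron2 function that follows.

```python
import random


def sample_training_points(uncertainty, num_points, oversample_ratio=3,
                           importance_sample_ratio=0.75, rng=None):
    """Hypothetical pure-Python sketch of PointRend's training-time point selection.

    uncertainty: callable mapping an (x, y) point in [0, 1] x [0, 1] to a score
    (higher = more uncertain). Returns a list of num_points (x, y) tuples.
    """
    assert oversample_ratio >= 1
    assert 0.0 <= importance_sample_ratio <= 1.0
    if rng is None:
        rng = random.Random()

    # Step 1: over-generate k*N candidate points uniformly at random.
    num_sampled = int(num_points * oversample_ratio)
    candidates = [(rng.random(), rng.random()) for _ in range(num_sampled)]

    # Step 2: keep the beta*N candidates with the highest uncertainty.
    num_uncertain = int(importance_sample_ratio * num_points)
    candidates.sort(key=uncertainty, reverse=True)
    chosen = candidates[:num_uncertain]

    # Step 3: add (1 - beta)*N fresh uniform points for coverage.
    num_random = num_points - num_uncertain
    chosen += [(rng.random(), rng.random()) for _ in range(num_random)]
    return chosen


# Toy coarse prediction (assumption): logit = 10 * (x - 0.5), so the boundary is x = 0.5.
def toy_uncertainty(point):
    x, _ = point
    return -abs(10.0 * (x - 0.5))


points = sample_training_points(toy_uncertainty, num_points=100, rng=random.Random(0))
```

Sorting by uncertainty plays the role of `torch.topk` in the tensor version; the uniform tail keeps some coverage of the whole box so training does not see only boundary points.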
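    import torch

    from detectron2.layers import cat

    # point_sample() is defined in the same module as this function
    # (detectron2 PointRend project: point_rend/point_features.py).

    def get_uncertain_point_coords_with_randomness(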
        coarse_logits, uncertainty_func, num_points, oversample_ratio, importance_sample_ratio
    ):
        """
        Sample points in [0, 1] x [0, 1] coordinate space based on their uncertainty. The uncertainties
            are calculated for each point using 'uncertainty_func' function that takes point's logit
            prediction as input.
        See PointRend paper for details.
    
        Args:
            coarse_logits (Tensor): A tensor of shape (N, C, Hmask, Wmask) or (N, 1, Hmask, Wmask) for
                class-specific or class-agnostic prediction.
            uncertainty_func: A function that takes a Tensor of shape (N, C, P) or (N, 1, P) that
                contains logit predictions for P points and returns their uncertainties as a Tensor of
                shape (N, 1, P).
            num_points (int): The number of points P to sample.
            oversample_ratio (int): Oversampling parameter.
            importance_sample_ratio (float): Ratio of points that are sampled via importance sampling.
    
        Returns:
            point_coords (Tensor): A tensor of shape (N, P, 2) that contains the coordinates of P
                sampled points.
        """
        assert oversample_ratio >= 1
        assert importance_sample_ratio <= 1 and importance_sample_ratio >= 0
        num_boxes = coarse_logits.shape[0]
        num_sampled = int(num_points * oversample_ratio)
        point_coords = torch.rand(num_boxes, num_sampled, 2, device=coarse_logits.device)
        point_logits = point_sample(coarse_logits, point_coords, align_corners=False)
        # It is crucial to calculate uncertainty based on the sampled prediction value for the points.
        # Calculating uncertainties of the coarse predictions first and sampling them for points leads
        # to incorrect results.
        # To illustrate this: assume uncertainty_func(logits)=-abs(logits), a sampled point between
        # two coarse predictions with -1 and 1 logits has 0 logits, and therefore 0 uncertainty value.
        # However, if we calculate uncertainties for the coarse predictions first,
        # both will have -1 uncertainty, and the sampled point will get -1 uncertainty.
        point_uncertainties = uncertainty_func(point_logits)
        num_uncertain_points = int(importance_sample_ratio * num_points)
        num_random_points = num_points - num_uncertain_points
        idx = torch.topk(point_uncertainties[:, 0, :], k=num_uncertain_points, dim=1)[1]
        shift = num_sampled * torch.arange(num_boxes, dtype=torch.long, device=coarse_logits.device)
        idx += shift[:, None]
        point_coords = point_coords.view(-1, 2)[idx.view(-1), :].view(
            num_boxes, num_uncertain_points, 2
        )
        if num_random_points > 0:
            point_coords = cat(
                [
                    point_coords,
                    torch.rand(num_boxes, num_random_points, 2, device=coarse_logits.device),
                ],
                dim=1,
            )
        return point_coords
    

    How is the uncertainty map computed?
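The uncertainty function is supplied by the caller as `uncertainty_func`; the comment inside `get_uncertain_point_coords_with_randomness` uses -|logit| as its example. Evaluated at every location of a coarse mask, that score forms an uncertainty map whose maximum (0) lies where the logit is 0, i.e. on the predicted boundary. The sketch below uses hypothetical helper names, with 1-D linear interpolation standing in for bilinear sampling, and reproduces the comment's example: interpolating the logits first gives 0, while scoring the coarse cells first gives -1.

```python
def uncertainty(logit):
    """Example uncertainty from the code comment: -|logit| (0 on the decision boundary)."""
    return -abs(logit)


def lerp(a, b, t):
    """Linear interpolation between a and b (1-D analogue of bilinear sampling)."""
    return (1.0 - t) * a + t * b


left, right = -1.0, 1.0  # two neighbouring coarse logits

# Correct order: interpolate the logit at the midpoint, then score it.
interp_then_score = uncertainty(lerp(left, right, 0.5))

# Incorrect order: score each coarse cell, then interpolate the scores.
score_then_interp = lerp(uncertainty(left), uncertainty(right), 0.5)
```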

    How are the point features obtained?

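    Project the sampled point coordinates onto the feature map and read off the values at those locations directly (computed by bilinear interpolation).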
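For intuition, here is a pure-Python sketch of sampling one point on one single-channel map. The helper names are hypothetical; the tensor code does the same thing in batch through `point_sample`, which wraps `torch.nn.functional.grid_sample` with `align_corners=False` and zero padding. `image_to_normalized` repeats the coordinate scaling used in `point_sample_fine_grained_features`, assuming a feature scale of 1/4 for P2 (stride 4).

```python
import math


def bilinear_sample(feature, u, v):
    """Bilinearly sample a 2-D map (list of rows) at normalized coords (u, v) in [0, 1].

    Follows the align_corners=False convention: pixel i covers [i, i + 1) and its
    value sits at the centre i + 0.5. Neighbours outside the map contribute 0.
    """
    h, w = len(feature), len(feature[0])
    x = u * w - 0.5  # continuous column index
    y = v * h - 0.5  # continuous row index
    x0, y0 = math.floor(x), math.floor(y)
    wx, wy = x - x0, y - y0

    def at(r, c):
        return feature[r][c] if 0 <= r < h and 0 <= c < w else 0.0  # zero padding

    top = (1 - wx) * at(y0, x0) + wx * at(y0, x0 + 1)
    bottom = (1 - wx) * at(y0 + 1, x0) + wx * at(y0 + 1, x0 + 1)
    return (1 - wy) * top + wy * bottom


def image_to_normalized(x_img, y_img, feat_w, feat_h, feature_scale):
    """Divide image coords by (feature size / feature scale), as the detectron2 code does."""
    return x_img / (feat_w / feature_scale), y_img / (feat_h / feature_scale)


fmap = [[0.0, 1.0],
        [2.0, 3.0]]
centre_value = bilinear_sample(fmap, 0.5, 0.5)  # average of the four pixels
```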

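    import torch

    from detectron2.layers import cat
    from detectron2.structures import Boxes

    # point_sample() and get_point_coords_wrt_image() are defined in the same module
    # (detectron2 PointRend project: point_rend/point_features.py).

    def point_sample_fine_grained_features(features_list, feature_scales, boxes, point_coords):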
        """
        Get features from feature maps in `features_list` that correspond to specific point coordinates
            inside each bounding box from `boxes`.
    
        Args:
            features_list (list[Tensor]): A list of feature map tensors to get features from.
            feature_scales (list[float]): A list of scales for tensors in `features_list`.
            boxes (list[Boxes]): A list of I Boxes objects that contain R_1 + ... + R_I = R boxes all
                together.
            point_coords (Tensor): A tensor of shape (R, P, 2) that contains
                [0, 1] x [0, 1] box-normalized coordinates of the P sampled points.
    
        Returns:
            point_features (Tensor): A tensor of shape (R, C, P) that contains features sampled
                from all feature maps in `features_list` for P sampled points for all R boxes in `boxes`.
            point_coords_wrt_image (Tensor): A tensor of shape (R, P, 2) that contains image-level
                coordinates of P points.
        """
        cat_boxes = Boxes.cat(boxes)
        num_boxes = [len(b) for b in boxes]
    
        point_coords_wrt_image = get_point_coords_wrt_image(cat_boxes.tensor, point_coords)
        split_point_coords_wrt_image = torch.split(point_coords_wrt_image, num_boxes)
    
        point_features = []
        for idx_img, point_coords_wrt_image_per_image in enumerate(split_point_coords_wrt_image):
            point_features_per_image = []
            for idx_feature, feature_map in enumerate(features_list):
                h, w = feature_map.shape[-2:]
                scale = torch.tensor([w, h], device=feature_map.device) / feature_scales[idx_feature]
                point_coords_scaled = point_coords_wrt_image_per_image / scale
                point_features_per_image.append(
                    point_sample(
                        feature_map[idx_img].unsqueeze(0),
                        point_coords_scaled.unsqueeze(0),
                        align_corners=False,
                    )
                    .squeeze(0)
                    .transpose(1, 0)
                )
            point_features.append(cat(point_features_per_image, dim=1))
    
        return cat(point_features, dim=0), point_coords_wrt_image
    

    How is the loss computed?

    Sample the values at the same P points from both the prediction and the ground truth, and compute the loss only at those P points.
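A pure-Python sketch of the per-point loss for one mask, assuming the prediction and the ground truth have already been sampled at the same P points (helper names and the example values are hypothetical). It uses the numerically stable form of binary cross-entropy with logits, the same quantity `F.binary_cross_entropy_with_logits` computes with `reduction="mean"`, and a 0.5 threshold on the ground truth for the accuracy statistic (an assumption of this sketch).

```python
import math


def point_bce_with_logits(logits, targets):
    """Mean binary cross-entropy over P sampled points, computed from logits.

    Stable form per point: max(x, 0) - x * z + log(1 + exp(-|x|)).
    """
    assert len(logits) == len(targets) > 0
    total = 0.0
    for x, z in zip(logits, targets):
        total += max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def point_accuracy(logits, targets):
    """Fraction of points where (logit > 0) agrees with the thresholded target."""
    hits = sum((x > 0.0) == (z > 0.5) for x, z in zip(logits, targets))
    return hits / len(logits)


pred_logits = [2.0, -1.0, 0.0]  # prediction sampled at P = 3 points
gt_values = [1.0, 0.0, 1.0]     # ground-truth mask sampled at the same points
loss = point_bce_with_logits(pred_logits, gt_values)
acc = point_accuracy(pred_logits, gt_values)
```

Only P values per mask enter the loss, so its cost does not depend on the mask's output resolution.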

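    import torch
    from torch.nn import functional as F

    from detectron2.layers import cat
    from detectron2.structures import BitMasks
    from detectron2.utils.events import get_event_storage

    from .point_features import point_sample  # relative import inside PointRend's point_rend package

    def roi_mask_point_loss(mask_logits, instances, points_coord):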
        """
        Compute the point-based loss for instance segmentation mask predictions.
    
        Args:
            mask_logits (Tensor): A tensor of shape (R, C, P) or (R, 1, P) for class-specific or
                class-agnostic, where R is the total number of predicted masks in all images, C is the
                number of foreground classes, and P is the number of points sampled for each mask.
                The values are logits.
            instances (list[Instances]): A list of N Instances, where N is the number of images
                in the batch. These instances are in 1:1 correspondence with the `mask_logits`. So, the i-th
                element of the list contains R_i objects and R_1 + ... + R_N is equal to R.
                The ground-truth labels (class, box, mask, ...) associated with each instance are stored
                in fields.
            points_coord (Tensor): A tensor of shape (R, P, 2), where R is the total number of
                predicted masks and P is the number of points for each mask. The coordinates are in
                the image pixel coordinate space, i.e. [0, W] x [0, H] in (x, y) order.
        Returns:
            point_loss (Tensor): A scalar tensor containing the loss.
        """
        assert len(instances) == 0 or isinstance(
            instances[0].gt_masks, BitMasks
        ), "Point head works with GT in 'bitmask' format only. Set INPUT.MASK_FORMAT to 'bitmask'."
        with torch.no_grad():
            cls_agnostic_mask = mask_logits.size(1) == 1
            total_num_masks = mask_logits.size(0)
    
            gt_classes = []
            gt_mask_logits = []
            idx = 0
            for instances_per_image in instances:
                if not cls_agnostic_mask:
                    gt_classes_per_image = instances_per_image.gt_classes.to(dtype=torch.int64)
                    gt_classes.append(gt_classes_per_image)
    
                gt_bit_masks = instances_per_image.gt_masks.tensor
                h, w = instances_per_image.gt_masks.image_size
                scale = torch.tensor([w, h], dtype=torch.float, device=gt_bit_masks.device)
                points_coord_grid_sample_format = (
                    points_coord[idx : idx + len(instances_per_image)] / scale
                )
                idx += len(instances_per_image)
                gt_mask_logits.append(
                    point_sample(
                        gt_bit_masks.to(torch.float32).unsqueeze(1),
                        points_coord_grid_sample_format,
                        align_corners=False,
                    ).squeeze(1)
                )
            gt_mask_logits = cat(gt_mask_logits)
    
        # torch.mean (in binary_cross_entropy_with_logits) doesn't
        # accept empty tensors, so handle it separately
        if gt_mask_logits.numel() == 0:
            return mask_logits.sum() * 0
    
        if cls_agnostic_mask:
            mask_logits = mask_logits[:, 0]
        else:
            indices = torch.arange(total_num_masks)
            gt_classes = cat(gt_classes, dim=0)
            mask_logits = mask_logits[indices, gt_classes]
    
        # Log the training accuracy (using gt classes and 0.0 threshold for the logits)
        mask_accurate = (mask_logits > 0.0) == gt_mask_logits.to(dtype=torch.uint8)
        mask_accuracy = mask_accurate.nonzero().size(0) / mask_accurate.numel()
        get_event_storage().put_scalar("point_rend/accuracy", mask_accuracy)
    
        point_loss = F.binary_cross_entropy_with_logits(
            mask_logits, gt_mask_logits.to(dtype=torch.float32), reduction="mean"
        )
        return point_loss
    

    Which features are sampled?

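    • Fine-grained features: P2 (features from several levels can also be used, concatenated together)
    • Coarse prediction features: e.g. the predicted coarse mask
    • Size of P2: stride 4 relative to the input, i.e. 1/4 of the input resolution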
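The list above translates into simple size arithmetic. A sketch with hypothetical helper names, assuming 256 channels per FPN level, K = 80 classes (as on COCO) purely as an example, and an input already padded to a multiple of 4:

```python
def p2_size(height, width, stride=4):
    """Spatial size (H, W) of P2 for an input whose sides are multiples of the stride."""
    assert height % stride == 0 and width % stride == 0
    return height // stride, width // stride


def point_feature_dim(num_levels=1, channels_per_level=256, num_classes=80):
    """Length of one point's feature vector: fine-grained features from
    num_levels FPN maps concatenated together, plus the K-dim coarse prediction."""
    return num_levels * channels_per_level + num_classes


print(p2_size(800, 1216))               # (200, 304)
print(point_feature_dim())              # 336
print(point_feature_dim(num_levels=2))  # 592
```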

    Network architecture?

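    • Mask R-CNN backbone: ResNet-50 + FPN
    • Coarse mask head
      • Input: 14×14 features extracted from P2
      • conv(in, 256, kernel_size=2, stride=2): a 2×2, stride-2 convolution with 256 output channels
      • ReLU
      • MLP: hidden layers of width 1024, each followed by ReLU
      • Sigmoid activation
    • PointRend_head:
      • Input: at each point, the K-dimensional coarse mask prediction (one value per class, K classes) and the 256-dimensional P2 feature vector
      • MLP: 3 hidden layers (implemented as 1×1 convolutions) with 256 channels each; each layer's input is the previous layer's 256 outputs concatenated with the K-dimensional coarse mask vector; ReLU after each
      • Sigmoid activation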
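The layer sizes implied by the architecture list can be tabulated with short hypothetical helpers. The sketch assumes the final per-class predictor of the point head also receives the concatenated coarse prediction (the same per-layer rule) and uses K = 80 only as an example; `conv_out_size` checks that the coarse head's 2×2, stride-2 convolution maps the 14×14 input to 7×7.

```python
def conv_out_size(size, kernel, stride, padding=0):
    """Output side length of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def point_head_layer_shapes(num_classes, fine_grained_channels=256,
                            hidden_channels=256, num_hidden_layers=3):
    """(in_channels, out_channels) of each 1x1 conv in a PointRend-style point head.

    The K-dim coarse prediction is concatenated to the input of every layer;
    this sketch assumes the final per-class predictor follows the same rule.
    """
    shapes = []
    in_ch = fine_grained_channels + num_classes  # P2 feature + coarse prediction
    for _ in range(num_hidden_layers):
        shapes.append((in_ch, hidden_channels))
        in_ch = hidden_channels + num_classes  # re-attach the coarse prediction
    shapes.append((in_ch, num_classes))  # one mask logit per class
    return shapes


coarse_side = conv_out_size(14, kernel=2, stride=2)
layers = point_head_layer_shapes(num_classes=80)
```

Because the layers are 1×1 convolutions over an (R, C, P) tensor, each of the P points is processed independently by the same MLP weights.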


        Article link: https://www.haomeiwen.com/subject/qvccyhtx.html