Evaluation Metrics for Object Detection: mAP

Author: 星光下的胖子 | Published: 2021-06-07 13:15


1. Related concepts
- 1) IoU (Intersection over Union)
- 2) Precision and Recall
2. mAP (mean Average Precision)
- 1) Pascal VOC
- 2) COCO
3. Summary of the mAP computation pipeline
4. Code implementation of mAP

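I. Related Concepts

1. IoU (Intersection over Union)

Definition: IoU is the ratio of the intersection area to the union area of a predicted bounding box (bbox) and a ground-truth (GT) bbox.
IoU is used to decide whether the prediction for an object is correct. If IoU > threshold, the prediction counts as a true positive (TP); otherwise (IoU <= threshold) it counts as a false positive (FP).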
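
As a minimal sketch of the definition above, the function below computes the IoU of a single pair of boxes. The names `box_iou` and `is_true_positive` are illustrative, and the boxes are assumed to be `(xmin, ymin, xmax, ymax)` in continuous coordinates (no "+ 1" pixel offset).

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle.
    inter_left = max(box_a[0], box_b[0])
    inter_top = max(box_a[1], box_b[1])
    inter_right = min(box_a[2], box_b[2])
    inter_bottom = min(box_a[3], box_b[3])

    # Clamp at 0 so that disjoint boxes yield zero intersection.
    inter_w = max(0.0, inter_right - inter_left)
    inter_h = max(0.0, inter_bottom - inter_top)
    inter_area = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union_area = area_a + area_b - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def is_true_positive(iou, threshold=0.5):
    """A prediction is a TP only if its IoU is strictly above the threshold."""
    return iou > threshold


# Two 10x10 boxes that overlap on half their width: IoU = 50 / 150.
example_iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
print(example_iou, is_true_positive(example_iou))
```

With a threshold of 0.5 this example pair would be counted as an FP, because its IoU is only one third.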

2. Precision and Recall

To understand mAP, we first need Precision and Recall.

Recall: of all samples that are actually positive, the fraction predicted as positive.
Precision: of all samples predicted as positive, the fraction that is actually positive.
$Recall=\frac{TP}{TP+FN}=\frac{TP}{N_{groundtruths}}$
$Precision=\frac{TP}{TP+FP}=\frac{TP}{N_{predictions}}$
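
The two formulas can be checked with a few lines of code. This is only a sketch: the function name `precision_recall` and the counts in the example are made up for illustration.

```python
def precision_recall(num_tp, num_fp, num_fn):
    """Return (precision, recall) from TP, FP and FN counts."""
    num_predictions = num_tp + num_fp    # every prediction is either TP or FP
    num_groundtruths = num_tp + num_fn   # every GT box is either found (TP) or missed (FN)
    precision = num_tp / num_predictions if num_predictions else 0.0
    recall = num_tp / num_groundtruths if num_groundtruths else 0.0
    return precision, recall


# Hypothetical counts: 8 predictions, 6 of them correct, 10 GT boxes in total.
precision, recall = precision_recall(num_tp=6, num_fp=2, num_fn=4)
print(precision, recall)  # 0.75 0.6
```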

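II. mAP (mean Average Precision)

Object detection differs from plain classification: a detector must not only find each object and output its class, it must also localize it. Classification accuracy therefore cannot serve as the evaluation metric for detection, and mAP is the most commonly used metric instead.
1) AP (Average Precision) is the area under the PR curve and measures how well a single class is detected.
2) mAP (mean Average Precision) is the mean of the AP over all classes and measures multi-class detection quality.

Different datasets and competitions may use different metrics. The most common are those of PASCAL VOC and MS COCO.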

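Pascal VOC

To compute mAP, first compute the AP of each class.

Step 1: Draw the PR curve

Take two images as an example, each containing GT boxes (green) and predicted boxes (red) of one particular class.
[Figure: two example images with GT boxes and predicted boxes]

For every predicted box in every image, compute its IoU with the GT boxes of that image and select the GT box with the largest IoU. The results are collected in a table.
[Table: matched GT box and IoU for each predicted box]

VOC uses an IoU threshold of 0.5, so a prediction with IoU > 0.5 is a TP and otherwise an FP. Next, sort the predictions by confidence from high to low. Note that if several predicted boxes match the same GT box, only the one with the highest confidence counts as a TP and the others count as FP. In the table, P3 and P4 both match GB: P4 (the higher confidence) is a TP and P3 is an FP.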
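
The TP/FP rule, including the handling of duplicate matches, can be sketched as follows. The confidences, IoU values and GT names are hypothetical (they are not the values from the article's table); only the fact that P3 and P4 both match GB is taken from the text.

```python
def label_detections(detections, iou_threshold=0.5):
    """Return [(name, is_tp), ...] ordered by descending confidence.

    Each detection is a dict with keys: name, confidence, gt (id of the GT box
    with the largest IoU) and iou (that largest IoU).
    """
    ranked = sorted(detections, key=lambda d: d["confidence"], reverse=True)
    claimed_gts = set()  # GT boxes already matched by a more confident prediction
    labels = []
    for det in ranked:
        is_tp = det["iou"] > iou_threshold and det["gt"] not in claimed_gts
        if is_tp:
            claimed_gts.add(det["gt"])
        labels.append((det["name"], is_tp))
    return labels


# Hypothetical predictions for one class.
hypothetical_detections = [
    {"name": "P1", "confidence": 0.90, "gt": "GA", "iou": 0.70},
    {"name": "P2", "confidence": 0.95, "gt": "GA", "iou": 0.30},
    {"name": "P3", "confidence": 0.60, "gt": "GB", "iou": 0.65},
    {"name": "P4", "confidence": 0.80, "gt": "GB", "iou": 0.75},
]
for name, is_tp in label_detections(hypothetical_detections):
    print(name, "TP" if is_tp else "FP")
```

Here P3 clears the IoU threshold but is still an FP, because the more confident P4 has already claimed GB.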

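In the VOC metric, Recall and Precision at row k (rank k) are computed over the predictions in that row and all rows above it. First accumulate TP and FP row by row; then compute Precision and Recall for each row k (Precision = cumulative TP / number of predictions so far; Recall = cumulative TP / total number of GT boxes). Taking the second row (rank = 2) as an example: the cumulative TP is 1, the number of predictions so far is 2, and there are 3 GT boxes in total, so Precision = 1/2 = 0.5 and Recall = 1/3 ≈ 0.33.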
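
A short sketch of this cumulative computation is given below. The TP/FP flags are hypothetical, chosen so that rank 2 reproduces the numbers in the example above (cumulative TP = 1, 3 GT boxes); the function name is illustrative.

```python
from itertools import accumulate


def precision_recall_by_rank(tp_flags, num_groundtruths):
    """tp_flags: TP (True) / FP (False) per prediction, sorted by descending confidence."""
    cumulative_tp = list(accumulate(int(flag) for flag in tp_flags))
    # At rank k there are exactly k predictions so far.
    precisions = [tp / rank for rank, tp in enumerate(cumulative_tp, start=1)]
    recalls = [tp / num_groundtruths for tp in cumulative_tp]
    return precisions, recalls


hypothetical_flags = [False, True, True, False]
precisions, recalls = precision_recall_by_rank(hypothetical_flags, num_groundtruths=3)
for rank, (p, r) in enumerate(zip(precisions, recalls), start=1):
    print(f"rank={rank} precision={p:.2f} recall={r:.2f}")
```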

After computing Precision and Recall for every row, plot Recall on the horizontal axis and Precision on the vertical axis to obtain the PR curve.

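Step 2: Smooth the PR curve

The Precision at each recall level r is replaced by the maximum Precision over all recall levels >= r (interpolation). This guarantees that the precision at a lower recall is never smaller than the precision at any higher recall.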
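
One compact way to apply this rule is a running maximum taken from the high-recall end. The sketch below assumes the precision values are ordered by increasing recall (that is, in rank order); the sample values are made up.

```python
import numpy as np


def smooth_precision(precision):
    """Replace each precision by the maximum precision at equal or higher recall."""
    precision = np.asarray(precision, dtype=float)
    # Reverse, take the running maximum, and reverse back.
    return np.maximum.accumulate(precision[::-1])[::-1]


raw_precision = [1.0, 0.5, 0.67, 0.5, 0.6]
print(smooth_precision(raw_precision))
```

The result is non-increasing, so the zig-zag of the raw PR curve disappears.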

Step 3: Compute AP

VOC 2007 takes the mean of the interpolated Precision at 11 Recall points [0, 0.1, ..., 1] as the AP.
$AP=\frac{1}{11}\sum_{r \in \{0,0.1,...,1\}}p_{interp}(r)$
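
The 11-point formula can be sketched directly. The function name and the sample recall/precision values are illustrative; recall levels that the detector never reaches contribute a precision of 0.

```python
import numpy as np


def ap_11_point(recall, precision):
    """VOC 2007 style AP: mean interpolated precision at recall 0, 0.1, ..., 1."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    total = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        reached = recall >= r
        # p_interp(r): best precision among all points with recall >= r.
        total += precision[reached].max() if reached.any() else 0.0
    return total / 11.0


# Hypothetical per-rank values (3 GT boxes, 4 ranked predictions).
sample_recall = [0.0, 1 / 3, 2 / 3, 2 / 3]
sample_precision = [0.0, 0.5, 2 / 3, 0.5]
print(ap_11_point(sample_recall, sample_precision))
```

In this example the seven recall levels from 0 to 0.6 each see an interpolated precision of 2/3 and the remaining four see 0, which gives 14/33.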

VOC 2012 uses all Recall points instead of 11 fixed ones: the AP is the area under the smoothed PR curve (AUC).
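
A sketch of the all-points variant follows. It mirrors the way this is commonly implemented (sentinel points at recall 0 and 1, a monotone precision envelope, then a sum of rectangles where recall changes); the function name and sample values are illustrative.

```python
import numpy as np


def ap_all_points(recall, precision):
    """Area under the smoothed PR curve, summed over every change in recall."""
    # Sentinel points so the curve spans recall 0 to 1.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Monotone envelope: precision at recall r becomes the max precision at recall >= r.
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Indices where recall increases; each contributes a rectangle of area width * height.
    change = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[change + 1] - mrec[change]) * mpre[change + 1]))


sample_recall = [0.0, 1 / 3, 2 / 3, 2 / 3]
sample_precision = [0.0, 0.5, 2 / 3, 0.5]
print(ap_all_points(sample_recall, sample_precision))
```

For these sample values the area is 4/9, slightly different from the 11-point result for the same curve.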

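Once the AP of every class has been computed, their mean is the mAP.

COCO

In VOC the IoU threshold is fixed at 0.5, which means that two predictions with IoU 0.6 and 0.9 are treated as equivalent, so localization quality is not rewarded. COCO addresses this with a range of thresholds [.5:.05:.95]: it computes the mAP at each threshold in the range and averages them to get the final mAP.
$mAP_{COCO}=\frac{mAP_{0.50} + mAP_{0.55} + ... + mAP_{0.95}}{10}$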
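
The threshold range and the final average can be written down as a sketch. The per-threshold mAP values are hypothetical, and `average_over_thresholds` is an illustrative name.

```python
import numpy as np

# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
# linspace fixes the number of points, which avoids floating-point surprises at the end point.
iou_thresholds = np.linspace(0.5, 0.95, 10)


def average_over_thresholds(map_per_threshold):
    """Average one mAP value per IoU threshold into a single COCO-style number."""
    values = np.asarray(map_per_threshold, dtype=float)
    if values.shape != iou_thresholds.shape:
        raise ValueError("expected one mAP value per IoU threshold")
    return float(values.mean())


# Hypothetical mAP values: they shrink as the IoU threshold gets stricter.
hypothetical_maps = [0.60, 0.58, 0.55, 0.50, 0.45, 0.40, 0.32, 0.22, 0.12, 0.03]
print(iou_thresholds.round(2))
print(average_over_thresholds(hypothetical_maps))
```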

In addition, COCO uses a 101-point method (Recall in [0:.01:1]) to compute AP.
Note: in COCO, mAP is also written simply as AP.
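
A minimal sketch of 101-point sampling is shown below. It assumes that recall is non-decreasing (as cumulative recall always is) and looks up, for every grid recall, the first rank that reaches it. This is written in the spirit of COCO-style evaluation and is not a copy of any library's code.

```python
import numpy as np


def ap_101_point(recall, precision):
    """Mean smoothed precision sampled at recall 0, 0.01, ..., 1."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    # Monotone envelope of the precision values.
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    recall_grid = np.linspace(0.0, 1.0, 101)
    # Index of the first rank whose recall is >= each grid value.
    first_rank = np.searchsorted(recall, recall_grid, side="left")
    sampled = np.zeros_like(recall_grid)
    reached = first_rank < len(recall)   # grid recalls the detector never reaches stay 0
    sampled[reached] = envelope[first_rank[reached]]
    return float(sampled.mean())


print(ap_101_point([0.5, 1.0], [1.0, 1.0]))  # a perfect detector
print(ap_101_point([0.5], [1.0]))            # finds only half of the GT boxes
```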

One way to compute the COCO AP (not the only one):
1) For each class, compute its AP at every IoU threshold and average.
$AP[class]=\frac{1}{N_{thresholds}}\sum_{iou \in thresholds}AP[class,iou]$

2) Average the AP over all classes to obtain the final AP.
$AP=\frac{1}{N_{classes}}\sum_{class \in classes}AP[class]$
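
The two averaging steps amount to taking means along the two axes of an AP[class, iou] table. The sketch below uses a made-up table with three classes and, for brevity, only two IoU thresholds.

```python
import numpy as np

# Hypothetical AP[class, iou] values: rows are classes, columns are IoU thresholds.
ap_table = np.array([
    [0.8, 0.6],
    [0.6, 0.2],
    [0.4, 0.4],
])

# Step 1: average over IoU thresholds for each class.
ap_per_class = ap_table.mean(axis=1)
# Step 2: average over classes.
final_ap = float(ap_per_class.mean())

print(ap_per_class, final_ap)
```

Because every class has a value for every threshold, averaging over classes first and thresholds second gives the same number, which is why the order of the two steps is not unique.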

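So the AP in COCO is really an "average average average precision": averaged over recall points, over IoU thresholds, and over classes.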

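III. Summary of the mAP Computation Pipeline

The mAP computation pipeline:

1. First, filter the network's predicted boxes with a low confidence threshold (typically 0.001 or 0.01).
- A low threshold keeps as many boxes as possible. The reasonable threshold differs from model to model, and the mAP evaluation has to remove that difference to provide a uniform standard.
2. Apply non-maximum suppression (NMS, per class) to the filtered boxes to remove heavily overlapping ones.
- The NMS IoU threshold is usually 0.5; 0.6 or 0.7 also works and has little effect on the result.
3. Compute mAP from the processed predicted boxes and the GT boxes:
- 3.1) First compute the AP of each class at a given IoU threshold (for AP75, for example, iou_threshold=0.75).
  - 3.1.1) Build a matched_table for each class.
    - It has one row per predicted box, and the columns are [confidence, matched_iou, matched_groundtruth_index, image_id].
      - image_id is the ID of the image the predicted box belongs to, and confidence is the box's confidence score;
      - matched_groundtruth_index is the index of the GT box with the largest IoU with this predicted box (IoU is computed only between predicted boxes and GT boxes of the same image);
      - matched_iou is that largest IoU value (compared with iou_threshold to decide TP or FP).
    - Sort matched_table by confidence from high to low.
  - 3.1.2) Decide whether each predicted box is a TP or an FP.
    - If matched_iou <= iou_threshold, the box is an FP (for mAP@[IoU=0.5], iou_threshold=0.5).
    - If matched_iou > iou_threshold, i.e. the predicted box matches some GT box:
      - If this GT box is matched for the first time, the current predicted box (the most confident one for it) is a TP; otherwise it is an FP.
      - (Note: a GT box can correspond to at most one predicted box. When several predicted boxes match the same GT box, the one with the highest confidence is the TP.)
  - 3.1.3) Accumulate the TP count row by row (rank) and compute Precision and Recall for each row.
    - Formulas for Recall and Precision at each row:
      - $Precision=\frac{TP}{TP+FP}=\frac{TP}{N_{predictions}}$, $Recall=\frac{TP}{TP+FN}=\frac{TP}{N_{groundtruths}}$
      - TP is the number of TPs in the current row and all rows above it, $N_{predictions}$ is the number of predicted boxes in the current row and all rows above it, and $N_{groundtruths}$ is the total number of GT boxes.
  - 3.1.4) With Precision and Recall computed for every row, draw the PR curve and compute AP.
    - First, smooth the PR curve.
      - The Precision at a lower Recall must not be lower than the Precision at any higher Recall.
    - Then compute the average precision (AP) over Recall. There are several variants:
      - VOC 2007: 11-point method, the mean Precision at the 11 points Recall[0:0.1:1].
      - VOC 2012: all points, i.e. the area under the PR curve.
      - COCO: 101-point method, the mean Precision at the 101 points Recall[0:0.01:1].
- 3.2) Average the AP over all classes to obtain the mAP.
  - By changing the IoU threshold, this yields AP50, AP75 and AP@[IoU=0.5:0.95].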
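
Steps 1 and 2 of the pipeline above (confidence filtering followed by per-class NMS) can be sketched for a single image as follows. This is a plain greedy NMS under stated assumptions: the function names are illustrative, boxes are `[xmin, ymin, xmax, ymax, confidence, class_index]`, and IoU is computed in continuous coordinates.

```python
def box_iou(a, b):
    """IoU of two boxes whose first four entries are xmin, ymin, xmax, ymax."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, conf_threshold=0.001, nms_iou_threshold=0.5):
    """Drop low-confidence boxes, then run greedy NMS separately for each class."""
    confident = [d for d in detections if d[4] >= conf_threshold]
    kept_all = []
    for class_index in sorted({d[5] for d in confident}):
        # Most confident boxes of this class first.
        candidates = sorted((d for d in confident if d[5] == class_index),
                            key=lambda d: d[4], reverse=True)
        kept = []
        for det in candidates:
            # Keep a box only if it does not overlap too much with any kept box.
            if all(box_iou(det, k) <= nms_iou_threshold for k in kept):
                kept.append(det)
        kept_all.extend(kept)
    return kept_all


hypothetical_detections = [
    [0, 0, 10, 10, 0.9, 0],
    [1, 1, 11, 11, 0.8, 0],       # overlaps the first box (IoU about 0.68): suppressed
    [1, 1, 11, 11, 0.7, 1],       # same place but another class: kept
    [50, 50, 60, 60, 0.0005, 0],  # below the confidence threshold: dropped
]
for det in filter_and_nms(hypothetical_detections):
    print(det)
```

Running NMS per class matters here: the third box survives even though it coincides with a suppressed box, because boxes of different classes never suppress each other.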

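IV. Code Implementation of mAP

This section implements the mAP computation by hand and compares it with the result of the pycocotools library; the two agree.

1. Computing mAP by hand

Implementation: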

import numpy as np

# Compute IoU (many-to-many)
def ious(a, b):
    '''
    a : (>=4) x M x 1    rows 0-3 are left, top, right, bottom
    b : (>=4) x 1 x N    rows 0-3 are left, top, right, bottom
    Coordinates are treated as inclusive pixel indices, hence the "+ 1" in widths and heights.
    Returns an M x N array of IoU values.
    '''
    aleft, atop, aright, abottom = [a[i] for i in range(4)]
    bleft, btop, bright, bbottom = [b[i] for i in range(4)]
    
    # aleft.shape = M, 1
    # bleft.shape = 1, N
    cross_left = np.maximum(aleft, bleft)        # M x N
    cross_top = np.maximum(atop, btop)           # M x N
    cross_right = np.minimum(aright, bright)     # M x N
    cross_bottom = np.minimum(abottom, bbottom)  # M x N
    
    # cross_area.shape  =  M x N
    cross_area = (cross_right - cross_left + 1).clip(0) * (cross_bottom - cross_top + 1).clip(0)
    # union_area.shape  =  M x N
    union_area = (aright - aleft + 1) * (abottom - atop + 1) + (bright - bleft + 1) * (bbottom - btop + 1) - cross_area
    # M x N
    return cross_area / union_area

# Build the matched_table of one class
def build_matched_table(classes_index, groundtruths, detections, maxDets=100):
    '''
    classes_index: index of the class to build the matched_table for
    groundtruths: GT boxes, e.g. {"image_id": [[xmin, ymin, xmax, ymax, 0, class_index], ...], ...}
    detections: predicted boxes, e.g. {"image_id": [[xmin, ymin, xmax, ymax, confidence, class_index], ...], ...}
    maxDets: maximum number of predicted boxes per image, 100 by default
    '''
    matched_table = []  # the matched_table being built
    sum_groundtruths = 0  # running count of GT boxes
    # Loop over every image
    for image_id in groundtruths:
        # Select the predicted boxes and GT boxes of the current class and convert them to numpy
        # [x1,y1,x2,y2,conf,class_index]
        # .get() tolerates images for which the detector produced no boxes at all
        select_detections = np.array(list(filter(lambda x: x[5] == classes_index, detections.get(image_id, []))))
        select_groundtruths = np.array(list(filter(lambda x: x[5] == classes_index, groundtruths[image_id])))
        num_detections = len(select_detections)
        num_groundtruths = len(select_groundtruths)

        # Count the GT boxes
        sum_groundtruths += num_groundtruths

        # No predicted box of this class in the current image: move on to the next image
        if num_detections == 0:
            continue

        # Keep at most maxDets predicted boxes per image, preferring the most confident ones
        order = np.argsort(-select_detections[:, 4])[:maxDets]
        select_detections = select_detections[order]
        num_use_detections = len(select_detections)

        # No GT box of this class in the image: every kept predicted box is an FP, so matched_iou is set to 0
        if num_groundtruths == 0:
            for detection_index in range(num_use_detections):
                confidence = select_detections[detection_index, 4]
                matched_table.append([confidence, 0, -1, image_id])
            continue

        # Reshape so that broadcasting computes all IoUs at once
        sgt = select_groundtruths.T.reshape(6, -1, 1)
        sdt = select_detections.T.reshape(6, 1, -1)

        # IoU between every GT box (rows) and every predicted box (columns)
        groundtruth_detection_ious = ious(sgt, sdt)
        # Fill the matched_table
        for detection_index in range(num_use_detections):
            confidence = select_detections[detection_index, 4]
            matched_groundtruth_index = groundtruth_detection_ious[:, detection_index].argmax()
            matched_iou = groundtruth_detection_ious[matched_groundtruth_index, detection_index]
            matched_table.append([confidence, matched_iou, matched_groundtruth_index, image_id])

    # Sort by confidence from high to low
    matched_table = sorted(matched_table, key=lambda x: x[0], reverse=True)
    return matched_table, sum_groundtruths

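# Compute the AP of a single class at a given iou_threshold
def compute_AP(matched_table, iou_threshold, sum_groundtruths):
    '''
    matched_table: e.g. [[confidence, matched_iou, matched_groundtruth_index, image_id], ...]
    '''
    # 1. Decide whether each predicted box is a TP or an FP.
    num_detections = len(matched_table)  # total number of predicted boxes
    # Guard added here as an assumption: with no predictions or no GT boxes the AP is taken to be 0
    # (pycocotools instead leaves classes without GT boxes out of the average).
    if num_detections == 0 or sum_groundtruths == 0:
        return 0.0
    true_positive = np.zeros((num_detections,))  # TP/FP flag of every predicted box (0 = FP, 1 = TP)
    # groundtruth_seen_map records which GT boxes have already been matched.
    # item[3] is the image_id; each image_id maps to an initially empty set().
    groundtruth_seen_map = {item[3]:set() for item in matched_table}
    # Note: matched_table is already sorted by confidence from high to low.
    # Walk through the predicted boxes from top to bottom and decide TP or FP:
    # 1) If matched_iou <= iou_threshold, the box is an FP.
    # 2) If matched_iou > iou_threshold, i.e. the box matches some GT box:
    #   2.1) If the GT box is matched for the first time (not yet in the set of its image_id), add it to the set and mark the box as TP.
    #   2.2) If the GT box has already been matched (already in the set of its image_id), the box is an FP.
    for index in range(num_detections):
        # [confidence, matched_iou, matched_groundtruth_index, image_id]
        confidence, matched_iou, matched_groundtruth_index, image_id = matched_table[index]

        # TP only if matched_iou > iou_threshold and the GT box is matched for the first time
        image_seen_map = groundtruth_seen_map[image_id]  # seen set of this image
        if matched_iou > iou_threshold and matched_groundtruth_index not in image_seen_map:
            true_positive[index] = 1  # mark as TP
            image_seen_map.add(matched_groundtruth_index)  # remember that this GT box is taken

    # 2. Accumulate the TPs row by row and compute Precision and Recall.
    TP_count = np.cumsum(true_positive)  # cumulative TP per row
    detection_count = np.arange(1, num_detections + 1)  # cumulative number of predicted boxes per row
    precision = TP_count / detection_count  # Precision
    recall = TP_count / sum_groundtruths  # Recall
    
    # 3. Smooth the PR curve
    mrec = np.concatenate(([0.], recall, [min(recall[-1] + 1E-3, 1.)]))  # add one point at each end
    mpre = np.concatenate(([0.], precision, [0.]))  # add one point at each end
    # The Precision at a lower Recall must not be lower than the Precision at any higher Recall.
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    
    # 4. Compute AP: mean of the precision interpolated at 101 recall points (the COCO method)
    AP = np.mean(np.interp(np.linspace(0, 1, 101), mrec, mpre))
    return AP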

# Compute the mAP over all classes
def compute_mAP(groundtruths, detections, classes, maxDets=100):
    '''
    groundtruths: e.g. {"image_id": [[xmin, ymin, xmax, ymax, 0, class_index], [xmin, ymin, xmax, ymax, 0, class_index]], ...}
    detections: e.g. {"image_id": [[xmin, ymin, xmax, ymax, confidence, class_index], [xmin, ymin, xmax, ymax, confidence, class_index]], ...}
    classes: all class names, e.g. ["aeroplane", "bicycle", "bird", "boat", "bottle", ...]
    maxDets: maximum number of predicted boxes per image, 100 by default
    Returns [AP@[IoU=0.5:0.95], AP@[IoU=0.5], AP@[IoU=0.75]] averaged over all classes.
    '''
    APs = []
    # For every class compute [AP@[IoU=0.5:0.95], AP@[IoU=0.5], AP@[IoU=0.75]]
    for classes_index in range(len(classes)):
        # 1. Build the matched_table of this class
        matched_table, sum_groundtruths = build_matched_table(classes_index, groundtruths, detections, maxDets)
        # 2. Compute the APs from the matched_table
        AP50 = compute_AP(matched_table, 0.5, sum_groundtruths)
        AP75 = compute_AP(matched_table, 0.75, sum_groundtruths)
        AP = np.mean([compute_AP(matched_table, iou_threshold, sum_groundtruths) for iou_threshold in np.arange(0.5, 1.0, 0.05)])
        APs.append([AP, AP50, AP75])
        
    # mAP: mean of the per-class APs
    return np.mean(APs, axis=0)

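The predicted boxes in detections have already gone through per-class NMS. Predicted boxes and GT boxes have the following form (for GT boxes the fifth entry is 0; for predicted boxes it is the confidence):
{"image_id": [[xmin, ymin, xmax, ymax, 0, class_index], ...], ...}
The computed mAP results:
[Figure: mAP results of the manual implementation]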

2. Computing mAP with the `pycocotools` library

Install pycocotools with: pip install pycocotools

import json

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

def mapCOCO(groundtruth_annotation, detection_annotation, classes):
    images = []
    annotations = []
    categories = []
    ann_id = 0
    for class_index, class_name in enumerate(classes):
        categories.append({"supercategory": class_name, "id": class_index, "name": class_name})

    # Convert the GT boxes to the COCO annotation format
    for item in groundtruth_annotation:
        filename = item
        anns = groundtruth_annotation[item]
        image_id = int(filename)  # image ids are expected to be numeric
        images.append({"id": image_id})

        for left, top, right, bottom, score, class_index in anns:
            ann_id += 1
            # COCO boxes are [left, top, width, height]
            width, height = right - left + 1, bottom - top + 1
            annotations.append({"image_id": image_id, "id": ann_id, "category_id": class_index, "bbox": [left, top, width, height], "iscrowd": 0, "area": width * height})

    gt_coco = {"images": images, "annotations": annotations, "categories": categories}
    with open("gt_coco.json", "w") as f:
        json.dump(gt_coco, f)

    cocoGt = COCO("gt_coco.json")
    # Convert the predicted boxes to the COCO result format
    ann_dets = []
    for item in detection_annotation:
        anns = detection_annotation[item]
        image_id = int(item)
        # class_index is used as the loop variable so that the classes argument is not overwritten
        for left, top, right, bottom, score, class_index in anns:
            # {"image_id":1,"category_id":2,"bbox":[199.84, 190.46, 77.71, 70.88],"score":0.236},
            width = right - left + 1
            height = bottom - top + 1
            object_item = {"image_id": image_id, "category_id": class_index, "score": score, "bbox": [left, top, width, height]}
            ann_dets.append(object_item)

    cocoDt = cocoGt.loadRes(ann_dets)
    cocoEval = COCOeval(cocoGt, cocoDt, "bbox")
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()

The computed mAP results:
[Figure: mAP results printed by pycocotools]
