Precision (P):
![](https://img.haomeiwen.com/i1713353/d17358751f1ff3f7.png)
Recall (R):
![](https://img.haomeiwen.com/i1713353/236ebfd679c724aa.png)
F1 score:
![](https://img.haomeiwen.com/i1713353/3192082054307b3e.png)
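The F1 score in the formula above is the harmonic mean of precision and recall. A minimal plain-Python sketch (the function name `f1` is just illustrative):

```python
def f1(p, r):
    """Harmonic mean of precision p and recall r: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f1(0.8, 0.5))  # 8/13 ≈ 0.615
```

Note that the harmonic mean is dominated by the smaller of the two values, so a classifier cannot get a high F1 by excelling at only one of precision or recall.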
Binary classification
When the labels have only two classes:
```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Fixed example labels (the original drew them with np.random.randint(0, 2, size=10);
# fixing the arrays keeps the commented results reproducible)
real = np.array([1, 0, 0, 1, 1, 0, 0, 1, 1, 1])
pred = np.array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1])
# Compute directly
p = sum(pred * real) / sum(pred)  # 0.8
r = sum(pred * real) / sum(real)  # 0.67
# Compute with sklearn
p = precision_score(real, pred)  # 0.8
r = recall_score(real, pred)     # 0.67
```
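For a fixed pair of binary label arrays, the F1 formula and sklearn's `f1_score` should agree; a quick self-contained check (assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.metrics import f1_score

real = np.array([1, 0, 0, 1, 1, 0, 0, 1, 1, 1])
pred = np.array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1])

p = (pred * real).sum() / pred.sum()  # precision: 4/5 = 0.8
r = (pred * real).sum() / real.sum()  # recall: 4/6 ≈ 0.67
f1_manual = 2 * p * r / (p + r)       # 8/11 ≈ 0.727
print(np.isclose(f1_manual, f1_score(real, pred)))  # True
```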
Multiclass classification
For multiclass problems, we need to assess how the classifier does across the different classes, which is where macro-averaging and micro-averaging come in. The following uses a 3-class problem as an example.
Macro-averaging
Macro-averaging takes the arithmetic mean of each statistic over all classes, yielding the macro-precision, macro-recall, and macro-F score. The formulas are:
![](https://img.haomeiwen.com/i1713353/4abbdc99736d5bbe.png)
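A plain-Python sketch of that definition (helper names are illustrative, no sklearn needed): compute precision and recall per class, then average them with equal weight per class.

```python
def per_class_pr(y_true, y_pred, cls):
    """Precision and recall treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

def macro_pr(y_true, y_pred):
    """Unweighted arithmetic mean of the per-class precision/recall."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = [per_class_pr(y_true, y_pred, c) for c in classes]
    macro_p = sum(p for p, _ in pairs) / len(classes)
    macro_r = sum(r for _, r in pairs) / len(classes)
    return macro_p, macro_r

# Tiny 3-class example: class C is never predicted, so it drags the average down
mp, mr = macro_pr(['A', 'A', 'B', 'C'], ['A', 'B', 'B', 'B'])
print(mp, mr)  # 4/9 ≈ 0.444, 0.5
```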
Micro-averaging
Micro-averaging pools every example in the dataset, regardless of class, into a single global confusion matrix, and then computes the metrics from that matrix. The formulas are:
![](https://img.haomeiwen.com/i1713353/99cb20ff506a38c6.png)
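A matching plain-Python sketch of micro-averaging (illustrative names): pool TP/FP/FN over all classes first, then compute a single precision and recall. For single-label multiclass data, every misclassification counts as one FP (for the predicted class) and one FN (for the true class), so micro-precision, micro-recall, and micro-F1 all equal accuracy.

```python
def micro_pr(y_true, y_pred):
    """Precision/recall from globally pooled TP/FP/FN counts."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for c in classes:
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

mp, mr = micro_pr(['A', 'A', 'B', 'C'], ['A', 'B', 'B', 'B'])
print(mp, mr)  # 0.5 0.5  (= accuracy: 2 of 4 correct)
```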
Macro-averaging and micro-averaging differ in that macro-averaging gives every class equal weight, whereas micro-averaging gives every per-sample decision equal weight. As the F1 formula shows, F1 ignores the samples the classifier correctly labels as negative; its value is driven mainly by the samples correctly labeled positive. Under micro-averaging, therefore, the classes with many samples dominate the classes with few.
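That weighting difference is easy to see on a deliberately imbalanced toy set (my own illustration, not from the original post): 98 samples of class 0, all predicted correctly, and 2 samples of class 1 that are always missed. Micro-F1 tracks overall accuracy (0.98), while macro-F1 collapses toward 0.5 because the tiny class contributes an F1 of 0 with the same weight as the big class.

```python
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100          # class 1 is never predicted

def f1_for(cls):
    """Per-class F1 from TP/FP/FN counts for the given class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

macro_f1 = (f1_for(0) + f1_for(1)) / 2                        # ≈ 0.495
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
micro_f1 = acc                                                # 0.98 (micro-F1 = accuracy here)
print(macro_f1, micro_f1)
```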
Let's work through the calculation in detail on a concrete 3-class dataset.
Suppose there are 10 samples belonging to classes A, B, and C, and their true and predicted labels are:
True:      A A A C B C A B B C
Predicted: A A C B A C A C B C
For class A:
![](https://img.haomeiwen.com/i1713353/00954e81be0b30bd.png)
For class B:
![](https://img.haomeiwen.com/i1713353/6ad3f05118bc447b.png)
For class C:
![](https://img.haomeiwen.com/i1713353/ca49234613cc9c44.png)
![](https://img.haomeiwen.com/i1713353/d1ef20b9fd40d901.png)
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, fbeta_score

y_true = [0, 0, 0, 2, 1, 2, 0, 1, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 0, 2, 1, 2]
accuracy_score(y_true, y_pred)                   # fraction of correctly classified samples
accuracy_score(y_true, y_pred, normalize=False)  # number of correctly classified samples
# Precision
precision_score(y_true, y_pred, average='macro')
precision_score(y_true, y_pred, average='micro')
precision_score(y_true, y_pred, average=None)    # per-class precision
# Recall
recall_score(y_true, y_pred, average='macro')
recall_score(y_true, y_pred, average='micro')
recall_score(y_true, y_pred, average=None)       # per-class recall
# F1
f1_score(y_true, y_pred, average='macro')
f1_score(y_true, y_pred, average='micro')
f1_score(y_true, y_pred, average=None)
# F-beta (beta < 1 weights precision more heavily than recall)
fbeta_score(y_true, y_pred, average='macro', beta=0.5)
fbeta_score(y_true, y_pred, average='micro', beta=0.5)
fbeta_score(y_true, y_pred, average=None, beta=0.5)
```
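To double-check the hand-worked per-class numbers with plain Python (no sklearn), count TP/FP/FN per class over the same ten labels and compute both averages:

```python
y_true = ['A', 'A', 'A', 'C', 'B', 'C', 'A', 'B', 'B', 'C']
y_pred = ['A', 'A', 'C', 'B', 'A', 'C', 'A', 'C', 'B', 'C']

stats = {}
for c in 'ABC':
    tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
    fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
    fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
    stats[c] = (tp, fp, fn)

# Per-class precision/recall, then the two averages
prec = {c: tp / (tp + fp) for c, (tp, fp, fn) in stats.items()}
rec = {c: tp / (tp + fn) for c, (tp, fp, fn) in stats.items()}
macro_p = sum(prec.values()) / 3   # (0.75 + 0.5 + 0.5) / 3 ≈ 0.583
macro_r = sum(rec.values()) / 3    # (0.75 + 1/3 + 2/3) / 3 ≈ 0.583
TP = sum(tp for tp, fp, fn in stats.values())
FP = sum(fp for tp, fp, fn in stats.values())
FN = sum(fn for tp, fp, fn in stats.values())
micro_p = TP / (TP + FP)           # 6/10 = 0.6
micro_r = TP / (TP + FN)           # 6/10 = 0.6 (= accuracy)
```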
Summary
If the classes are unevenly distributed, the micro F1 score is preferable to the macro F1 score, since the macro F1 score clearly does not take the size of each class into account.