AUC and ROC

Author: Hayley笔记 | Published: 2021-05-15 18:55

AUC: Area Under the Curve

AUROC: Area Under the Receiver Operating Characteristic curve

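1. ROC curve overview

The ROC curve is a visual tool for evaluating classification models. It is drawn with both axes bounded to the range 0 to 1: the x-axis is the false positive rate (FPR, the proportion of actual negatives that are wrongly predicted positive) and the y-axis is the true positive rate (TPR, the proportion of actual positives that are correctly predicted positive). In general, the more the curve bulges toward the upper-left corner, the better the model performs; the closer it lies to the diagonal, the closer the model is to random guessing.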

2. AUC

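AUC is the area under the ROC curve and summarizes the curve as a single number. Because both ROC axes run from 0 to 1, the AUC is a portion of the 1x1 unit square, so its value lies between 0 and 1. A curve lying on the diagonal gives an AUC of 0.5.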
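As a rough illustration of "area under the curve", the sketch below applies the trapezoidal rule to a handful of ROC points. The points are made-up toy values (an assumption for illustration, not data from this article), and the trapezoidal rule is one common way to compute the area, not necessarily what any particular package does internally.

```r
# Hypothetical ROC points, ordered from (0, 0) to (1, 1).
fpr <- c(0, 0,    0,   0.25, 0.25, 0.5,  0.75, 0.75, 1)
tpr <- c(0, 0.25, 0.5, 0.5,  0.75, 0.75, 0.75, 1,    1)

# Trapezoidal rule: each segment contributes
# (width along the FPR axis) * (average TPR of its two end points).
widths  <- diff(fpr)
heights <- (head(tpr, -1) + tail(tpr, -1)) / 2
auc <- sum(widths * heights)

auc
```

Vertical segments have zero width, so only the horizontal moves add area.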

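3. Drawing the ROC curve

3.1 Basic concepts
  • Predicted probability and threshold:
    A classification model outputs a probability between 0 and 1 for each sample, representing how likely that sample is to belong to a given class. A threshold then turns the probability into a label: samples with a probability above the threshold are predicted positive, and samples below it are predicted negative.
  • TPR and FPR: the ROC curve plots FPR on the x-axis and TPR on the y-axis. FPR = FP / (FP + TN) is the proportion of actual negatives that are wrongly predicted positive; TPR = TP / (TP + FN) is the proportion of actual positives that are correctly predicted positive.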
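To make the two concepts above concrete, here is a minimal base-R sketch that applies one threshold to predicted probabilities and computes the resulting TPR and FPR. The data and the helper name rates_at are hypothetical, chosen only for illustration.

```r
# Hypothetical toy data: true labels (1 = positive) and predicted probabilities.
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# rates_at() is a hypothetical helper: it labels a sample positive when its
# probability is above the threshold, then counts the confusion-matrix cells.
rates_at <- function(labels, probs, threshold) {
  predicted <- as.integer(probs > threshold)
  tp <- sum(predicted == 1 & labels == 1)  # true positives
  fn <- sum(predicted == 0 & labels == 1)  # false negatives
  fp <- sum(predicted == 1 & labels == 0)  # false positives
  tn <- sum(predicted == 0 & labels == 0)  # true negatives
  c(TPR = tp / (tp + fn), FPR = fp / (fp + tn))
}

rates_at(labels, probs, threshold = 0.5)
```

With a threshold of 0.5, three of the four positives and one of the four negatives are predicted positive, so TPR is 0.75 and FPR is 0.25.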
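3.2 Steps for drawing the ROC curve
  1. Sort all samples by predicted probability in descending order.
  2. Move the threshold from 1 down to 0 and compute the (FPR, TPR) pair at each threshold.
  3. Plot the pairs in a Cartesian coordinate system.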
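The three steps above can be sketched in base R as follows. The labels and probabilities are hypothetical toy values, and the sketch assumes no two samples share the same probability (ties would need to be grouped into a single step).

```r
# Hypothetical toy data: true labels (1 = positive) and predicted probabilities.
labels <- c(1, 0, 1, 1, 0, 1, 0, 0)
probs  <- c(0.8, 0.7, 0.9, 0.6, 0.4, 0.2, 0.3, 0.1)

# Step 1: sort the samples by predicted probability, highest first.
ord <- order(probs, decreasing = TRUE)
sorted_labels <- labels[ord]

# Step 2: lowering the threshold past each sample in turn marks one more
# sample as positive, so cumulative counts give the TP and FP totals at
# every threshold. The leading 0 is the point where nothing is positive.
tpr <- c(0, cumsum(sorted_labels == 1) / sum(sorted_labels == 1))
fpr <- c(0, cumsum(sorted_labels == 0) / sum(sorted_labels == 0))

# Step 3: plot the (FPR, TPR) pairs, with the diagonal for reference.
plot(fpr, tpr, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False Positive Rate", ylab = "True Positive Rate")
abline(a = 0, b = 1, lty = 2)
```

The curve always starts at (0, 0), where the threshold is so high that no sample is predicted positive, and ends at (1, 1), where every sample is.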

4. ROC and AUC in R

# install.packages("pROC")
# install.packages("randomForest")
library(pROC) 
library(randomForest) #Random Forest is a way to classify samples and we can change the threshold that we use to make those decisions.
set.seed(420) # this will make my results match yours
num.samples <- 100
weight <- sort(rnorm(n=num.samples, mean=172, sd=29))
obese <- ifelse(test=(runif(n=num.samples) < (rank(weight)/num.samples)), 
                yes=1, no=0)
obese
plot(x=weight, y=obese)
## fit a logistic regression to the data...
glm.fit <- glm(obese ~ weight, family=binomial)
lines(weight, glm.fit$fitted.values)

draw ROC and AUC using pROC

#######################################
##
## draw ROC and AUC using pROC
##
#######################################
## NOTE: By default, the graphs come out looking terrible
## The problem is that ROC graphs should be square, since the x and y axes
## both go from 0 to 1. However, the window in which I draw them isn't square
## so extra whitespace is added to pad the sides.
roc(obese, glm.fit$fitted.values, plot=TRUE)
## Now let's configure R so that it prints the graph as a square.
##
par(pty = "s") ## pty sets the aspect ratio of the plot region. Two options:
##                "s" - creates a square plotting region
##                "m" - (the default) creates a maximal plotting region
roc(obese, glm.fit$fitted.values, plot=TRUE)
## NOTE: By default, roc() uses specificity on the x-axis and the values range
## from 1 to 0. This makes the graph look like what we would expect, but the
## x-axis itself might induce a headache. To use 1-specificity (i.e. the 
## False Positive Rate) on the x-axis, set "legacy.axes" to TRUE.
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE)
## If you want to rename the x and y axes...
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage")
## We can also change the color of the ROC line, and make it wider...
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4)
## If we want to find out the optimal threshold we can store the 
## data used to make the ROC graph in a variable...
roc.info <- roc(obese, glm.fit$fitted.values, legacy.axes=TRUE)
str(roc.info)
## and then extract just the information that we want from that variable.
roc.df <- data.frame(
  tpp=roc.info$sensitivities*100, ## tpp = true positive percentage
  fpp=(1 - roc.info$specificities)*100, ## fpp = false positive percentage
  thresholds=roc.info$thresholds)
head(roc.df) ## head() will show us the values for the upper right-hand corner
## of the ROC graph, when the threshold is so low 
## (negative infinity) that every single sample is called "obese".
## Thus TPP = 100% and FPP = 100%
tail(roc.df) ## tail() will show us the values for the lower left-hand corner
## of the ROC graph, when the threshold is so high (infinity) 
## that every single sample is called "not obese". 
## Thus, TPP = 0% and FPP = 0%
## now let's look at the thresholds between TPP 60% and 80%...
roc.df[roc.df$tpp > 60 & roc.df$tpp < 80,]
## We can calculate the area under the curve...
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE)
## ...and the partial area under the curve.
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE, print.auc.x=45, partial.auc=c(100, 90), auc.polygon = TRUE, auc.polygon.col = "#377eb822")
#######################################
##
## Now let's fit the data with a random forest...
##
#######################################
rf.model <- randomForest(factor(obese) ~ weight)
## ROC for random forest
roc(obese, rf.model$votes[,1], plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#4daf4a", lwd=4, print.auc=TRUE)
#######################################
##
## Now layer logistic regression and random forest ROC graphs..
##
#######################################
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE)
plot.roc(obese, rf.model$votes[,1], percent=TRUE, col="#4daf4a", lwd=4, print.auc=TRUE, add=TRUE, print.auc.y=40)
legend("bottomright", legend=c("Logistic Regression", "Random Forest"), col=c("#377eb8", "#4daf4a"), lwd=4)
#######################################
##
## Now that we're done with our ROC fun, let's reset the par() variables.
## Setting pty back to "m" restores the default maximal plotting region.
##
#######################################
par(pty = "m")
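
The walkthrough above stores the ROC data in roc.info in order to look for an optimal threshold, but it does not name a selection criterion. One common choice, assumed here for illustration and not part of the original code, is Youden's J statistic (sensitivity + specificity - 1). The sketch below rebuilds the same simulated data and asks pROC's coords() for the threshold that maximizes it; the variable name best is hypothetical.

```r
library(pROC)

# Rebuild the simulated data and logistic regression from the walkthrough.
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)

# quiet = TRUE suppresses the messages about level and direction detection.
roc.info <- roc(obese, glm.fit$fitted.values, quiet = TRUE)

# x = "best" with best.method = "youden" returns the threshold(s) that
# maximize sensitivity + specificity - 1, with the matching rates.
best <- coords(roc.info, x = "best", best.method = "youden",
               ret = c("threshold", "sensitivity", "specificity"))
best
```

Youden's J weights false positives and false negatives equally; if one kind of error is more costly in practice, a different threshold from roc.df may be the better pick.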

References:
https://www.bilibili.com/video/BV1SK4y1K7v3
https://www.youtube.com/watch?v=qcvAqAH60Yw
