R包学习-pROC

作者: Thinkando | 来源:发表于2020-03-26 22:53 被阅读0次

R包学习-pROC
pROC包学习
ROC曲线
R语言pROC包绘制ROC曲线
R语言ROC曲线包
如何快速学会用R语言做出漂亮的ROC图
R package pROC
学习小组Day5笔记--jam
学习小组Day6笔记——学生小幸
学习小组DAY6--帅炸

ROC曲线

我们在做诊断性实验的时候，最常用的选择最佳cutoff的方法是使用ROC曲线。本次主要介绍的是pROC包对于 ROC曲线的的绘制和分析

https://www.jianshu.com/p/04e3bb3bc990

> library(pROC)
> data("aSAH")
> head(aSAH)
   gos6 outcome gender age wfns s100b  ndka
29    5    Good Female  42    1  0.13  3.01
30    5    Good Female  37    1  0.14  8.54
31    5    Good Female  42    1  0.10  8.09
32    5    Good Female  27    1  0.04 10.42
33    1    Poor Female  42    3  0.13 17.40
34    1    Poor   Male  48    2  0.10 12.75

roc roc是这个包的主要函数。通过roc函数我们可以构建一个ROC曲线，并返回一个ROC的列表。这个列表可以被“打印”，可以被“plot”，同时可以使用auc, ci, smooth.roc and coords等辅助函数调出结果。

> ##以下两种方式是等价的
> roc(aSAH$outcome ~ aSAH$s100b)
> roc(outcome ~ s100b, aSAH)
>roc(aSAH$outcome, aSAH$s100b,direction = ">") # Area under the curve: 0.2686
Setting levels: control = Good, case = Poor
Setting direction: controls < cases

Call:
roc.formula(formula = outcome ~ s100b, data = aSAH)

Data: s100b in 72 controls (outcome Good) < 41 cases (outcome Poor).
Area under the curve: 0.7314

在亚组中进行roc分析

roc(outcome ~ s100b, data=aSAH, subset=(gender == "Male"))
roc(outcome ~ s100b, data=aSAH, subset=(gender == "Female"))

改变case的因子水平

> roc(aSAH$outcome, aSAH$s100b,
+     levels=c("Poor", "Good"))
Setting direction: controls > cases

Call:
roc.default(response = aSAH$outcome, predictor = aSAH$s100b,     levels = c("Poor", "Good"))

Data: aSAH$s100b in 41 controls (aSAH$outcome Poor) > 72 cases (aSAH$outcome Good).
Area under the curve: 0.7314

##AUC变成百分比
roc(aSAH$outcome, aSAH$s100b, percent = T)

> ##计算95%CI
> roc(aSAH$outcome, aSAH$s100b, ci = T)
Setting levels: control = Good, case = Poor
Setting direction: controls < cases

Call:
roc.default(response = aSAH$outcome, predictor = aSAH$s100b,     ci = T)

Data: aSAH$s100b in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
Area under the curve: 0.7314
95% CI: 0.6301-0.8326 (DeLong)

roc.test roc.test用来比较两个或者多个roc曲线是否有差别。

###通过roc函数来比较两个roc的区别
roc1 <- roc(aSAH$outcome, aSAH$s100b)
roc2 <- roc(aSAH$outcome, aSAH$wfns)
roc.test(roc1, roc2)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc1 and roc2
## Z = -2.209, p-value = 0.02718
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7313686   0.8236789

###如果是一个数据框可以在一个公式中比较
roc.test(aSAH$outcome, aSAH$s100b, aSAH$wfns)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  aSAH$s100b and aSAH$wfns by aSAH$outcome (Good, Poor)
## Z = -2.209, p-value = 0.02718
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7313686   0.8236789

###改变比较的方法默认的的是delong法。可选bootstrap
roc.test(roc1, roc2, method = "bootstrap")

###比较两个ROC的灵敏度和特异度
roc.test(roc1, roc2, method="specificity", specificity=0.9)

## 
##  Specificity test for two correlated ROC curves
## 
## data:  roc1 and roc2
## D = -1.3119, boot.n = 2000, boot.stratified = 1, p-value = 0.1895
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7313686   0.8236789

roc.test(roc1, roc2, method="sensitivity", sensitivity=0.9)

## 
##  Sensitivity test for two correlated ROC curves
## 
## data:  roc1 and roc2
## D = -3.1677, boot.n = 2000, boot.stratified = 1, p-value =
## 0.001537
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7313686   0.8236789

###如果两个roc的样本量不同则需要进行配置
roc7 <- roc(aSAH$outcome, aSAH$s100b)
roc8 <- roc(aSAH$outcome[1:100], aSAH$s100b[1:100])
roc.test(roc7, roc8, paired=FALSE, method="delong")

## 
##  DeLong's test for two ROC curves
## 
## data:  roc7 and roc8
## D = 0.16859, df = 205.38, p-value = 0.8663
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7313686   0.7183601

auc 通过AUC来计算ROC曲线下面积

rocobj <- roc(aSAH$outcome, aSAH$s100b)
auc(rocobj)
###同样的可以使用roc函数计算
roc(aSAH$outcome, aSAH$s100b)$auc

## Area under the curve: 0.7314

ci.auc 计算一个roc曲线的曲线下面积以及95%CI

rocobj <- roc(aSAH$outcome, aSAH$s100b)
result <- ci.auc(rocobj)
###结果其实包括三个变量分别是auc, lower, upper
result[1]; result[2]; result[3]

## [1] 0.6301182
## [1] 0.7313686
## [1] 0.8326189

coords 返回ROC曲线计算过程中的变量值

其中包括阈值、灵敏度、特异度、准确率、真阴性数、真阳性数、假阴性数、假阳性数、阳性预测值、阴性预测值…… coords(roc, x, input, ret,as.list, drop, best.method) 参数当中x可以是数字代表input的值，all代表ROC所有点的值。 best代表灵敏度和特异度和最大的点。local maximasROC曲线局部最大的点 input如果X是数字的需要制定input。参数时“threshold”, “specificity” or “sensitivity”这三个的一个。同时可以简写为“t”,“sp”和“se” ret返回值。可以是“threshold”, “specificity”, “sensitivity”, “accuracy”, “tn” (true negative count), “tp” (true positive count), “fn” (false negative count), “fp” (false positive count), “npv” (negative predictive value), “ppv” (positive predictive value), “precision”, “recall”. “1-specificity”, “1-sensitivity”, “1-accuracy”, “1-npv” and “1-ppv” drop以最简单的形式返回结果 as.list以列表的形式输出结果 best.method逻辑值。通过这个参数可以决定最佳的阈值

##制定X值
coords(roc=rocobj, x=0.5, input="sensitivity", ret="sensitivity")
## [1] 0.5
##得到所有灵敏度的值
sensitivities <- coords(rocobj, "all", ret="se")
##得到最佳的阈值的cutoff
coords(rocobj, "b", ret="t")
## [1] 0.205
##得到作用结果
coords(rocobj, "best", ret=c("threshold", "specificity", "sensitivity", "accuracy",
"tn", "tp", "fn", "fp", "npv", "ppv", "1-specificity",
"1-sensitivity", "1-accuracy", "1-npv", "1-ppv",
"precision", "recall"), drop = T)
##     threshold   specificity   sensitivity      accuracy            tn 
##     0.2050000     0.8055556     0.6341463     0.7433628    58.0000000 
##            tp            fn            fp           npv           ppv 
##    26.0000000    15.0000000    14.0000000     0.7945205     0.6500000 
## 1-specificity 1-sensitivity    1-accuracy         1-npv         1-ppv 
##     0.1944444     0.3658537     0.2566372     0.2054795     0.3500000 
##     precision        recall 
##     0.6500000     0.6341463

plot.roc 用来绘制ROC曲线的最主要的函数。

add如果是绘制两个或者多个图的话，使用这个可以绘制到同一个图上。 smooth逻辑值，曲线是否平滑 type选择线条的类型:“l”只是线条、“p”只是点、“b”既有线条又有点、“n”啥都没有 legacy.axes X轴的参数。如果为F则为sp，如果为T则为1-sp。一般而言是1-sp xlim,ylim,xlab,ylab,asp,mar,mgp基本画图参数。 col, lty, lwd线条的颜色、类型和宽度 print.thres 标注ROC曲线当中最佳的位置。 print.thres.pch, print.thres.adj, print.thres.col, print.thres.cex 修改thres位置点的类型、字符位置的横向调整、颜色、字体的大小。 print.thres.pattern自定义阈值的内容 print.auc打印曲线下面积。 print.auc.pattern自定义AUC的内容 print.auc.x/print.auc.y定义auc的位置 print.auc.adj/cex/col定义auc的横向位置、字体大小以及颜色

##基本图形
plot.roc(roc1)

image.png

image

可以进行ROC分析和ROC 曲线展示的R包。

#1. 安装

> install.packages("pROC")

#2. 数据导入

> library(pROC)
> data(aSAH)
> head(aSAH)
   gos6 outcome gender age wfns s100b  ndka
29    5    Good Female  42    1  0.13  3.01
30    5    Good Female  37    1  0.14  8.54
31    5    Good Female  42    1  0.10  8.09
32    5    Good Female  27    1  0.04 10.42
33    1    Poor Female  42    3  0.13 17.40
34    1    Poor   Male  48    2  0.10 12.75

#3. ROC分析

##3.1 使用roc()进行ROC分析

> roc(aSAH$outcome, aSAH$s100b,plot = T)
> roc(outcome ~ s100b, aSAH,plot=T，levels=c("Good", "Poor"))
> roc(controls=aSAH$s100b[aSAH$outcome=="Good"], cases=aSAH$s100b[aSAH$outcome=="Poor"])
Call:
roc.formula(formula = outcome ~ s100b, data = aSAH, plot = T)

Data: s100b in 72 controls (outcome Good) < 41 cases (outcome Poor).
Area under the curve: 0.7314

image

##3.2 roc(）参数详细解释

roc(...)
# S3 method for formula
roc(formula, data, ...)
# S3 method for default
roc(response, predictor, controls, cases,
density.controls, density.cases,
levels=base::levels(as.factor(response)), percent=FALSE, na.rm=TRUE,
direction=c("auto", "<", "="">"), algorithm = 5, quiet = TRUE, 
smooth=FALSE, auc=TRUE, ci=FALSE, plot=FALSE, smooth.method="binormal",
ci.method=NULL, density=NULL, ...)

roc1 <- roc(aSAH$outcome,
            aSAH$s100b, percent=TRUE,
            # 设置auc参数
            partial.auc=c(100, 90), partial.auc.correct=TRUE,
            partial.auc.focus="sens",
            # 设置ci参数
            ci=TRUE, boot.n=100, ci.alpha=0.9, stratified=FALSE,
            # 设置画图参数
            plot=TRUE, auc.polygon=TRUE, max.auc.polygon=TRUE, grid=TRUE,
            show.thres=TRUE
            )

#在原有的图上加ROC曲线
roc2 <- roc(aSAH$outcome, aSAH$wfns,
            plot=TRUE, add=TRUE, percent=roc1$percent)

response：样本类型，一般只有两类（ 0 (controls) 和 1 (cases)），可做ROC曲线。当类型过多时，需要使用levels参数指定那些值作为control ，那些作为 case。
predictor：样本的预测值，可以是概率、排名之类。
controls, cases: 直接提供controls和cases，可以是数值向量，也可以是排好序的向量。
formula, data：通过表达式传入数据框中的值。
levels： controls 和 cases 对应的值，默认为levels(as.factor(response))，剩下的忽略；当输入数据为0、1，默认0为controls。
percent：sensitivities, specificities 和 AUC返回形式为百分数。
direction：根据两组数据中位数大小确定；“>”: control组中位数值大于cases组；“<”：control组中位数值小于或等于cases组。
algorithm：1，也是默认，数量较少；2，数量大于1000时，速度更快；3，C++实现算法，快3-5x; 4 (debug only, slow): 三个算法都运行一遍，确认返回值知否一样。5，迅速选择2或3。
smooth：ROC 曲线修饰为平滑曲线。
auc：计算AUC，默认TRUE。
ci：是否计算置信区间。
plot：是否作图。
smooth.method, ci.method：smooth 算法。

##3.3 返回ROC计算对象

coords(roc1, "best", ret=c("threshold", "specificity", "1-npv"))
coords(roc2, "local maximas", ret=c("threshold", "sens", "spec", "ppv", "npv"))

##3.4 置信区间

# Of the AUC
ci(roc2)

# Of the curve
sens.ci <- ci.se(roc1, specificities=seq(0, 100, 5))
plot(sens.ci, type="shape", col="lightblue")
plot(sens.ci, type="bars")

# need to re-add roc2 over the shape
plot(roc2, add=TRUE)

# CI of thresholds
plot(ci.thresholds(roc2))

##3.5 roc.test()对ROC进行统计检验

# Test on the whole AUC
> roc.test(roc1, roc2, reuse.auc=FALSE)
DeLong's test for two correlated ROC curves

data:  roc1 and roc2
Z = -2.209, p-value = 0.02718
alternative hypothesis: true difference in AUC is not equal to 0
sample estimates:
AUC of roc1 AUC of roc2 
   73.13686    82.36789 

# Test on a portion of the whole AUC
> roc.test(roc1, roc2, reuse.auc=FALSE, partial.auc=c(100, 90),
         partial.auc.focus="se", partial.auc.correct=TRUE)

    # With modified bootstrap parameters
> roc.test(roc1, roc2, reuse.auc=FALSE, partial.auc=c(100, 90),
         partial.auc.correct=TRUE, boot.n=1000, boot.stratified=FALSE)

##3.6 roc.test(）参数

alternative：“two.sided”, “less” ，“greater”。对于method="venkatraman"，只能使用 “two.sided”
paired：是否时配对，会自动检测。
boot.n：指定method="bootstrap"中自检举重复次数和 method="venkatraman"中置换次数；默认，2000。
boot.stratified：每次自检举过程中，cases/controls 比例与原始样本中比例一致。
method：

“delong”，默认，不适用于partial AUC, smoothed curves，curves with different direction。
“bootstrap” ：
“venkatraman”：不适用于smoothed ROC curves, or curves with partial AUC specifications

##3.7 roc.test(）中统计学方法

method="bootstrap"

自检举抽样，通过boot.n指定自检举次数。
计算AUC。
计算标准偏差。
[图片上传失败...(image-eabcc6-1596545977177)]
D与正态分布进行比较

method="delong"
method="delong" 在文章DeLong *et al.* (1988) 中提到用于配对ROC 曲线的检验，实际利用是在文章 Sun and Xu (2014)；现在已经延伸用于非配对的 ROC 曲线研究中，
[图片上传失败...(image-8901d1-1596545977177)]
method="venkatrama"
method="venkatraman"在文章Venkatraman and Begg (1996) (for paired ROC curves) 和Venkatraman (2000) (for unpaired ROC curves)对样本排名进行置换检验。通过boot.n指定置换次数。

##3.7 样本量

# Two ROC curves
power.roc.test(roc1, roc2, reuse.auc=FALSE)
power.roc.test(roc1, roc2, power=0.9, reuse.auc=FALSE)

# One ROC curve
power.roc.test(auc=0.8, ncases=41, ncontrols=72)
power.roc.test(auc=0.8, power=0.9)
power.roc.test(auc=0.8, ncases=41, ncontrols=72, sig.level=0.01)
power.roc.test(ncases=41, ncontrols=72, power=0.9)

#4 pROC使用更多实例

EXPASY基于R代码上给出了pROC的6个示例，见pROC: Screenshots，下面看一个例子：

library(pROC)
data(aSAH)
rocobj <- plot.roc(aSAH$outcome, aSAH$s100b, percent = TRUE, main="Smoothing")
lines(smooth(rocobj), # smoothing (default: binormal)
col = "#1c61b6")
lines(smooth(rocobj, method = "density"), # density smoothing
col = "#008600")
lines(smooth(rocobj, method = "fitdistr", # fit a distribution
density = "lognormal"), # let the distribution be log-normal
col = "#840000")
legend("bottomright", legend = c("Empirical", "Binormal", "Density", "Fitdistr\n(Log-normal)"), col = c("black", "#1c61b6", "#008600", "#840000"),lwd = 2)

image

R包学习-pROC
ROC曲线我们在做诊断性实验的时候，最常用的选择最佳cutoff的方法是使用ROC曲线。本次主要介绍的是pROC...
pROC包学习
ROC曲线 > 我们在做诊断性实验的时候，最常用的选择最佳cutoff的方法是使用ROC曲线。本次主要介绍的是pR...
ROC曲线
install.packages('pROC') #下载pROC包 library(pROC) #调用pROC包 ...
R语言pROC包绘制ROC曲线
作者：白介素2相关阅读：R语言-multiROC packageR语言生存分析04-Cox比例风险模型诊断R语言生...
R语言ROC曲线包
R语言有多种包可以画ROC曲线，如比较出名的有ROCR包，pROC包等。如有2个以上变量，则可单独画在一张图上，也...
如何快速学会用R语言做出漂亮的ROC图
pROC是一个专门用来计算和绘制ROC曲线的R包，目前已被CRAN收录，因此安装也非常简单，同时该包也兼容ggpl...
R package pROC
if(!require(pROC)) install.packages("pROC") test data dat...
学习小组Day5笔记--jam
R包学习镜像设置 R包安装 install.package()BioManager::install() 学习d...
学习小组Day6笔记——学生小幸
今日份学习主要是R包，学习安装加载R包，以及示例deplyr包的使用和功能。
学习小组DAY6--帅炸
学习R包安装和加载R包镜像设置安装 R包安装命令是install.packages(“包”)或者BiocMa...