美文网首页
R包学习-pROC

R包学习-pROC

作者: Thinkando | 来源:发表于2020-03-26 22:53 被阅读0次
    • ROC曲线

    我们在做诊断性实验的时候,最常用的选择最佳cutoff的方法是使用ROC曲线。本次主要介绍的是pROC包对于 ROC曲线的的绘制和分析

    https://www.jianshu.com/p/04e3bb3bc990

    > library(pROC)
    > data("aSAH")
    > head(aSAH)
       gos6 outcome gender age wfns s100b  ndka
    29    5    Good Female  42    1  0.13  3.01
    30    5    Good Female  37    1  0.14  8.54
    31    5    Good Female  42    1  0.10  8.09
    32    5    Good Female  27    1  0.04 10.42
    33    1    Poor Female  42    3  0.13 17.40
    34    1    Poor   Male  48    2  0.10 12.75
    
    • roc roc是这个包的主要函数。通过roc函数我们可以构建一个ROC曲线,并返回一个ROC的列表。这个列表可以被“打印”, 可以被“plot”,同时可以使用auc, ci, smooth.roc and coords等辅助函数调出结果。
    > ##以下两种方式是等价的
    > roc(aSAH$outcome ~ aSAH$s100b)
    > roc(outcome ~ s100b, aSAH)
    >roc(aSAH$outcome, aSAH$s100b,direction = ">") # Area under the curve: 0.2686
    Setting levels: control = Good, case = Poor
    Setting direction: controls < cases
    
    Call:
    roc.formula(formula = outcome ~ s100b, data = aSAH)
    
    Data: s100b in 72 controls (outcome Good) < 41 cases (outcome Poor).
    Area under the curve: 0.7314
    

    在亚组中进行roc分析

    roc(outcome ~ s100b, data=aSAH, subset=(gender == "Male"))
    roc(outcome ~ s100b, data=aSAH, subset=(gender == "Female"))
    

    改变case的因子水平

    > roc(aSAH$outcome, aSAH$s100b,
    +     levels=c("Poor", "Good"))
    Setting direction: controls > cases
    
    Call:
    roc.default(response = aSAH$outcome, predictor = aSAH$s100b,     levels = c("Poor", "Good"))
    
    Data: aSAH$s100b in 41 controls (aSAH$outcome Poor) > 72 cases (aSAH$outcome Good).
    Area under the curve: 0.7314
    
    
    ##AUC变成百分比
    roc(aSAH$outcome, aSAH$s100b, percent = T)
    
    > ##计算95%CI
    > roc(aSAH$outcome, aSAH$s100b, ci = T)
    Setting levels: control = Good, case = Poor
    Setting direction: controls < cases
    
    Call:
    roc.default(response = aSAH$outcome, predictor = aSAH$s100b,     ci = T)
    
    Data: aSAH$s100b in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
    Area under the curve: 0.7314
    95% CI: 0.6301-0.8326 (DeLong)
    

    roc.test roc.test用来比较两个或者多个roc曲线是否有差别。

    ###通过roc函数来比较两个roc的区别
    roc1 <- roc(aSAH$outcome, aSAH$s100b)
    roc2 <- roc(aSAH$outcome, aSAH$wfns)
    roc.test(roc1, roc2)
    
    ## 
    ##  DeLong's test for two correlated ROC curves
    ## 
    ## data:  roc1 and roc2
    ## Z = -2.209, p-value = 0.02718
    ## alternative hypothesis: true difference in AUC is not equal to 0
    ## sample estimates:
    ## AUC of roc1 AUC of roc2 
    ##   0.7313686   0.8236789
    
    ###如果是一个数据框可以在一个公式中比较
    roc.test(aSAH$outcome, aSAH$s100b, aSAH$wfns)
    
    ## 
    ##  DeLong's test for two correlated ROC curves
    ## 
    ## data:  aSAH$s100b and aSAH$wfns by aSAH$outcome (Good, Poor)
    ## Z = -2.209, p-value = 0.02718
    ## alternative hypothesis: true difference in AUC is not equal to 0
    ## sample estimates:
    ## AUC of roc1 AUC of roc2 
    ##   0.7313686   0.8236789
    
    ###改变比较的方法默认的的是delong法。可选bootstrap
    roc.test(roc1, roc2, method = "bootstrap")
    
    ###比较两个ROC的灵敏度和特异度
    roc.test(roc1, roc2, method="specificity", specificity=0.9)
    
    ## 
    ##  Specificity test for two correlated ROC curves
    ## 
    ## data:  roc1 and roc2
    ## D = -1.3119, boot.n = 2000, boot.stratified = 1, p-value = 0.1895
    ## alternative hypothesis: true difference in AUC is not equal to 0
    ## sample estimates:
    ## AUC of roc1 AUC of roc2 
    ##   0.7313686   0.8236789
    
    roc.test(roc1, roc2, method="sensitivity", sensitivity=0.9)
    
    ## 
    ##  Sensitivity test for two correlated ROC curves
    ## 
    ## data:  roc1 and roc2
    ## D = -3.1677, boot.n = 2000, boot.stratified = 1, p-value =
    ## 0.001537
    ## alternative hypothesis: true difference in AUC is not equal to 0
    ## sample estimates:
    ## AUC of roc1 AUC of roc2 
    ##   0.7313686   0.8236789
    
    ###如果两个roc的样本量不同则需要进行配置
    roc7 <- roc(aSAH$outcome, aSAH$s100b)
    roc8 <- roc(aSAH$outcome[1:100], aSAH$s100b[1:100])
    roc.test(roc7, roc8, paired=FALSE, method="delong")
    
    ## 
    ##  DeLong's test for two ROC curves
    ## 
    ## data:  roc7 and roc8
    ## D = 0.16859, df = 205.38, p-value = 0.8663
    ## alternative hypothesis: true difference in AUC is not equal to 0
    ## sample estimates:
    ## AUC of roc1 AUC of roc2 
    ##   0.7313686   0.7183601
    

    auc 通过AUC来计算ROC曲线下面积

    rocobj <- roc(aSAH$outcome, aSAH$s100b)
    auc(rocobj)
    ###同样的可以使用roc函数计算
    roc(aSAH$outcome, aSAH$s100b)$auc
    
    ## Area under the curve: 0.7314
    

    ci.auc 计算一个roc曲线的曲线下面积以及95%CI

    rocobj <- roc(aSAH$outcome, aSAH$s100b)
    result <- ci.auc(rocobj)
    ###结果其实包括三个变量分别是auc, lower, upper
    result[1]; result[2]; result[3]
    
    ## [1] 0.6301182
    ## [1] 0.7313686
    ## [1] 0.8326189
    

    coords 返回ROC曲线计算过程中的变量值

    • 其中包括阈值、灵敏度、特异度、准确率、真阴性数、真阳性数、假阴性数、假阳性数、阳性预测值、阴性预测值…… coords(roc, x, input, ret,as.list, drop, best.method) 参数当中x可以是数字代表input的值,all代表ROC所有点的值。 best代表灵敏度和特异度和最大的点。local maximasROC曲线局部最大的点 input如果X是数字的需要制定input。参数时“threshold”, “specificity” or “sensitivity”这三个的一个。同时可以简写为“t”,“sp”和“se” ret返回值。可以是“threshold”, “specificity”, “sensitivity”, “accuracy”, “tn” (true negative count), “tp” (true positive count), “fn” (false negative count), “fp” (false positive count), “npv” (negative predictive value), “ppv” (positive predictive value), “precision”, “recall”. “1-specificity”, “1-sensitivity”, “1-accuracy”, “1-npv” and “1-ppv” drop以最简单的形式返回结果 as.list以列表的形式输出结果 best.method逻辑值。通过这个参数可以决定最佳的阈值
    ##制定X值
    coords(roc=rocobj, x=0.5, input="sensitivity", ret="sensitivity")
    ## [1] 0.5
    ##得到所有灵敏度的值
    sensitivities <- coords(rocobj, "all", ret="se")
    ##得到最佳的阈值的cutoff
    coords(rocobj, "b", ret="t")
    ## [1] 0.205
    ##得到作用结果
    coords(rocobj, "best", ret=c("threshold", "specificity", "sensitivity", "accuracy",
    "tn", "tp", "fn", "fp", "npv", "ppv", "1-specificity",
    "1-sensitivity", "1-accuracy", "1-npv", "1-ppv",
    "precision", "recall"), drop = T)
    ##     threshold   specificity   sensitivity      accuracy            tn 
    ##     0.2050000     0.8055556     0.6341463     0.7433628    58.0000000 
    ##            tp            fn            fp           npv           ppv 
    ##    26.0000000    15.0000000    14.0000000     0.7945205     0.6500000 
    ## 1-specificity 1-sensitivity    1-accuracy         1-npv         1-ppv 
    ##     0.1944444     0.3658537     0.2566372     0.2054795     0.3500000 
    ##     precision        recall 
    ##     0.6500000     0.6341463
    
    

    plot.roc 用来绘制ROC曲线的最主要的函数。

    • add如果是绘制两个或者多个图的话,使用这个可以绘制到同一个图上。 smooth逻辑值,曲线是否平滑 type选择线条的类型:“l”只是线条、“p”只是点、“b”既有线条又有点、“n”啥都没有 legacy.axes X轴的参数。如果为F则为sp,如果为T则为1-sp。一般而言是1-sp xlim,ylim,xlab,ylab,asp,mar,mgp基本画图参数。 col, lty, lwd线条的颜色、类型和宽度 print.thres 标注ROC曲线当中最佳的位置。 print.thres.pch, print.thres.adj, print.thres.col, print.thres.cex 修改thres位置点的类型、字符位置的横向调整、颜色、字体的大小。 print.thres.pattern自定义阈值的内容 print.auc打印曲线下面积。 print.auc.pattern自定义AUC的内容 print.auc.x/print.auc.y定义auc的位置 print.auc.adj/cex/col定义auc的横向位置、字体大小以及颜色
    ##基本图形
    plot.roc(roc1)
    
    image.png

    <meta charset="utf-8">

    image

    可以进行ROC分析和ROC 曲线展示的R包。


    #1. 安装

    > install.packages("pROC")
    
    

    #2. 数据导入

    > library(pROC)
    > data(aSAH)
    > head(aSAH)
       gos6 outcome gender age wfns s100b  ndka
    29    5    Good Female  42    1  0.13  3.01
    30    5    Good Female  37    1  0.14  8.54
    31    5    Good Female  42    1  0.10  8.09
    32    5    Good Female  27    1  0.04 10.42
    33    1    Poor Female  42    3  0.13 17.40
    34    1    Poor   Male  48    2  0.10 12.75
    
    

    #3. ROC分析

    ##3.1 使用roc()进行ROC分析

    > roc(aSAH$outcome, aSAH$s100b,plot = T)
    > roc(outcome ~ s100b, aSAH,plot=T,levels=c("Good", "Poor"))
    > roc(controls=aSAH$s100b[aSAH$outcome=="Good"], cases=aSAH$s100b[aSAH$outcome=="Poor"])
    Call:
    roc.formula(formula = outcome ~ s100b, data = aSAH, plot = T)
    
    Data: s100b in 72 controls (outcome Good) < 41 cases (outcome Poor).
    Area under the curve: 0.7314
    
    
    image

    ##3.2 roc()参数详细解释

    roc(...)
    # S3 method for formula
    roc(formula, data, ...)
    # S3 method for default
    roc(response, predictor, controls, cases,
    density.controls, density.cases,
    levels=base::levels(as.factor(response)), percent=FALSE, na.rm=TRUE,
    direction=c("auto", "<", "="">"), algorithm = 5, quiet = TRUE, 
    smooth=FALSE, auc=TRUE, ci=FALSE, plot=FALSE, smooth.method="binormal",
    ci.method=NULL, density=NULL, ...)
    
    
    roc1 <- roc(aSAH$outcome,
                aSAH$s100b, percent=TRUE,
                # 设置auc参数
                partial.auc=c(100, 90), partial.auc.correct=TRUE,
                partial.auc.focus="sens",
                # 设置ci参数
                ci=TRUE, boot.n=100, ci.alpha=0.9, stratified=FALSE,
                # 设置画图参数
                plot=TRUE, auc.polygon=TRUE, max.auc.polygon=TRUE, grid=TRUE,
                show.thres=TRUE
                )
    
    #在原有的图上加ROC曲线
    roc2 <- roc(aSAH$outcome, aSAH$wfns,
                plot=TRUE, add=TRUE, percent=roc1$percent)   
    
    
    • response:样本类型,一般只有两类( 0 (controls) 和 1 (cases)),可做ROC曲线。当类型过多时,需要使用levels参数指定那些值作为control ,那些作为 case。
    • predictor:样本的预测值,可以是概率、排名之类。
    • controls, cases: 直接提供controls和cases,可以是数值向量,也可以是排好序的向量。
    • formula, data: 通过表达式传入数据框中的值 。
    • levels: controls 和 cases 对应的值,默认为levels(as.factor(response)),剩下的忽略;当输入数据为0、1,默认0为controls。
    • percent:sensitivities, specificities 和 AUC返回形式为百分数。
    • direction:根据两组数据中位数大小确定;“>”: control组中位数值大于cases组;“<”:control组中位数值小于或等于cases组。
    • algorithm:1,也是默认,数量较少;2,数量大于1000时,速度更快;3,C++实现算法,快3-5x; 4 (debug only, slow): 三个算法都运行一遍,确认返回值知否一样。5,迅速选择2或3。
    • smooth:ROC 曲线修饰为平滑曲线。
    • auc:计算AUC,默认TRUE。
    • ci:是否计算置信区间。
    • plot:是否作图。
    • smooth.method, ci.method:smooth 算法。

    ##3.3 返回ROC计算对象

    coords(roc1, "best", ret=c("threshold", "specificity", "1-npv"))
    coords(roc2, "local maximas", ret=c("threshold", "sens", "spec", "ppv", "npv"))
    
    

    ##3.4 置信区间

    # Of the AUC
    ci(roc2)
    
    # Of the curve
    sens.ci <- ci.se(roc1, specificities=seq(0, 100, 5))
    plot(sens.ci, type="shape", col="lightblue")
    plot(sens.ci, type="bars")
    
    # need to re-add roc2 over the shape
    plot(roc2, add=TRUE)
    
    # CI of thresholds
    plot(ci.thresholds(roc2))
    
    

    ##3.5 roc.test()对ROC进行统计检验

    # Test on the whole AUC
    > roc.test(roc1, roc2, reuse.auc=FALSE)
    DeLong's test for two correlated ROC curves
    
    data:  roc1 and roc2
    Z = -2.209, p-value = 0.02718
    alternative hypothesis: true difference in AUC is not equal to 0
    sample estimates:
    AUC of roc1 AUC of roc2 
       73.13686    82.36789 
    
    # Test on a portion of the whole AUC
    > roc.test(roc1, roc2, reuse.auc=FALSE, partial.auc=c(100, 90),
             partial.auc.focus="se", partial.auc.correct=TRUE)
    
        # With modified bootstrap parameters
    > roc.test(roc1, roc2, reuse.auc=FALSE, partial.auc=c(100, 90),
             partial.auc.correct=TRUE, boot.n=1000, boot.stratified=FALSE)
    
    

    ##3.6 roc.test()参数

    alternative:“two.sided”, “less” ,“greater”。对于method="venkatraman",只能使用 “two.sided”
    paired: 是否时配对,会自动检测。
    boot.n:指定method="bootstrap"中自检举重复次数和 method="venkatraman"中置换次数;默认,2000。
    boot.stratified:每次自检举过程中,cases/controls 比例与原始样本中比例一致。
    method

    • “delong”,默认,不适用于partial AUC, smoothed curves,curves with different direction。
    • “bootstrap” :
    • “venkatraman”:不适用于smoothed ROC curves, or curves with partial AUC specifications

    ##3.7 roc.test()中统计学方法

    • method="bootstrap"
    1. 自检举抽样,通过boot.n指定自检举次数。

    2. 计算AUC。

    3. 计算标准偏差。
      [图片上传失败...(image-eabcc6-1596545977177)]

    4. D与正态分布进行比较

    • method="delong"
      method="delong" 在文章DeLong *et al.* (1988) 中提到用于配对ROC 曲线的检验,实际利用是在文章 Sun and Xu (2014);现在已经延伸用于非配对的 ROC 曲线研究中,
      [图片上传失败...(image-8901d1-1596545977177)]

    • method="venkatrama"
      method="venkatraman"在文章Venkatraman and Begg (1996) (for paired ROC curves)Venkatraman (2000) (for unpaired ROC curves)对样本排名进行置换检验。通过boot.n指定置换次数。

    ##3.7 样本量

    # Two ROC curves
    power.roc.test(roc1, roc2, reuse.auc=FALSE)
    power.roc.test(roc1, roc2, power=0.9, reuse.auc=FALSE)
    
    # One ROC curve
    power.roc.test(auc=0.8, ncases=41, ncontrols=72)
    power.roc.test(auc=0.8, power=0.9)
    power.roc.test(auc=0.8, ncases=41, ncontrols=72, sig.level=0.01)
    power.roc.test(ncases=41, ncontrols=72, power=0.9)
    
    

    #4 pROC使用更多实例

    EXPASY基于R代码上给出了pROC的6个示例,见pROC: Screenshots,下面看一个例子:

    library(pROC)
    data(aSAH)
    rocobj <- plot.roc(aSAH$outcome, aSAH$s100b, percent = TRUE, main="Smoothing")
    lines(smooth(rocobj), # smoothing (default: binormal)
    col = "#1c61b6")
    lines(smooth(rocobj, method = "density"), # density smoothing
    col = "#008600")
    lines(smooth(rocobj, method = "fitdistr", # fit a distribution
    density = "lognormal"), # let the distribution be log-normal
    col = "#840000")
    legend("bottomright", legend = c("Empirical", "Binormal", "Density", "Fitdistr\n(Log-normal)"), col = c("black", "#1c61b6", "#008600", "#840000"),lwd = 2)
    
    
    image

    相关文章

      网友评论

          本文标题:R包学习-pROC

          本文链接:https://www.haomeiwen.com/subject/fqzhuhtx.html