美文网首页
R语言机器学习与临床预测模型68--绘制带置信区间的预测模型曲线

R语言机器学习与临床预测模型68--绘制带置信区间的预测模型曲线

作者: 科研私家菜 | 来源:发表于2022-07-13 16:28 被阅读0次

    R小盐准备介绍R语言机器学习与预测模型的学习笔记, 快来收藏关注【科研私家菜】


    01 预测模型结果可视化

    在rms工具包中,使用Predict()函数预测的模型结果可以直接使用plot()或ggplot()函数进行绘制,并自带置信区间,但是建模函数必须使用rms工具包中的相关函数。
    对线性模型而言,基础包stats中的函数是lm(),而rms工具包中是ols()函数。

    
    ols(formula, data=environment(formula),
        weights, subset, na.action=na.delete, 
        method="qr", model=FALSE,
        x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE,
        penalty=0, penalty.matrix, tol=1e-7, sigma,
        var.penalty=c('simple','sandwich'), ...)
    
    library(rms)
    model.1 <- lm(mpg ~ wt, data = mtcars)
    model.2 <- ols(mpg ~ wt, data = mtcars)
     
    summary(model.1)
    
    
    print(model.2)
    

    02 stats包的使用

    在stats包中,与模型预测相关的有两个函数:fitted()和predict()。前者是根据模型结果对原数据中的样本的响应变量(因变量)进行预测:

    head(fitted(model.1))
    predict()函数则可以对新数据的样本进行预测:
    
    ## S3 method for class 'lm'
    predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
            interval = c("none", "confidence", "prediction"),
            level = 0.95, type = c("response", "terms"),
            terms = NULL, na.action = na.pass,
            pred.var = res.var/weights, weights = 1, ...)
    

    object:模型对象;

    newdata:新数据;必须为数据框结构;若省略,则默认为原数据;

    interval:区间类型,有两种类型:置信区间(confidence)和预测区间(prediction);

    level:置信水平;默认为95%。

    predict(model.1, data.frame(wt = seq(1,6,0.5)),
            interval = "prediction")
    predict(model.1, data.frame(wt = seq(1,6,0.5)),
            interval = "confidence")
    

    而在rms工具包中,模型预测对应的是Predict()函数(首字母大写):

    Predict(object, ..., fun=NULL, funint=TRUE,
            type = c("predictions", "model.frame", "x"),
            np = 200, conf.int = 0.95,
            conf.type = c("mean", "individual","simultaneous"),
            usebootcoef=TRUE, boot.type=c("percentile", "bca", "basic"),
            posterior.summary=c('mean', 'median', 'mode'),
            adj.zero = FALSE, ref.zero = FALSE,
            kint=NULL, ycut=NULL, time = NULL, loglog = FALSE,
            digits=4, name,
            factors=NULL, offset=NULL)
    # conf.type:区间类型;置信区间(mean)、预测区间(individual)
    
    Predict()函数的“新数据”使用...表示,直接写出参与预测的变量即可:
    
    predict.2 <- Predict(model.2, wt = seq(1,6,0.5)) 
    predict.2
    
    

    也可以不指定参与预测变量的数值,但需要提前定义变量的范围和等分的个数。指定范围的方法如下:

    ddist <- datadist(mtcars)
    options(datadist = "ddist")
    # 等分数目由np参数确定,默认值为200:
    
    Predict(model.2, wt, np = 10)
    

    03 预测模型置信区间绘图

    Predict()函数预测的结果可以直接使用plot()函数和ggplot()函数进行绘制。在加载rms工具包时会自动加载ggplot2工具包。
    需要注意的是,rms包中的plot()函数采用的是lattice绘图系统,而非基础绘图系统(graphics),因此不能叠加graphics包中的函数。

    plot(predict.2)
    ggplot(predict.2)
    ggplot(predict.2) +
      geom_point(data = mtcars, aes(wt, mpg))
    

    效果如下:


    model.3 <- ols(mpg ~ pol(wt,2), data = mtcars)
    predict.3 <- Predict(model.3, wt = seq(1,6,0.5)) 
    predict.3
     
    ggplot(predict.3) +
      geom_point(data = mtcars, aes(wt, mpg))
    
    ddist <- datadist(mtcars)
    options(datadist = "ddist")
    model.4 <- ols(mpg ~ wt + drat, data = mtcars)
    predict.41 <- Predict(model.4, np = 10)
    plot(predict.41)
    
    ## 使用单个变量进行预测:
    predict.42 <- Predict(model.4, wt = seq(1,6,0.5))
    plot(predict.42)
    # 未参与预测的变量实际上取的是中位数:
    median(mtcars$drat)
    ## [1] 3.695
    # 也可以指定未参与预测变量的取值:
    
    predict.43 <- Predict(model.4, wt = seq(1,6,0.5),
                          drat = 4,
                          np = 10)
    plot(predict.43)
    

    rms包中的模型函数基本上是自成体系的,如logistic回归用lrm()函数,以及一系列的样条曲线函数,如rcs()函数(linear tail-restricted cubic spline function)。

    ddist <- datadist(mtcars)
    options(datadist = "ddist")
    model.5 <- ols(mpg ~ rcs(wt) + drat, data = mtcars)
    predict.5 <- Predict(model.5, wt)
    plot(predict.5)
    

    04 ROC曲线95%置信区间

    library(pROC)
    data(aSAH)
    rocobj <- plot.roc(aSAH$outcome, aSAH$s100b,
                       
                       main="Confidence intervals", percent=TRUE,
                       
                       ci=TRUE, # compute AUC (of AUC by default)
                       
                       print.auc=TRUE) # print the AUC (will contain the CI)
     
    ciobj <- ci.se(rocobj, # CI of sensitivity
                   
                   specificities=seq(0, 100, 5)) # over a select set of specificities
     
    plot(ciobj, type="shape", col="#1c61b6AA") # plot as a blue shape
    plot(ci(rocobj, of="thresholds", thresholds="best")) # add one threshold
    
    library(pROC)
    library(ROCR)
    rocobj <- roc(aSAH$outcome, aSAH$s100b,auc = TRUE,
                  
                  ci=TRUE, # compute AUC (of AUC by default)
                  
                  print.auc=TRUE) # print the AUC (will contain the CI)
    
    ciobj <- ci.se(rocobj, # CI of sensitivity
                   
                   specificities=seq(0, 1, 0.01)) # over a select set of specificities
    
    auc<-auc(rocobj)[1]
    auc_low<-ci(rocobj,of="auc")[1]
    auc_high<-ci(rocobj,of="auc")[3]
    auc_full<-paste("AUC:",round(auc,digits = 3),"(",
                    round(auc_low,digits = 3),",",round(auc_high,digits = 3),")",sep = "")
    
    data_ci<-ciobj[1:101,1:3]
    data_ci<-as.data.frame(data_ci)
    
    library(ggplot2)
    ggroc(rocobj,color="red",size=1)+theme_bw()+
      geom_segment(aes(x = 1, y = 0, xend = 0, yend = 1), 
                   colour='grey', linetype = 'dotdash') +
      
      geom_ribbon(data = data_ci,aes(x=as.numeric(rownames(data_ci)),ymin=`2.5%`,ymax=`97.5%`), fill = 'lightblue',alpha=0.5)+
      # geom_ribbon(data = data_ci,aes(x=x,ymin=`2.5%`,ymax=`97.5%`), fill = 'lightblue',alpha=0.5)+
      theme(plot.title = element_text(hjust = 0.5), 
            legend.justification=c(1, 0), legend.position=c(.95, .05),
            #legend.title=title, 
            legend.background = element_rect(fill=NULL, size=0.5, 
                                             linetype="solid", colour ="black"))+
      labs(x="Specificity",y="Sensitivity")
    

    参考资料

    (1条消息) rms | 如何绘制模型带置信区间的预测曲线_R语言学堂的博客-CSDN博客
    https://link.springer.com/content/pdf/bfm%3A978-3-319-19425-7%2F1.pdf
    绘制ROC曲线95%置信区间 - 灰信网(软件开发博客聚合) (freesion.com)


    关注R小盐,关注科研私家菜(VX_GZH: SciPrivate),有问题请联系R小盐。让我们一起来学习 R语言机器学习与临床预测模型

    相关文章

      网友评论

          本文标题:R语言机器学习与临床预测模型68--绘制带置信区间的预测模型曲线

          本文链接:https://www.haomeiwen.com/subject/qowxmrtx.html