R小盐准备介绍R语言机器学习与预测模型的学习笔记, 快来收藏关注【科研私家菜】
01 预测模型结果可视化
在rms工具包中,使用Predict()函数预测的模型结果可以直接使用plot()或ggplot()函数进行绘制,并自带置信区间,但是建模函数必须使用rms工具包中的相关函数。
对线性模型而言,基础包stats中的函数是lm(),而rms工具包中是ols()函数。
ols(formula, data=environment(formula),
weights, subset, na.action=na.delete,
method="qr", model=FALSE,
x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE,
penalty=0, penalty.matrix, tol=1e-7, sigma,
var.penalty=c('simple','sandwich'), ...)
library(rms)
model.1 <- lm(mpg ~ wt, data = mtcars)
model.2 <- ols(mpg ~ wt, data = mtcars)
summary(model.1)
print(model.2)
02 stats包的使用
在stats包中,与模型预测相关的有两个函数:fitted()和predict()。前者是根据模型结果对原数据中的样本的响应变量(因变量)进行预测:
head(fitted(model.1))
predict()函数则可以对新数据的样本进行预测:
## S3 method for class 'lm'
predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
interval = c("none", "confidence", "prediction"),
level = 0.95, type = c("response", "terms"),
terms = NULL, na.action = na.pass,
pred.var = res.var/weights, weights = 1, ...)
object:模型对象;
newdata:新数据;必须为数据框结构;若省略,则默认为原数据;
interval:区间类型,有两种类型:置信区间(confidence)和预测区间(prediction);
level:置信水平;默认为95%。
predict(model.1, data.frame(wt = seq(1,6,0.5)),
interval = "prediction")
predict(model.1, data.frame(wt = seq(1,6,0.5)),
interval = "confidence")
而在rms工具包中,模型预测对应的是Predict()函数(首字母大写):
Predict(object, ..., fun=NULL, funint=TRUE,
type = c("predictions", "model.frame", "x"),
np = 200, conf.int = 0.95,
conf.type = c("mean", "individual","simultaneous"),
usebootcoef=TRUE, boot.type=c("percentile", "bca", "basic"),
posterior.summary=c('mean', 'median', 'mode'),
adj.zero = FALSE, ref.zero = FALSE,
kint=NULL, ycut=NULL, time = NULL, loglog = FALSE,
digits=4, name,
factors=NULL, offset=NULL)
# conf.type:区间类型;置信区间(mean)、预测区间(individual)
Predict()函数的“新数据”使用...表示,直接写出参与预测的变量即可:
predict.2 <- Predict(model.2, wt = seq(1,6,0.5))
predict.2
也可以不指定参与预测变量的数值,但需要提前定义变量的范围和等分的个数。指定范围的方法如下:
ddist <- datadist(mtcars)
options(datadist = "ddist")
# 等分数目由np参数确定,默认值为200:
Predict(model.2, wt, np = 10)
03 预测模型置信区间绘图
Predict()函数预测的结果可以直接使用plot()函数和ggplot()函数进行绘制。在加载rms工具包时会自动加载ggplot2工具包。
需要注意的是,rms包中的plot()函数采用的是lattice绘图系统,而非基础绘图系统(graphics),因此不能叠加graphics包中的函数。
plot(predict.2)
ggplot(predict.2)
ggplot(predict.2) +
geom_point(data = mtcars, aes(wt, mpg))
效果如下:
model.3 <- ols(mpg ~ pol(wt,2), data = mtcars)
predict.3 <- Predict(model.3, wt = seq(1,6,0.5))
predict.3
ggplot(predict.3) +
geom_point(data = mtcars, aes(wt, mpg))
ddist <- datadist(mtcars)
options(datadist = "ddist")
model.4 <- ols(mpg ~ wt + drat, data = mtcars)
predict.41 <- Predict(model.4, np = 10)
plot(predict.41)
## 使用单个变量进行预测:
predict.42 <- Predict(model.4, wt = seq(1,6,0.5))
plot(predict.42)
# 未参与预测的变量实际上取的是中位数:
median(mtcars$drat)
## [1] 3.695
# 也可以指定未参与预测变量的取值:
predict.43 <- Predict(model.4, wt = seq(1,6,0.5),
drat = 4,
np = 10)
plot(predict.43)
rms包中的模型函数基本上是自成体系的,如logistic回归用lrm()函数,以及一系列的样条曲线函数,如rcs()函数(linear tail-restricted cubic spline function)。
ddist <- datadist(mtcars)
options(datadist = "ddist")
model.5 <- ols(mpg ~ rcs(wt) + drat, data = mtcars)
predict.5 <- Predict(model.5, wt)
plot(predict.5)
04 ROC曲线95%置信区间
library(pROC)
data(aSAH)
rocobj <- plot.roc(aSAH$outcome, aSAH$s100b,
main="Confidence intervals", percent=TRUE,
ci=TRUE, # compute AUC (of AUC by default)
print.auc=TRUE) # print the AUC (will contain the CI)
ciobj <- ci.se(rocobj, # CI of sensitivity
specificities=seq(0, 100, 5)) # over a select set of specificities
plot(ciobj, type="shape", col="#1c61b6AA") # plot as a blue shape
plot(ci(rocobj, of="thresholds", thresholds="best")) # add one threshold
library(pROC)
library(ROCR)
rocobj <- roc(aSAH$outcome, aSAH$s100b,auc = TRUE,
ci=TRUE, # compute AUC (of AUC by default)
print.auc=TRUE) # print the AUC (will contain the CI)
ciobj <- ci.se(rocobj, # CI of sensitivity
specificities=seq(0, 1, 0.01)) # over a select set of specificities
auc<-auc(rocobj)[1]
auc_low<-ci(rocobj,of="auc")[1]
auc_high<-ci(rocobj,of="auc")[3]
auc_full<-paste("AUC:",round(auc,digits = 3),"(",
round(auc_low,digits = 3),",",round(auc_high,digits = 3),")",sep = "")
data_ci<-ciobj[1:101,1:3]
data_ci<-as.data.frame(data_ci)
library(ggplot2)
ggroc(rocobj,color="red",size=1)+theme_bw()+
geom_segment(aes(x = 1, y = 0, xend = 0, yend = 1),
colour='grey', linetype = 'dotdash') +
geom_ribbon(data = data_ci,aes(x=as.numeric(rownames(data_ci)),ymin=`2.5%`,ymax=`97.5%`), fill = 'lightblue',alpha=0.5)+
# geom_ribbon(data = data_ci,aes(x=x,ymin=`2.5%`,ymax=`97.5%`), fill = 'lightblue',alpha=0.5)+
theme(plot.title = element_text(hjust = 0.5),
legend.justification=c(1, 0), legend.position=c(.95, .05),
#legend.title=title,
legend.background = element_rect(fill=NULL, size=0.5,
linetype="solid", colour ="black"))+
labs(x="Specificity",y="Sensitivity")
参考资料
(1条消息) rms | 如何绘制模型带置信区间的预测曲线_R语言学堂的博客-CSDN博客
https://link.springer.com/content/pdf/bfm%3A978-3-319-19425-7%2F1.pdf
绘制ROC曲线95%置信区间 - 灰信网(软件开发博客聚合) (freesion.com)
关注R小盐,关注科研私家菜(VX_GZH: SciPrivate),有问题请联系R小盐。让我们一起来学习 R语言机器学习与临床预测模型
网友评论