之前的推文中多次提到过autoReg这个包,用来实现一键单多因素回归分析,同时可以导出回归分析的结果为word或ppt格式,只需要简单修改下格式就不用再进行任何操作,可以说非常方便。
另外,之前也有关于回归分析结果可视化的推送,相信图要比表更加直观。今天我们继续温习下这两个重要的技能吧。
首先创建我们需要的数据,基于colon数据集做简单修改。
library(ggplot2)
library(tidyverse)
library(patchwork)
library(autoReg)
data(cancer,package="survival")
colon%>%
select("status","rx","sex","age","obstruct","perfor","adhere","differ","surg")->data
colnames(data)<-c("Status","Tretment","Sex","Age","Obstruction","Perforation","Adherence","Differentiation","Surgery")
data[,c(3,5:9)]<-lapply(data[,c(3,5:9)],as.factor)
data%>%
mutate(Age=ifelse(Age>=60,">=60","<60"))->data
data$Age<-as.factor(data$Age)
str(data)
# 'data.frame': 1858 obs. of 9 variables:
# $ Status : num 1 1 0 0 1 1 1 1 1 1 ...
# $ Tretment : Factor w/ 3 levels "Obs","Lev","Lev+5FU": 3 3 3 3 1 1 3 3 1 1 ...
# $ Sex : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 1 2 2 ...
# $ Age : Factor w/ 2 levels "<60",">=60": 1 1 2 2 2 2 2 2 2 2 ...
# $ Obstruction : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 1 1 ...
# $ Perforation : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
# $ Adherence : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 1 1 1 1 ...
# $ Differentiation: Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 2 2 ...
# $ Surgery : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 2 ...
然后我们可以拟合一个逻辑回归模型
fit=glm(Status~.,data=data,family="binomial")
一般到这儿,我们可以summary模型获得模型的结果,如每个变量的回归系数以及P值。或者我们可以通过broom包的tidy()函数得到一个整洁的数据框格式的结果。然后通过exp()函数提取每个变量的OR值并计算置信区间。
summary(fit)
# Call:
# glm(formula = Status ~ ., family = "binomial", data = data)
#
# Deviance Residuals:
# Min 1Q Median 3Q Max
# -1.791 -1.139 -0.890 1.141 1.525
#
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.10539 0.18266 -0.577 0.56396
# TretmentLev -0.05094 0.11661 -0.437 0.66222
# TretmentLev+5FU -0.59845 0.11799 -5.072 3.94e-07 ***
# Sex1 -0.08420 0.09641 -0.873 0.38250
# Age>=60 0.05137 0.09712 0.529 0.59682
# Obstruction1 0.21080 0.12244 1.722 0.08513 .
# Perforation1 0.21732 0.28921 0.751 0.45240
# Adherence1 0.37729 0.13981 2.699 0.00697 **
# Differentiation2 0.06623 0.15997 0.414 0.67887
# Differentiation3 0.49570 0.19201 2.582 0.00983 **
# Surgery1 0.31980 0.10830 2.953 0.00315 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 2511.9 on 1811 degrees of freedom
# Residual deviance: 2443.9 on 1801 degrees of freedom
# (因为不存在,46个观察量被删除了)
# AIC: 2465.9
#
# Number of Fisher Scoring iterations: 4
fit%>%tidy()
# A tibble: 11 x 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) -0.105 0.183 -0.577 0.564
# 2 TretmentLev -0.0509 0.117 -0.437 0.662
# 3 TretmentLev+5FU -0.598 0.118 -5.07 0.000000394
# 4 Sex1 -0.0842 0.0964 -0.873 0.382
# 5 Age>=60 0.0514 0.0971 0.529 0.597
# 6 Obstruction1 0.211 0.122 1.72 0.0851
# 7 Perforation1 0.217 0.289 0.751 0.452
# 8 Adherence1 0.377 0.140 2.70 0.00697
# 9 Differentiation2 0.0662 0.160 0.414 0.679
# 10 Differentiation3 0.496 0.192 2.58 0.00983
# 11 Surgery1 0.320 0.108 2.95 0.00315
然而,自从有了autoReg,我们直接可以省略这些步骤,直接一键获得最终的回归分析结果的表格。
autoReg(fit)%>%myft()
![](https://img.haomeiwen.com/i29223694/9dfeb9b4baab6b2d.png)
紧接着我们可以通过rrtable包的table2docx()函数导出结果,这里不再演示。另外,这里是多因素回归分析的结果,我们也可以通过设置参数uni=TRUE获得单多因素回归分析的结果。
tab1 <- CreateTableOne(vars =covars, strata = "BMI" ,
data = mbc, factorVars=fvars,addOverall = F )
print(tab1,smd = TRUE)
下面我们继续使用ggplot2来把回归分析的结果可视化出来。
首先我们需要提取森林图的输入文件,即回归分析结果中的变量,OR及置信区间和P值。
or_ci <- exp(confint(fit))
p_values <-summary(fit)$coefficients[, 4]
forest_data <- data.frame(
Variables =rownames(or_ci)[-1],#提取变量名
OR =exp(coef(fit))[-1],#OR值
Lower_CI = or_ci[-1, 1],#95%CI下限
Upper_CI = or_ci[-1, 2],#95%CI上限
P_Value = p_values[-1] #P值
)
然后我们变量名因子化并且根据P值和OR的结果新产生一个变量,即这些变量是结局事件的危险因素、保护因素还是没有意义,方便后面的映射。
forest_data$Variables<-factor(forest_data$Variables,levels = rev(forest_data$Variables))#变量名因子化
#根据P值和OR值定义变量是危险因素还是保护性因素还是没有意义
forest_data%>%
mutate(Change=case_when(
P_Value<0.05& OR>1~"Risk factor",
P_Value<0.05& OR<1~"Protective factor",
P_Value>=0.05~"Not sig"
))->forest_data
现在开始画图
p<-ggplot(forest_data, aes(x =OR, y = Variables)) +#映射
geom_errorbarh(aes(xmin = Lower_CI, xmax = Upper_CI,color=Change), height = 0.2,size=1) +#置信区间
geom_point(alpha=0.3,aes(size=P_Value)) +#OR值,按照P值映射
xlim(min(or_ci), max(or_ci)) +#横坐标轴范围
labs(x = "OR (95%CI)", y = "") +#坐标轴标题
geom_vline(xintercept = 1.0,color="grey",linetype=2,linewidth=1)+#添加无效线
scale_color_manual(values = c("#377eb8","#4daf4a","#e41a1c"))+#自定义颜色
theme_bw()+#设置主题
theme(axis.text = element_text(size = 12,color = "black"),
panel.border = element_rect(linewidth = 1))
p
![](https://img.haomeiwen.com/i29223694/7501b455a60ca771.png)
这就是第一张图,其实很简单,以OR值绘制点图,使用置信区间上下限添加误差棒。在此基础上按照P值映射OR值对应点图的大小,并且根据P值和OR值对不同的变量的误差棒重新设定颜色。
在此基础上,我们可以进一步按照变量的意义,让左边变量文本的颜色和图中误差棒的颜色一致,并且对主题进行简单修改。
color_mapping <- c("TretmentLev" ="#377eb8",
"TretmentLev+5FU" ="#4daf4a",
"Sex1"="#377eb8",
"Age>=60"="#377eb8",
"Obstruction1"="#377eb8",
"Perforation1"="#377eb8",
"Adherence1"="#e41a1c",
"Differentiation2"="#377eb8",
"Differentiation3"="#e41a1c",
"Surgery1"="#e41a1c")
p1<-p+theme(axis.text.y = element_text(color = rev(color_mapping)),
panel.grid = element_blank())
p1
![](https://img.haomeiwen.com/i29223694/428c1029c592476c.png)
最后,我们用patchwork组图。
patch<-p+p1+
plot_annotation(
title ="Visualize regression results using ggplot2",
tag_levels = "A",
theme = theme(plot.title =element_text(size = 16))
)
patch
![](https://img.haomeiwen.com/i29223694/a7e2cf318e5f5312.png)
今天的分享就到这里了,原文来自本人微信公众号原创文章!
网友评论