前言
之前我们介绍了maSigpro的基本建模方式:分析时间序列的RNA-seq tool,这一次我们从代码的角度来分析其具体的建模原理
我们利用自带的数据集做运算
library(maSigPro) # load maSigPro library
data(data.abiotic)
data(edesign.abiotic)
edesign = edesign.abiotic
data = data.abiotic
design <- make.design.matrix(edesign, degree = 2)
fit <- p.vector(data, design, Q = 0.05, MT.adjust = "BH", min.obs = 20)
tstep <- T.fit(fit, step.method = "backward", alfa = 0.05)
我们将重点从这几段代码入手来解析其建模原理
make.design.matrix
那么第一步是将我们的设计矩阵做转换,首先我们的设计矩阵为这个样子:
至于该函数有一下几个参数:
"make.design.matrix" <-
function (edesign, degree = 2, time.col = 1,
repl.col = 2, group.cols = c(3:ncol(edesign)))
当满足if (dim(as.matrix(edesign))[2] > 3)
执行:
if (dim(as.matrix(edesign))[2] > 3) {
dummy.cols <- group.cols[2:length(group.cols)]
treatm.label <- paste(colnames(edesign)[dummy.cols],
"vs", control.label, sep = "")
groups.label <- c(control.label, treatm.label)
matrix.dummy <- as.matrix(edesign[, dummy.cols])
## Shared origin
dummy <- NULL
j = 0
origen <- min(edesign[,time.col])
origen <-edesign[edesign[,1]==origen,]
for (i in 1:length(dummy.cols)) {
share <- apply(origen[,c(3,dummy.cols[i])],1,sum)
if (!is.element(TRUE, share>1)) {
j=j+1
dummy <- cbind(dummy,matrix.dummy[,i])
colnames(dummy)[j] <- treatm.label[i]
}
}
time <- as.matrix(edesign[, time.col])
colnames(time) <- colnames(edesign)[time.col]
dis <- cbind(dummy, time)
rownames(dis) <- rownames(edesign)
groups.vector <- c(colnames(dummy), control.label)
colnames.dis <- colnames(dis)
dis <- cbind(dis, dis[,ncol(dis)] * matrix.dummy)
colnames(dis) <- c(colnames.dis, paste(colnames(edesign)[time.col],
"x", colnames(edesign)[dummy.cols], sep = ""))
groups.vector <- c(groups.vector, treatm.label)
if (degree >= 2) {
for (i in 2:degree) {
colnames.dis <- colnames(dis)
dis <- cbind(dis, edesign[, time.col]^i, edesign[,
time.col]^i * edesign[, dummy.cols])
colnames(dis) <- c(colnames.dis, paste(colnames(edesign)[time.col],
i, sep = ""), paste(colnames(edesign)[time.col],
"", i, "x", colnames(edesign)[dummy.cols],
sep = ""))
groups.vector <- c(groups.vector, groups.label)
}
}
}
满足这个条件意味着你的设计矩阵的列数要大于3,如果满足该条件则:
#group.cols是将control和control后面的列取出来
group.cols = c(3:ncol(edesign))
control.label <- colnames(edesign)[group.cols][1]
#指代设计矩阵第一列是时间
time.col = 1
dummy.cols <- group.cols[2:length(group.cols)]
#设计列名,即不同处理条件和control比较
treatm.label <- paste(colnames(edesign)[dummy.cols],
"vs", control.label, sep = "")
groups.label <- c(control.label, treatm.label)
matrix.dummy <- as.matrix(edesign[, dummy.cols])
##初始化
dummy <- NULL
j = 0
#取出最小的时间序列,该例子中是最小的时间是3
origen <- min(edesign[,time.col])
origen <-edesign[edesign[,1]==origen,]
origen
接下来就需要对dummy循环赋值了:
for (i in 1:length(dummy.cols)) {
share <- apply(origen[,c(3,dummy.cols[i])],1,sum)
if (!is.element(TRUE, share>1)) {
j=j+1
dummy <- cbind(dummy,matrix.dummy[,i])
colnames(dummy)[j] <- treatm.label[i]
}
}
此时设计出来的矩阵为:
通过该矩阵的形式我们可以看到所有的处理组(Cold,Heat,Salt)都是与Control组相比,比方说ColdvsControl则对应于列名含有Cold的组别,并用 1 来区别
若degree=1
#提取设计矩阵含有时间的那一列
time <- as.matrix(edesign[, time.col])
colnames(time) <- colnames(edesign)[time.col]
#合并dummy和time
dis <- cbind(dummy, time)
rownames(dis) <- rownames(edesign)
groups.vector <- c(colnames(dummy), control.label)
colnames.dis <- colnames(dis)
#dis[,ncol(dis)]实际上是提取time这一列
dis <- cbind(dis, dis[,ncol(dis)] * matrix.dummy)
colnames(dis) <- c(colnames.dis, paste(colnames(edesign)[time.col],
"x", colnames(edesign)[dummy.cols], sep = ""))
groups.vector <- c(groups.vector, treatm.label)
dis[,ncol(dis)]
matrix.dummy
matrix.dummy是设计矩阵的后三列,代表不同的处理
dis矩阵(degree=1)
dis矩阵有一个特点,比方说Control_3H因为没有任何处理所以ColdvsControl,HeatvsControl和SaltvsControl均为0,但是该样本的时间点为3,所以time那一列为3,因为没有任何处理(Cold,Heat和Salt),所以后面的(Time×Cold,Time×Heat,Time×Salt)均为0;
又例如Cold_9H,只有Cold处理,所以ColdvsControl这一项为1,其他(HeatvsControl和SaltvsControl)为0,但是该样本的时间点为9,所以time那一列为9,因为只有Cold处理,所以后面的Time×Cold为9,其他(Time×Heat,Time×Salt)为0
若degree≥2
接下来就要进行degree的判断了,如果这里我们假设degree=3,这里的degree代表的是多项式回归的最高次数,我们前面简单介绍了dis矩阵的构造,下面
degree = 3
if (degree >= 2) {
for (i in 2:degree) {
colnames.dis <- colnames(dis)
dis <- cbind(dis, edesign[, time.col]^i,
edesign[, time.col]^i * edesign[,dummy.cols])
colnames(dis) <- c(colnames.dis, paste(colnames(edesign)[time.col], i, sep = ""),
paste(colnames(edesign)[time.col], "", i, "x", colnames(edesign)[dummy.cols], sep = ""))
groups.vector <- c(groups.vector, groups.label)
}
}
dis矩阵(degree=3)左半边
dis矩阵(degree=3)右半边
我们可以看到,degree=3的dis矩阵和degree=1的dis矩阵相比多了Time2和Time3,例如Control_3H的Time2是取了2次方的,即3^2=9,Time3是去了3次方的3^3=27;
Cold_9H的Time2是取了2次方的,即9^2=81,Time3是去了3次方的9^3=729,其他的解释与degree=1的dis矩阵一样
这么做的目的是引入与时间相关的高次多项式,更好的拟合模型
网友评论