While running a limma analysis, I hit another error:
Error in makeContrasts(contrasts = c(pccp_comp), levels = pccpdesign) :
The levels must by syntactically valid names in R, see help(make.names). Non-valid names: factor(PCCPGrouplist)CP,factor(PCCPGrouplist)PC
I looked up the corresponding help(make.names), which describes its argument as:
character vector to be coerced to syntactically valid names. This is coerced to character if necessary.
colnames(design)
#[1] "factor(grouplist)CP"
#[2] "factor(grouplist)PC"
design
The cause is that my column names contain (), which R does not allow in a name, so they are not syntactically valid names.
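To see which names R treats as valid, a quick check with base R's make.names() helps. This is only a minimal sketch: `cols` copies the column names printed above, and `is_valid` is a name I introduce here for illustration.

```r
# Column names copied from the colnames(design) output above
cols <- c("factor(grouplist)CP", "factor(grouplist)PC")

# A name is syntactically valid if make.names() leaves it unchanged
is_valid <- cols == make.names(cols)
print(is_valid)

# make.names() replaces each invalid character, such as "(" or ")", with "."
print(make.names(cols))

# Short plain names pass the same check
print(c("CP", "PC") == make.names(c("CP", "PC")))
```

The same comparison can be run on colnames(design) directly before calling makeContrasts().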
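So the fix is to rename the columns. It seemed odd at first that automatically generated names would break the syntax rules, but model.matrix() builds each column name by pasting the term label, here factor(grouplist), onto the factor level, so the parentheses end up in the name.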
colnames(design) <- c("CP","PC")
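One way to avoid hard-coding the new names is to store the factor in a variable first and rename the columns from its levels. This is a sketch using base R only; `group` and `design_raw` are names I introduce here, and `grouplist` is the same vector used in the full code.

```r
grouplist <- c(rep("PC", times = 5), rep("CP", times = 4))

# model.matrix() names each column <term label><level>,
# so an inline factor() call ends up inside the column name
design_raw <- model.matrix(~ 0 + factor(grouplist))
print(colnames(design_raw))

# Storing the factor first keeps the column names and the column order in sync
group <- factor(grouplist)            # levels are sorted alphabetically: CP, PC
design <- model.matrix(~ 0 + group)   # columns start out as "groupCP", "groupPC"
colnames(design) <- levels(group)
print(colnames(design))
```

Because the names come from levels(group), they cannot drift out of order if the group labels change later.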
After renaming, the analysis runs. The full code is below:
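## 1. Create the design matrix and contrasts
grouplist <- c(rep("PC", times = 5), rep("CP", times = 4))
grouplist
library(limma)
library(edgeR)  # provides DGEList() and calcNormFactors()
design <- model.matrix(~0+factor(grouplist))
# Rename the columns before makeContrasts(); factor levels are sorted alphabetically (CP, PC)
colnames(design) <- c("CP","PC")
comp <- 'PC-CP'
cont.matrix <- makeContrasts(contrasts=c(comp),levels = design)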
## 2. Build the edgeR DGEList object, normalize, and fit the model
dge <- DGEList(counts=PCVSCP_exprSet)
dge <- calcNormFactors(dge)
v <- voom(dge,design,plot=TRUE, normalize.method="quantile")
fit <- lmFit(v, design)
fit2 <- contrasts.fit(fit,cont.matrix)
fit2 <- eBayes(fit2)
tmp <- topTable(fit2, coef=comp, n=Inf,adjust.method="BH")
PCCP_DEG_limma_voom <- na.omit(tmp)
PCCP_DEG_limma_voom
A number of summary statistics are presented by topTable() for the top genes and the selected contrast.
The resulting table has 6 columns:
- The logFC column gives the value of the contrast. Usually this represents a log2-fold change between two or more experimental conditions although sometimes it represents a log2-expression level.
- The AveExpr column gives the average log2-expression level for that gene across all the arrays and channels in the experiment.
- Column t is the moderated t-statistic.
- Column P.Value is the associated p-value.
- Column adj.P.Val is the p-value adjusted for multiple testing. The most popular form of adjustment is "BH", which is Benjamini and Hochberg's method to control the false discovery rate [1]. The adjusted values are often called q-values if the intention is to control or estimate the false discovery rate. The meaning of "BH" q-values is as follows. If all genes with q-value below a threshold, say 0.05, are selected as differentially expressed, then the expected proportion of false discoveries in the selected group is controlled to be less than the threshold value, in this case 5%. This procedure is equivalent to the procedure of Benjamini and Hochberg, although the original paper did not formulate the method in terms of adjusted p-values.
- The B-statistic (lods or B) is the log-odds that the gene is differentially expressed [41, Section 5]. Suppose for example that B = 1.5. The odds of differential expression is exp(1.5)=4.48, i.e., about four and a half to one. The probability that the gene is differentially expressed is 4.48/(1+4.48)=0.82, i.e., the probability is about 82% that this gene is differentially expressed.
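The arithmetic behind the last two columns can be checked in base R. This is a minimal sketch: B = 1.5 is the example value from the text, while the p-values in `p` are made up for illustration and do not come from my data.

```r
# B-statistic: convert log-odds to odds and then to a probability
B <- 1.5
odds <- exp(B)
prob <- odds / (1 + odds)   # equivalent to plogis(B)
print(round(odds, 2))
print(round(prob, 2))

# "BH" adjustment: p.adjust() is base R's implementation of this adjustment
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)   # hypothetical raw p-values
q <- p.adjust(p, method = "BH")
print(q)

# Genes whose adjusted p-value falls below the 0.05 threshold
print(which(q < 0.05))
```

For a real result table, the same threshold would be applied to the adj.P.Val column returned by topTable().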