Gene set enrichment analysis with topGO
Alexa A, Rahnenfuhrer J (2019). topGO: Enrichment Analysis for Gene Ontology. R package version 2.36.0.
1. 关于 topGO
& 安装三连
topGO
的设计旨在实现 GO term 的半自动富集分析。一个主要优点是它提供的统一化的基因集检测方案 (the unified gene set testing framework) 。接下来是对这个主要优点的展开,Besides, also 句型——轻松地应用函数完成GO富集分析、容易地执行新的统计学检测或新算法、对比不同的GO富集研究方法。
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("topGO")
library(topGO)
ls("package:topGO")
# [1] "algorithm" "algorithm<-"
# [3] "allGenes" "allMembers"
# [5] "allMembers<-" "allParents"
# [7] "allScore" "annFUN"
# [9] "annFUN.db" "annFUN.file"
# [11] "annFUN.gene2GO" "annFUN.GO2genes"
# [13] "annFUN.org" "attrInTerm"
# [15] "buildLevels" "combineResults"
# [17] "contTable" "countGenesInTerm"
# [19] "cutOff" "cutOff<-"
# [21] "depth" "depth<-"
# [23] "description" "description<-"
# [25] "elim" "elim<-"
# [27] "expressionMatrix" "feasible"
# [29] "feasible<-" "geneData"
# [31] "geneData<-" "genes"
# [33] "geneScore" "geneSelectionFun"
# [35] "geneSelectionFun<-" "genesInTerm"
# [37] "GenTable" "getGraphRoot"
# [39] "getNoOfLevels" "getPvalues"
# [41] "getSigGroups" "getSigRatio"
# [43] "GOBPTerm" "GOCCTerm"
# [45] "GOFisherTest" "GOglobalTest"
# [47] "GOKSTest" "GOKSTiesTest"
# [49] "GOMFTerm" "GOplot"
# [51] "GOSumTest" "GOtTest"
# [53] "graph" "graph<-"
# [55] "groupGOTerms" "inducedGraph"
# [57] "initialize" "inverseList"
# [59] "joinFun" "members"
# [61] "members<-" "membersExpr"
# [63] "membersScore" "Name"
# [65] "Name<-" "nodesInInducedGraph"
# [67] "numAllMembers" "numGenes"
# [69] "numMembers" "numSigAll"
# [71] "numSigGenes" "numSigMembers"
# [73] "ontology" "ontology<-"
# [75] "penalise" "permSumStats"
# [77] "permSumStats.all" "phenotype"
# [79] "print" "printGenes"
# [81] "printGraph" "pType"
# [83] "pType<-" "rankMembers"
# [85] "readMappings" "reverseArch"
# [87] "runTest" "score"
# [89] "score<-" "scoreOrder"
# [91] "scoresInTerm" "show"
# [93] "showGroupDensity" "showSigOfNodes"
# [95] "sigAllMembers" "sigGenes"
# [97] "sigMembers" "sigMembers<-"
# [99] "sigRatio<-" "termStat"
# [101] "testName" "testName<-"
# [103] "testStatistic" "testStatPar"
# [105] "updateGenes" "updateGroup"
# [107] "updateTerm<-" "usedGO"
# [109] "Weights" "Weights<-"
# [111] "whichAlgorithms" "whichTests"
2. 数据准备
调用 ALL
包的内置数据。
library(ALL)
data("ALL")
data(geneList) ## geneList, a list of genes and the respective p-values for differential expression
affyLib <- paste(annotation(ALL), "db", sep = ".")
library(package = affyLib, character.only = TRUE)
# 载入需要的程辑包:org.Hs.eg.db
## 载入多个包时,加上 character.only = TRUE 参数,可以确保正确载入
函数 topDiffGenes()
可以在 p-values=0.01 的显著水平上筛选 geneList
中的差异表达基因。
sum(topDiffGenes(geneList))
# [1] 50
万事俱备,开始 构建 topGOdata
,这个对象包含全部的基因指标 (gene
identifiers) 和其数值、GO注释、GO 层次结构 (GO hierarchical structure) 以及其他用于富集分析的信息。
sampleGOdata <- new("topGOdata",
description = "Simple session",
ontology = "BP",
allGenes = geneList,
geneSel = topDiffGenes, ## 设定筛选函数
nodeSize = 10, ## 删去少于10个注释基因的GO term
annot = annFUN.db, ## maps genes identifiers to GO terms from the affyLib object
affyLib = affyLib)
sampleGOdata
#
# ------------------------- topGOdata object -------------------------
#
# Description:
# - Simple session
#
# Ontology:
# - BP
#
# 323 available genes (all genes from the array):
# - symbol: 1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at ...
# - score : 1 1 0.62238 0.541224 1 ...
# - 50 significant genes.
#
# 309 feasible genes (genes that can be used in the analysis):
# - symbol: 1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at ...
# - score : 1 1 0.62238 0.541224 1 ...
# - 45 significant genes.
#
# GO graph (nodes with at least 10 genes):
# - a graph with directed edges
# - number of nodes = 1072
# - number of edges = 2319
#
# ------------------------- topGOdata object -------------------------
3. 富集检测
- 用到的两种统计学方法:Fisher's exact test, Kolmogorov-Smirnov like test.
- 用到的函数:
runTest()
,可将特定算法和T检验方法应用于输入的对象,并返回一个topGOresult
对象。
3.1 经典 over-representation GO富集分析
resultFisher <- runTest(sampleGOdata, algorithm = "classic", statistic = "fisher")
#
# -- Classic Algorithm --
#
# the algorithm is scoring 964 nontrivial nodes
# parameters:
# test statistic: fisher
resultFisher
#
# Description: Simple session
# Ontology: BP
# 'classic' algorithm with the 'fisher' test
# 1072 GO terms scored: 45 terms with p < 0.01
# Annotation data:
# Annotated genes: 309
# Significant genes: 45
# Min. no. of genes annotated to a GO: 10
# Nontrivial nodes: 964
class(resultFisher)
# [1] "topGOresult"
# attr(,"package")
# [1] "topGO"
3.2 利用 Kolmogorov-Smirnov like test 的富集分析
两种方法:classic, elim
'elim' : this method processes the GO terms by traversing the GO hierarchy from bottom to top.
'classic': each GO term is tested independently, not taking the GO hierarchy into account.
resultKS.classic <- runTest(sampleGOdata, algorithm = "classic", statistic = "ks")
#
# -- Classic Algorithm --
#
# the algorithm is scoring 1072 nontrivial nodes
# parameters:
# test statistic: ks
# score order: increasing
resultKS.classic
#
# Description: Simple session
# Ontology: BP
# 'classic' algorithm with the 'ks' test
# 1072 GO terms scored: 83 terms with p < 0.01
# Annotation data:
# Annotated genes: 309
# Significant genes: 45
# Min. no. of genes annotated to a GO: 10
# Nontrivial nodes: 1072
resultKS.elim <- runTest(sampleGOdata, algorithm = "elim", statistic = "ks")
#
# -- Elim Algorithm --
#
# the algorithm is scoring 1072 nontrivial nodes
# parameters:
# test statistic: ks
# cutOff: 0.01
# score order: increasing
resultKS.elim
#
# Description: Simple session
# Ontology: BP
# 'elim' algorithm with the 'ks : 0.01' test
# 1072 GO terms scored: 24 terms with p < 0.01
# Annotation data:
# Annotated genes: 309
# Significant genes: 45
# Min. no. of genes annotated to a GO: 10
# Nontrivial nodes: 1072
ps. 查看所有允许的算法和T检验方法
whichAlgorithms()
# [1] "classic" "elim" "weight" "weight01"
# [5] "lea" "parentchild"
whichTests()
# [1] "fisher" "ks" "t" "globaltest"
# [5] "sum" "ks.ties"
4. 结果分析
4.1 得到富集显著 GO term 的 data frame
函数 GenTable()
可用于分析富集最为显著的 GO term 和相对应的p值。
allRes <- GenTable(sampleGOdata,classicFisher = resultFisher,
classicKS = resultKS.classic,
elimKS = resultKS.elim,
orderBy = "elimKS",ranksOf = "classicFisher",
topNodes = 10)
4.2 不同算法所得p值的对比及可视化
函数 score()
可以得到 topGOresult
对象中 GO term 的p值。
pValue.classic <- score(resultKS.classic)
pValue.elim <- score(resultKS.elim)[names(pValue.classic)]
gstat <- termStat(sampleGOdata, names(pValue.classic)) ## to summarise the statistics
colMap <- function(x) {
.col <- rep(rev(heat.colors(length(unique(x)))), time = table(x))
return(.col[match(1:length(x), order(x))])
}
gCol <- colMap(gstat$Significant)
plot(pValue.classic, pValue.elim,
xlab = "p-value classic",
ylab = "p-value elim",
pch = 19, cex = gSize, col = gCol)
plot(log10(pValue.classic), log10(pValue.elim),
xlab = "lg(p-value) classic",
ylab = "lg(p-value) elim",
pch = 19, cex = gSize, col = gCol)
可以看出两种方法得到的p值确实是有差异的,所以显然存在一些被A方法认定显著——而在B方法下不显著的 GO term. 可以通过下面的方法找到它们:
sel.go <- names(pValue.classic)[pValue.elim < pValue.classic]
cbind(termStat(sampleGOdata, sel.go),
elim = pValue.elim[sel.go],
classic = pValue.classic[sel.go])
# Annotated Significant Expected elim classic
# GO:0006807 224 36 32.62 0.44604086 0.53079040
# GO:0019538 164 23 23.88 0.37042531 0.47406305
# GO:0043170 206 35 30.00 0.05327402 0.06576399
# GO:1901564 189 26 27.52 0.73230230 0.90604207
4.3 GO Graph
方形为显著性前五的 GO term.
showSigOfNodes(sampleGOdata, score(resultKS.elim),
firstSigNodes = 5, useInfo = 'all')
5. 其他的可视化技能
5.1 p值的直方图
hist(pValue.classic, 50, xlab = "p-values_KSclassic")
5.2 分析单个GO term
函数 showGroupDensity()
可以解释显著富集到某个GO term的基因是否比其他基因有更高的分值。
goID <- allRes[1, "GO.ID"]
showGroupDensity(sampleGOdata, goID, ranks = TRUE)
References
-
Gene set enrichment analysis with topGO
https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf
-
avrilomics Using TopGO to test for GO-term enrichment
http://avrilomics.blogspot.com/2015/07/using-topgo-to-test-for-go-term.html
-
Quick-R https://www.statmethods.net/advgraphs/parameters.html
-
davetgerrard LiverProteins/scriptx/topGO_vignetteTrial.R
https://github.com/davetgerrard/LiverProteins/blob/master/scripts/topGO_vignetteTrial.R
悄咪咪吐槽一下作者
最后,向大家隆重推荐生信技能树的一系列干货!
- 生信技能树全球公益巡讲:https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g
- B站公益74小时生信工程师教学视频合辑:https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw
- 招学徒:https://mp.weixin.qq.com/s/KgbilzXnFjbKKunuw7NVfw
网友评论