仅仅一个函数,就可以完成差异表达基因的GO,GSEA,KEGG 分析与可视化,下面来看一看
01 需要的R包
library(AnnotationDbi)
library(org.Hs.eg.db) 基因注释需要
library(clusterProfiler)
library(stringr)
library(enrichplot)#画图需要
library(stats)
library(dplyr)
library(msigdbr)
02 定义R函数
首先定义一个R函数,目的是:读入数据;进行富集分析。
#定义R函数
allEnrich <- function(df) {
df = read.csv(df, stringsAsFactors = F,
header = T)
names(df) = c("gene", "avg_logFC")
#msigdbr 包提供多个物种的MSigDB数据
b = msigdbr(species = "Homo sapiens", category = "C2") %>%
select(gs_name, entrez_gene)
a1 = df$avg_logFC
a2 = easyConvert(
species = "HUMAN",
queryList = df$gene,
queryType = "SYMBOL"
)
names(a1) = a2$ENTREZID
b1 = a1[order(a1, decreasing = T)]
GSEA分析
gsearesults = GSEA(b1, TERM2GENE = b)
#进行CC分析
ego1 <- enrichGO(
gene = df$gene,
OrgDb = org.Hs.eg.db,
keyType = 'SYMBOL',
ont = "CC",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05
)
#绘图
p1 = dotplot(ego1, showCategory = 15)
ggplot2::ggsave("CC.pdf", p1, width = 7, height = 8)
#输出结果,保存到本地
write.csv(ego1@result, "Cell_component.csv")
=========================================
#进行BP分析
ego2 <- enrichGO(
gene = df$gene,
OrgDb = org.Hs.eg.db,
keyType = 'SYMBOL',
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05
)
#绘图
p2 = dotplot(ego2, showCategory = 15)
ggplot2::ggsave("BP.pdf", p2, width = 7, height = 8)
#输出结果,保存到本地
write.csv(ego2@result, "Biological_process.csv")
=======================================
#进行MF分析
ego3 <- enrichGO(
gene = df$gene,
OrgDb = org.Hs.eg.db,
keyType = 'SYMBOL',
ont = "MF",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05
)
#绘图
p3 = dotplot(ego3, showCategory = 15)
ggplot2::ggsave("MF.pdf", p3, width = 7, height = 8)
#输出结果,保存到本地
write.csv(ego3@result, "Molecular_function.csv")
=========================================
GSEA可视化
p4 = gsearesults@result %>%
mutate(Description = gsub("_", " ", Description)) %>%
mutate(Description = reorder(Description, setSize)) %>% head(15) %>%
ggplot(aes(
x = enrichmentScore,
y = Description,
size = setSize,
color = pvalue
)) +
geom_point() +
scale_size(range = c(1, 5), name = "Size") +
scale_color_gradient(low = "blue", high = "red") +
xlab("GeneRatio") + ylab("") +
theme(axis.text.y = element_text(size = 10, color = "black"))
ggplot2::ggsave("GSEA.pdf", p4, width = 7, height = 8)
write.csv(gsearesults@result, "GSEA_Results.csv")
==================================
KEGG分析
ego4 = enrichKEGG(
gene = names(a1),
organism = "hsa",
keyType = "kegg",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
minGSSize = 10,
maxGSSize = 500,
qvalueCutoff = 0.2,
use_internal_data = FALSE
)
p5 = dotplot(ego4, showCategory = 15)
ggplot2::ggsave("KEGG.pdf", p5, width = 7, height = 8)
write.csv(ego4@result, "KEGG_pathway.csv")
cat("Successfully~")
}
03 读入数据分析
使用定义好的R函数:allEnrich,读入数据进行分析,数据格式必须为2列:
第一列 geneSymbol
第二类 Log2FoldChange
#给出包含geneSymbol,Log2FoldChange所在的csv文件
allEnrich("significant_degs.csv")
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
Reading KEGG annotation online:
Reading KEGG annotation online:
wrong orderBy parameter; set to default `orderBy = "x"`
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
Successfully~
成功完成分析后,这是富集分析输出的所有结果:
image.png
查看结果文件:
image.png
网友评论