小麦的GO富集分析

作者: e3617f9991e1 | 来源:发表于2019-05-06 20:16 被阅读68次

小麦的GO富集分析
【R画图学习3.1】富集气泡图
miRNA功能富集分析
2020-12-17水稻、小麦GO富集分析
GSEA(Gene Set Enrichment Analysi
R语言学习之GO富集分析结果可视化（柱状图、网络图）
【R语言】dplyr对数据分组取各组前几行
GO富集分析可视化：R语言GOplot包——准备自己的数据
GO富集分析简单介绍
(日常记录)GSEA富集分析软件的使用

小麦的GO富集分析，我使用的是（clusterProfiler）,先安装clusterprofiler，我的R版本为3.5.2。此次实践参考了https://zhuanlan.zhihu.com/p/43651419 如果大家觉得我写的不够好，还可以戳原文看看

source("https://bioconductor.org/biocLite.R")
biocLite("clusterProfiler")

需要准备文件3个：
第一个文件：所要富集基因的列表,上一步找出来的差异基因。

genelist.png

第二个文件：基因与GO号的对应关系，一定是GO号在前，基因在后。
这个文件来自于ensemble plants（具体怎么做会贴出图介绍）。导出的文件是基因在前，GO号在后，利用python 脚本将只有基因没有对应的GO号的行去除，再调换两列的位置。

term2gene.png

第三个文件是GO号与其对应注释的文件，即GO号和GO name,这个文件也是来自于ensemble plants biomaRt.

term2name.png

准备好这三个文件之后就可以做富集分析了。
R代码：就不写注释了（lazy!）

setwd("C:/U/work/directory/")
library(clusterProfiler)
gene <-read.table("diffgene.txt",header = T)
gene <- as.factor(gene$geneNames)
term2gene <- read.table("term2gene.txt",header = F,quote = "")
term2name <- read.table("GOname.txt",header = T,quote = "",sep = "\t")
x <- enricher(gene,TERM2GENE=term2gene,TERM2NAME=term2name)
out_file <- paste("enricher.out.txt",sep ="\t")
write.csv(x,out_file)
barplot(x,showCategory=10,includeAll=TRUE)
dotplot(x,showCategory=10)

然后把做好的图放上来,不是很好看。

barplot.png

dotplot.png

讲一讲如何获得那两个准备文件，

ensembleplants.png

点击biomart之后,选择物种，选择要输出的内容，基因名和GO accession,再点击results 和Go

spicies.png

term2gene.png

analysis.png

最后得到一个文本文件，下载下来.内容：

gene_term.png
获得第三个文件的做法相类似。选择输出内容的时候不一样。

然后一点点简单的python,把刚刚下载下来的文件转化成需要的格式。

def trans():
    file1 = open(r"C:\Users\desktop\mart_export.txt","r")
    file2 = open(r"C:\Users\desktop\result.txt","w")

    file1.readline()
    for line in file1:
        line = line.strip() 
        num = len(line.split("\t"))
        if num == 2:
            gene = line.split("\t")[0]
            term = line.split("\t")[1]
            file2.write("%s\t%s\n"%(term,gene))


    file1.close()
    file2.close()
trans()