最近发现差异基因用GO富集之后的terms自己并不感兴趣,所以找别的富集的办法,大家都知道kegg富集到的大部分都是疾病癌症相关的,所以GSEA无疑是一个基因功能注释不错的选择。
关于什么是GSEA有很多不错的贴子,这里不在赘述,此处主要讲一下我自己运行过程中出现的一些情况和流程
初步了解
首先要知道做GSEA需要什么,一个输入文件是一个数据框,第一列是基因名,第二列是logFC
或者是FC
,不是有负值的那个log2FC
,这里要注意。
最后输出的结果就是一个类似于go的一个表格,这个表中包含富集的terms和富集通路的上下调(是根据NES来看的,NES>0上调,NES<0下调),
同时要知道这个和go富集过程其实差不多,可以使用基因的名字去富集,也可以转换成ENTREZID
去富集 ,同时做GSEA的时候要注意需要一个库:OrgDb = org.Mm.eg.db
,这个库只有一些物种有,大部分其他的就没有,比如食蟹猴的没有这个库,但是有猕猴的,这个是可以自己做的,但是我还没搞清楚怎么做,所以先搁置,并且这个库安装也很烦,网不好的时候安不上,人的 org.Hs.eg.db
还尤其的大,下载起来很费劲,需要用的话温馨提示提前下载。
GSEA输入文件生成
# change_v1 GSEA #################################################################
> rm(list = ls())
> library("org.Mm.eg.db")
> mouse_skin_degs <- read.delim('/data1/jiarongf/specise_compare_HGPS/pubmed/Mus_musculus/GSE122865/fastq/fastp/hisat2/stringtie/R/gene_TPM_matrix_DEGs_human_genename.tab',
+ stringsAsFactors = F,header = T)
> head(mouse_skin_degs)
SYMBOL HGPS_Fibroblast_1 HGPS_Fibroblast_2 HGPS_Fibroblast_3 HGPS_Fibroblast_4 HGPS_Fibroblast_5
1 0610030E20Rik 0.151554 0.000000 0.000000 0.037521 0.00000
2 1110032A03Rik 4.906227 4.496196 5.462672 4.082028 0.00000
3 1110032F04Rik 0.030137 0.000000 0.000000 0.034170 0.00000
4 1110051M20Rik 0.000000 0.000000 0.000000 0.764685 0.00000
5 1110059G10Rik 0.155385 0.134198 0.322223 0.431767 0.00000
6 1700003E16Rik 22.523249 15.533301 16.745457 12.444068 13.03459
HGPS_Fibroblast_6 WT_ColonFibroblasts_1 WT_ColonFibroblasts_2 WT_ColonFibroblasts_3
1 0.000000 0.00000 0.000000 0.000000
2 4.314265 2.65162 3.458404 6.123207
3 0.000000 0.00000 0.000000 0.000000
4 0.113645 0.00000 0.000000 0.000000
5 0.000000 0.00000 0.000000 0.114819
6 14.842849 16.37361 13.121490 17.250549
WT_ColonFibroblasts_4 pvalue tstat meansControl meansCase FC log2FC p_bf
1 0.00000 0.2593859 1.27180264 0.00000000 0.03151250 Inf Inf 1
2 3.55835 0.9501329 -0.06460476 3.94789525 3.87689800 0.9820164 -0.02618093 1
3 0.00000 0.1757355 1.57649523 0.00000000 0.01071783 Inf Inf 1
4 0.00000 0.2944759 1.17069332 0.00000000 0.14638833 Inf Inf 1
5 0.00000 0.1028254 1.89809484 0.02870475 0.17392883 6.0592353 2.59913572 1
6 13.88644 0.7062050 0.39100501 15.15802125 15.85391967 1.0459096 0.06475814 1
change_V1 change_V2 change_V3 humanGene
1 NOT NOT NOT C2orf68
2 NOT NOT NOT C11orf1
3 NOT NOT NOT C3orf80
4 NOT NOT NOT C11orf49
5 NOT NOT NOT KIAA1143
6 NOT NOT NOT C2orf81
因为我的这个表达矩阵是小鼠的,所以我前期已经做了基因名的转换,最后一列是相对应的人的基因名
而change开头的那三列是我差异分析的up down not
前面的是TPM的表达量的值,其实输入文件用不了这么多列,接着往下
> inter_gene <- read.delim('/data1/jiarongf/specise_compare_HGPS/pubmed/compare/skin/inter_gene_change_v1.tab',header = F,stringsAsFactors = F)
> mouse_skin_degs_inter <- mouse_skin_degs[mouse_skin_degs$humanGene%in%inter_gene$V1,]
> head(inter_gene)
V1
1 ACADS
2 ACAT1
3 ACP2
4 ADAL
5 ADAM15
6 ADAM22
其实我只想要inter_gene的这几个基因做GSEA,因为上面说了需要用到这些基因的FC,所以需要做merge
names(mouse_skin_degs_inter)
[1] "SYMBOL" "HGPS_Fibroblast_1" "HGPS_Fibroblast_2" "HGPS_Fibroblast_3"
[5] "HGPS_Fibroblast_4" "HGPS_Fibroblast_5" "HGPS_Fibroblast_6" "WT_ColonFibroblasts_1"
[9] "WT_ColonFibroblasts_2" "WT_ColonFibroblasts_3" "WT_ColonFibroblasts_4" "pvalue"
[13] "tstat" "meansControl" "meansCase" "FC"
[17] "log2FC" "p_bf" "change_V1" "change_V2"
[21] "change_V3" "humanGene"
> mouse_skin_degs_inter <- mouse_skin_degs_inter[,c(1,16,22)]
> gene.df <- bitr(mouse_skin_degs_inter$SYMBOL, fromType="SYMBOL",
+ toType="ENTREZID",
+ OrgDb = "org.Mm.eg.db")
'select()' returned 1:1 mapping between keys and columns
> head(gene.df)
SYMBOL ENTREZID
1 Acads 11409
2 Acat1 110446
3 Acp2 11432
4 Adal 75894
5 Adam15 11490
6 Adam22 11496
> names(gene.df)
[1] "SYMBOL" "ENTREZID"
> data <- merge(gene.df,mouse_skin_degs_inter)
> data <- data[!data$FC=='Inf',]
> data = data[order(data$FC,decreasing = TRUE),]
> genelist <- data$FC
> names(genelist) <- data$ENTREZID
> head(genelist)
21816 243937 227720 212073 54170 22411
685.03023 141.14572 53.24192 49.39867 36.03380 32.10290
>
开始GSEA
Go_gseresult <- gseGO(genelist,
OrgDb = org.Mm.eg.db, #要注意不能加引号
keyType = "ENTREZID",
ont="ALL",
#nPerm = 1000,
minGSSize = 10,
maxGSSize = 500,
pvalueCutoff=1)#这里变成1是因为我用0.05做了结果报错了,我一直没解决,这样后面再筛选p值也是有结果的
Go_gseresult_data <- Go_gseresult@result
Go_gseresult_data <- Go_gseresult_data[Go_gseresult_data$pvalue<0.05,]#8
Go_gseresult_data$ONTOLOGY <- factor(Go_gseresult_data$ONTOLOGY,levels = c('BP','MF','CC'))
Go_gseresult_data <- Go_gseresult_data[order(Go_gseresult_data$ONTOLOGY),]
Go_gseresult_data$Description <- factor(Go_gseresult_data$Description,levels = rev(Go_gseresult_data$Description))
write.table(Go_gseresult_data,file='/data1/jiarongf/specise_compare_HGPS/pubmed/compare/skin/inter_gene_change_v1_GSEA.tab',
sep = '\t',quote = F,col.names = F,row.names = F)
names(Go_gseresult_data)
color <- RColorBrewer::brewer.pal(3,"Dark2")
q1<-ggplot(Go_gseresult_data, mapping=aes(x = setSize, y = Description,fill=ONTOLOGY))+
geom_bar(stat="identity",position=position_dodge(0.75))+theme_bw()+
scale_fill_manual(values =color)+
ggtitle('GSEA results')+ylab('')+
theme(axis.text.y =element_text(size=10,face = "bold", color="black"),
axis.text.x =element_text(size=10,face = "bold", color="black"),#,angle = 3
axis.title=element_text(size=10,face = "bold"),
title = element_text(size=10,face = "bold"),
legend.text=element_text(size=10),legend.title = element_text(size=10),
panel.border = element_rect(color = "black",size = 0.8,fill = NA),
axis.title.y = element_text())
q1
ggsave(filename = '/data1/jiarongf/specise_compare_HGPS/pubmed/compare/skin/inter_gene_change_v1_GSEA.pdf',
q1,
height = 8,width = 10)
最后生成的图就是这样啦
网友评论