实验记录6:富集分析

作者: MC学公卫 | 来源:发表于2018-12-22 12:49 被阅读245次

    梗概

    1. GO富集分析相关知识
    2. 利用clusterProfiler进行,需要基因列表的Entrez.ID以及FC值(fold change倍数变化)
    3. 参考做法
      Statistical analysis and visualization of functional profiles for genes and gene clusters(含clusterProfiler包引用文献)
      或在R中执行命令:
    browseVignettes("clusterProfiler")
    

    GEO数据挖掘小尝试:(三)利用clusterProfiler进行富集分析


    什么是GO富集分析?

    GeneOntology富集分析是高通量数据分析的标配,不管是转录组、甲基化、ChIP-seq还是重测序,都会用到对一个或多个集合的基因进行功能富集分析。分析结果可以指示这个集合的基因具有什么样的功能偏好性,进而据此判断相应的生物学意义。
    GO富集分析是先筛选差异基因,再判断差异基因在哪些注释的通路存在富集。
    GO富集分析中的超几何分布:https://www.jianshu.com/p/13f46bebebd4


    进行富集分析

    1. 安装r包clusterProfiler

    在本地:

    BiocManager::install("clusterProfiler", version = "3.8")
    

    2.基因列表Entrez ID与FC值的准备

    利用一些差异表达的基因作为输入的基因,之前Seurat的FindAllMarkers有一批这些基因,但是没有EntrezID,需要进行一些处理。
    先查看一下:

    spleen.markers <- FindAllMarkers(object = spleen, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
    
    print(x= head(x=spleen.markers,n = 10))
                     p_val avg_logFC pct.1 pct.2     p_val_adj cluster     gene
    MS4A1    2.410289e-189  1.592266 0.974 0.258 3.773307e-185       0    MS4A1
    CD79A    1.245084e-177  1.485304 0.971 0.270 1.949180e-173       0    CD79A
    HLA-DRA  1.792014e-151  1.370242 1.000 0.556 2.805397e-147       0  HLA-DRA
    CD79B    2.280181e-149  1.265725 0.938 0.315 3.569624e-145       0    CD79B
    CD74     3.217361e-144  1.256090 0.997 0.871 5.036778e-140       0     CD74
    HLA-DPB1 4.238869e-137  1.154889 0.997 0.616 6.635949e-133       0 HLA-DPB1
    HLA-DRB5 6.911780e-136  1.123850 0.990 0.437 1.082039e-131       0 HLA-DRB5
    HLA-DQB1 4.004476e-134  1.121013 0.940 0.366 6.269007e-130       0 HLA-DQB1
    HLA-DRB1 2.026996e-133  1.115709 0.992 0.452 3.173262e-129       0 HLA-DRB1
    HLA-DPA1 3.492170e-130  1.099676 0.995 0.570 5.466992e-126       0 HLA-DPA1
    

    spleen.markers一共有3,108个基因marker,保存为csv文件
    (后来发现这一步是多余的,无需保存为文件,可直接提取基因列表)

    write.csv(spleen.markers, file = "/Users/shinianyike/Desktop/zll/data/spleen/spleen_markers.csv")
    

    获取基因列表

    genelist <- spleen.markers$gene
    head(genelist)
    
    [1] "MS4A1"    "CD79A"    "HLA-DRA"  "CD79B"    "CD74"     "HLA-DPB1"
    

    转化为Entrez ID

    library(clusterProfiler)
    s.EntrezID <- bitr(genelist, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    
    Loading required package: org.Hs.eg.db
    Error in eval(parse(text = OrgDb)) : 找不到对象'org.Hs.eg.db'
    

    需要下载人类基因组注释的数据org.Hs.eg.db
    安装与细节可在此网站查看Genome wide annotation for Human(含引用文献):

    install.packages("org.Hs.eg.db")
    
    Warning in install.packages :
      package ‘org.Hs.eg.db’ is not available (for R version 3.4.4)
    

    查看是否为R版本不够,当前版本为3.4.4。下为官方网站截图:



    只需要版本2.7及以上,R版本是足够的
    因此应该是安装命令的问题。



    因此R < 3.5.0的版本应该用其他命令进行包的安装:
    更换命令安装
    BiocInstaller::biocLite("org.Hs.eg.db")
    

    顺利完成。
    再运行一次之前的命令。

    s.EntrezID <- bitr(genelist, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    head(s.EntrezID)
    
        SYMBOL ENTREZID
    1    MS4A1      931
    2    CD79A      973
    3  HLA-DRA     3122
    4    CD79B      974
    5     CD74      972
    6 HLA-DPB1     3115
    

    查看行数:

    dim(s.EntrezID)
    
    [1] 2043    2
    

    原本是3,108个,现在剩下2043个基因,原因是去除了重复的个数。


    3. 富集分析

    接下来进行富集分析:

    s.ego <- enrichGO(gene = s.EntrezID.df$ENTREZID, OrgDb = 'org.Hs.eg.db', ont = 'ALL', pAdjustMethod = 'BH',pvalueCutoff = 0.05, qvalueCutoff = 0.2, keyType = 'ENTREZID')
    
    head(s.ego)
    

    筛选的p值为0.05,q值为0.2

               ONTOLOGY         ID                                       Description GeneRatio
    GO:0042119       BP GO:0042119                             neutrophil activation  211/1931
    GO:0002283       BP GO:0002283 neutrophil activation involved in immune response  208/1931
    GO:0043312       BP GO:0043312                          neutrophil degranulation  207/1931
    GO:0002446       BP GO:0002446                      neutrophil mediated immunity  208/1931
    GO:0006613       BP GO:0006613     cotranslational protein targeting to membrane   77/1931
    GO:0045047       BP GO:0045047                           protein targeting to ER   78/1931
                 BgRatio       pvalue     p.adjust       qvalue
    GO:0042119 496/17381 2.402001e-74 1.352326e-70 9.264138e-71
    GO:0002283 486/17381 7.667076e-74 2.158282e-70 1.478535e-70
    GO:0043312 485/17381 3.218981e-73 6.040954e-70 4.138367e-70
    GO:0002446 499/17381 2.372136e-71 3.338782e-68 2.287239e-68
    GO:0006613  96/17381 5.653664e-56 6.366026e-53 4.361058e-53
    GO:0045047 100/17381 5.659096e-55 5.310118e-52 3.637706e-52
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           geneID
    GO:0042119 HVCN1/EEF2/SELL/IMPDH2/PTPN6/CTSH/CCL5/B2M/HSPA8/PTPRC/RAB27A/STOM/SURF4/HSP90AA1/BIN2/HSPA1A/HSPA6/HSPA1B/ATP6V0A1/CTSZ/RNASET2/HSP90AB1/SERPINA1/FCN1/CST3/CD68/CFP/FGL2/S100A12/FPR1/LYZ/CD14/CD36/MNDA/CFD/FCGR2A/MGST1/TLR2/CD93/C5AR1/TYROBP/FCER1G/LGALS3/
    ………………
    ………………
               Count
    GO:0042119   211
    GO:0002283   208
    GO:0043312   207
    GO:0002446   208
    GO:0006613    77
    GO:0045047    78
    

    剩下1931个基因:

    GO:0042119 211/1931 neutrophil activation 中性粒细胞活化
    GO:0002283 208/1931 neutrophil activation involved in immune response 参与免疫应答的中性粒细胞活化
    GO:0043312 207/1931 neutrophil degranulation 中性粒细胞脱颗粒
    GO:0002446 208/1931 neutrophil mediated immunity 中性粒细胞介导的免疫效应
    GO:0006613 77/1931 cotranslational protein targeting to membrane 共翻译蛋白靶向膜
    GO:0045047 78/1931 protein targeting to ER 蛋白质靶向内质网

    讨论:有很多的中性粒细胞,而中性粒细胞一般在发生炎症的组织中才会出现。有点怀疑这个数据不是正常人的数据,而是疾病组织的数据。具体还要查看下HCA网站

    补充:后发现,确实这不是一个正常的数据,而是经过缺血处理的。

    dim(s.ego)
    [1] 1884   10
    > dim(s.ego[s.ego$ONTOLOGY=='BP',])
    [1] 1507   10
    > dim(s.ego[s.ego$ONTOLOGY=='CC',])
    [1] 215  10
    > dim(s.ego[s.ego$ONTOLOGY=='MF',])
    [1] 162  10
    

    看来大部分的基因富集在BP中

    4. 数据可视化

    s.barplot <- barplot(s.ego, showCategory = 20, drop = T)
    s.barplot
    
    spleen_GObarplot.jpeg
    s.barplot <- barplot(s.ego, showCategory = 25, drop = T)
    s.barplot
    
    spleen_GObarplot2.jpeg
    dotplot(s.ego)
    
    spleen_GOdotplot.jpeg
    enrichMap(s.ego)
    
    spleen_goMap.jpeg

    感觉有点崩

    2018-11-20

    一些想法:
    应该将每个cluster的基因都提取出来进行GO富集分析
    得到每个cluster的功能主题

    2018-11-23

    以下部分为将每个cluster的top10基因提取出来做富集分析,举例第一个cluster的基因提取步骤,其他cluster的步骤相同:

    spleen.mc0 <- filter(spleen.markers, cluster == 0)
    genelist_spleen_c0 <- spleen.mc0$gene
    
    head(spleen.mc0)
              p_val avg_logFC pct.1 pct.2     p_val_adj cluster     gene
    1 2.410289e-189  1.592266 0.974 0.258 3.773307e-185       0    MS4A1
    2 1.245084e-177  1.485304 0.971 0.270 1.949180e-173       0    CD79A
    3 1.792014e-151  1.370242 1.000 0.556 2.805397e-147       0  HLA-DRA
    4 2.280181e-149  1.265725 0.938 0.315 3.569624e-145       0    CD79B
    5 3.217361e-144  1.256090 0.997 0.871 5.036778e-140       0     CD74
    6 4.238869e-137  1.154889 0.997 0.616 6.635949e-133       0 HLA-DPB1
    
    dim(spleen.mc0)
    [1] 177   7
    

    其他cluster的操作与cluster0相同。

    genelist_spleen_c1 <- spleen.mc1$gene
    spleen.mc2 <- filter(spleen.markers, cluster == 2)
    genelist_spleen_c2 <- spleen.mc2$gene
    spleen.mc3 <- filter(spleen.markers, cluster == 3)
    genelist_spleen_c3 <- spleen.mc3$gene
    spleen.mc4 <- filter(spleen.markers, cluster == 4)
    genelist_spleen_c4 <- spleen.mc4$gene
    spleen.mc5 <- filter(spleen.markers, cluster == 5)
    genelist_spleen_c5 <- spleen.mc5$gene
    spleen.mc6 <- filter(spleen.markers, cluster == 6)
    genelist_spleen_c6 <- spleen.mc6$gene
    spleen.mc7 <- filter(spleen.markers, cluster == 7)
    genelist_spleen_c7 <- spleen.mc7$gene
    spleen.mc8 <- filter(spleen.markers, cluster == 8)
    genelist_spleen_c8 <- spleen.mc8$gene
    
    每个cluster所对应基因数目.png

    作柱形图

    barplot(sc0_ego_BP, showCategory=12,font.size=7,title="EnrichmentGO_BP")
    
    sc0_ego_BP <- enrichGO(gene = genelist_spleen_c0,OrgDb = org.Hg.eg.db,keytype = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc1_ego_BP <- enrichGO(gene = genelist_spleen_c1,OrgDb = org.Hg.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc2_ego_BP <- enrichGO(gene = genelist_spleen_c2,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc3_ego_BP <- enrichGO(gene = genelist_spleen_c3,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc4_ego_BP <- enrichGO(gene = genelist_spleen_c4,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc5_ego_BP <- enrichGO(gene = genelist_spleen_c5,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc6_ego_BP <- enrichGO(gene = genelist_spleen_c6,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc7_ego_BP <- enrichGO(gene = genelist_spleen_c7,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    sc8_ego_BP <- enrichGO(gene = genelist_spleen_c8,OrgDb = org.Hs.eg.db,keyType = "SYMBOL",ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.05)
    
    sc0.EntrezID <- bitr(genelist_spleen_c0, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc1.EntrezID <- bitr(genelist_spleen_c1, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc2.EntrezID <- bitr(genelist_spleen_c2, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc3.EntrezID <- bitr(genelist_spleen_c3, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc4.EntrezID <- bitr(genelist_spleen_c4, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc5.EntrezID <- bitr(genelist_spleen_c5, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc6.EntrezID <- bitr(genelist_spleen_c6, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc7.EntrezID <- bitr(genelist_spleen_c7, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    sc8.EntrezID <- bitr(genelist_spleen_c8, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    
    > sc0_ggo <- groupGO(gene = genelist_spleen_c0,OrgDb = 'org.Hs.eg.db',keytype = "SYMBOL", ont = "CC", level= 3)
    > head(sc0_ggo)
                       ID                    Description Count GeneRatio
    GO:0005886 GO:0005886                plasma membrane    60    60/177
    GO:0005628 GO:0005628              prospore membrane     0     0/177
    GO:0005789 GO:0005789 endoplasmic reticulum membrane    13    13/177
    GO:0019867 GO:0019867                 outer membrane     0     0/177
    GO:0031090 GO:0031090             organelle membrane    31    31/177
    GO:0034357 GO:0034357        photosynthetic membrane     0     0/177
                                                                                                                                                                                                                                                                                                                                                                                                         geneID
    GO:0005886 MS4A1/CD79A/HLA-DRA/CD79B/CD74/HLA-DPB1/HLA-DRB5/HLA-DQB1/HLA-DRB1/HLA-DPA1/HLA-DQA2/CD37/HLA-DQA1/IGHD/FCER2/HLA-DMB/CD22/HLA-DMA/CD52/HLA-DOB/ADAM28/CD19/CD24/LY86/RPSA/TNFRSF13C/CD72/MARCH1/LTB/BLK/BLNK/GNG7/RALGPS2/SWAP70/SYPL1/NCF1/LAPTM5/HVCN1/BTK/EEF2/TSPAN13/STX7/CD40/FCRL2/SELL/TNFRSF13B/FCRL1/DRAM2/PLPP5/P2RX5/TSPAN3/PTPN6/RGS19/CD82/DAPP1/LAT2/INPP5D/RASGRP2/MARCKSL1/PNN
    GO:0005628                                                                                                                                                                                                                                                                                                                                                                                                 
    GO:0005789                                                                                                                                                                                                                                                                                             HLA-DRA/CD74/HLA-DPB1/HLA-DRB5/HLA-DQB1/HLA-DRB1/HLA-DPA1/HLA-DQA2/HLA-DQA1/MARCH1/SMIM14/RIC3/SEC62
    GO:0019867                                                                                                                                                                                                                                                                                                                                                                                                 
    GO:0031090                                                                                                                                                                    HLA-DRA/CD74/HLA-DPB1/HLA-DRB5/HLA-DQB1/HLA-DRB1/HLA-DPA1/HLA-DQA2/HLA-DQA1/HLA-DMB/HLA-DMA/HLA-DOB/MARCH1/SYPL1/LAPTM5/HVCN1/SYNGR2/CYB561A3/STX7/SNX2/SELL/DRAM2/P2RX5/IMPDH2/RIC3/SLC25A6/GGA2/PLEKHF2/SMDT1/BLOC1S2/CHPT1
    GO:0034357 
    
    write.table(sc0_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/pictures/12-6/sc0_ego_BP.csv", sep=",", row.names = FALSE)
    
    sc0_ego_BP.png

    作富集图

    enrichMap(sc0_ego_BP)
    enrichMap(sc1_ego_BP)
    enrichMap(sc2_ego_BP)
    enrichMap(sc3_ego_BP)
    enrichMap(sc4_ego_BP)
    enrichMap(sc5_ego_BP)
    enrichMap(sc6_ego_BP)
    enrichMap(sc7_ego_BP)
    enrichMap(sc8_ego_BP)
    

    12月15日:

    library(dplyr)
    library(clusterProfiler)
    library(org.Hs.eg.db)
    keytypes(org.Hs.eg.db)
     [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"    
     [7] "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"     "GO"           "GOALL"       
    [13] "IPI"          "MAP"          "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"        
    [19] "PFAM"         "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
    [25] "UNIGENE"      "UNIPROT" 
    
    spleen_genelist = bitr(spleen_genelist, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    
    head(spleen_genelist)
        SYMBOL ENTREZID
    1    MS4A1      931
    2    CD79A      973
    3  HLA-DRA     3122
    4    CD79B      974
    5     CD74      972
    6 HLA-DPB1     3115
    

    以上不保存,重做:

    library(dplyr)
    library(clusterProfiler)
    library(org.Hs.eg.db)
    keytypes(org.Hs.eg.db)
     [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"    
     [7] "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"     "GO"           "GOALL"       
    [13] "IPI"          "MAP"          "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"        
    [19] "PFAM"         "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
    [25] "UNIGENE"      "UNIPROT" 
    
     head(spleen.markers)
                     p_val avg_logFC pct.1 pct.2     p_val_adj cluster     gene
    MS4A1    2.410289e-189  1.592266 0.974 0.258 3.773307e-185       0    MS4A1
    CD79A    1.245084e-177  1.485304 0.971 0.270 1.949180e-173       0    CD79A
    HLA-DRA  1.792014e-151  1.370242 1.000 0.556 2.805397e-147       0  HLA-DRA
    CD79B    2.280181e-149  1.265725 0.938 0.315 3.569624e-145       0    CD79B
    CD74     3.217361e-144  1.256090 0.997 0.871 5.036778e-140       0     CD74
    HLA-DPB1 4.238869e-137  1.154889 0.997 0.616 6.635949e-133       0 HLA-DPB1
    
    spleen_genelist = spleen.markers[,2]
    names(spleen_genelist) = as.character(spleen.markers[,7])
    spleen_genelist = sort(spleen_genelist, decreasing = TRUE)
    head(spleen_genelist)
       IGHG3   S100A8    IGHG1   S100A9    IGHA1    IGHG4 
    5.564483 5.244824 5.221544 5.142906 5.027861 4.971419 
    
    spleen_ENTREZ <- bitr(names(spleen_genelist), fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    head(spleen_ENTREZ)
      SYMBOL ENTREZID
    1  IGHG3     3502
    2 S100A8     6279
    3  IGHG1     3500
    4 S100A9     6280
    5  IGHA1     3493
    6  IGHG4     3503
    
    spleen_ggo <- groupGO(gene     = spleen_ENTREZ$ENTREZID,
                            OrgDb    = org.Hs.eg.db,
                            ont      = "BP",
                            level    = 3,
                            readable = TRUE)
    head(spleen_ggo)
    
                                            Description Count
    GO:0019953                      sexual reproduction    67
    GO:0019954                     asexual reproduction     0
    GO:0022414                     reproductive process   149
    GO:0032504      multicellular organism reproduction    70
    GO:0032505 reproduction of a single-celled organism     0
    GO:0061887         reproduction of symbiont in host     0
    
    spleen_ego <- enrichGO(gene=spleen_ENTREZ$ENTREZID,
                            OrgDb         = org.Hs.eg.db,
                            ont           = "BP",
                            pAdjustMethod = "BH",
                            pvalueCutoff  = 0.01,
                            qvalueCutoff  = 0.05,
                            readable      = TRUE)
    head(spleen_ego)
                       ID                                       Description GeneRatio
    GO:0042119 GO:0042119                             neutrophil activation  211/1931
    GO:0002283 GO:0002283 neutrophil activation involved in immune response  208/1931
    GO:0043312 GO:0043312                          neutrophil degranulation  207/1931
    GO:0002446 GO:0002446                      neutrophil mediated immunity  208/1931
    GO:0006613 GO:0006613     cotranslational protein targeting to membrane   77/1931
    GO:0045047 GO:0045047                           protein targeting to ER   78/1931
    

    KEGG分析:

    search_kegg_organism('hsa', by='kegg_code')
      kegg_code scientific_name common_name
    1       hsa    Homo sapiens       human
    human <- search_kegg_organism('Homo sapiens', by='scientific_name')
    dim(human)
    [1] 1 3
    
    kk <- enrichKEGG(gene         = spleen_ENTREZ$ENTREZID,
                      organism     = 'hsa',
                      pvalueCutoff = 0.05)
     head(kk)
                   ID                                 Description GeneRatio  BgRatio
    hsa03010 hsa03010                                    Ribosome   81/1163 153/7470
    hsa00190 hsa00190                   Oxidative phosphorylation   67/1163 133/7470
    hsa05012 hsa05012                           Parkinson disease   67/1163 142/7470
    hsa04141 hsa04141 Protein processing in endoplasmic reticulum   73/1163 165/7470
    hsa04145 hsa04145                                   Phagosome   67/1163 152/7470
    hsa05169 hsa05169                Epstein-Barr virus infection   80/1163 201/7470
    
    mkk <- enrichMKEGG(gene = spleen_ENTREZ$ENTREZID,organism = 'hsa')
    head(mkk)
                   ID                                 Description GeneRatio  BgRatio
    hsa03010 hsa03010                                    Ribosome   81/1163 153/7470
    hsa00190 hsa00190                   Oxidative phosphorylation   67/1163 133/7470
    hsa05012 hsa05012                           Parkinson disease   67/1163 142/7470
    hsa04141 hsa04141 Protein processing in endoplasmic reticulum   73/1163 165/7470
    hsa04145 hsa04145                                   Phagosome   67/1163 152/7470
    hsa05169 hsa05169                Epstein-Barr virus infection   80/1163 201/7470
    
    enrichMap(spleen_ego, n=15)
    cnetplot(spleen_ego, showCategory= 2, categorySize="pvalue", foldChange=spleen_genelist)
    

    同上述操作步骤,将脾脏的每一个cluster做KEGG分析:

    sc0_kk <- enrichKEGG(gene = sc0.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc1_kk <- enrichKEGG(gene = sc1.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc2_kk <- enrichKEGG(gene = sc2.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc3_kk <- enrichKEGG(gene = sc3.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc4_kk <- enrichKEGG(gene = sc4.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc5_kk <- enrichKEGG(gene = sc5.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc6_kk <- enrichKEGG(gene = sc6.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc7_kk <- enrichKEGG(gene = sc7.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc8_kk <- enrichKEGG(gene = sc8.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    
    sc0_mkk <- enrichMKEGG(gene = sc0.EntrezID$ENTREZID,organism = 'hsa')
    sc1_mkk <- enrichMKEGG(gene = sc1.EntrezID$ENTREZID,organism = 'hsa')
    sc2_mkk <- enrichMKEGG(gene = sc2.EntrezID$ENTREZID,organism = 'hsa')
    sc3_mkk <- enrichMKEGG(gene = sc3.EntrezID$ENTREZID,organism = 'hsa')
    sc4_mkk <- enrichMKEGG(gene = sc4.EntrezID$ENTREZID,organism = 'hsa')
    sc5_mkk <- enrichMKEGG(gene = sc5.EntrezID$ENTREZID,organism = 'hsa')
    sc6_mkk <- enrichMKEGG(gene = sc6.EntrezID$ENTREZID,organism = 'hsa')
    sc7_mkk <- enrichMKEGG(gene = sc7.EntrezID$ENTREZID,organism = 'hsa')
    sc8_mkk <- enrichMKEGG(gene = sc8.EntrezID$ENTREZID,organism = 'hsa')
    
    enrichMap(sc3_kk, n = 8)
    enrichMap(sc4_kk, n = 8)
    enrichMap(sc5_kk, n = 8)
    enrichMap(sc6_kk, n = 8)
    enrichMap(sc7_kk, n = 8)
    enrichMap(sc8_kk, n = 8)
    
    write.csv(sc0_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc0_ego_BP.csv")
    write.csv(sc1_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc1_ego_BP.csv")
    write.csv(sc2_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc2_ego_BP.csv")
    write.csv(sc3_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc3_ego_BP.csv")
    write.csv(sc4_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc4_ego_BP.csv")
    write.csv(sc5_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc5_ego_BP.csv")
    write.csv(sc6_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc6_ego_BP.csv")
    write.csv(sc7_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc7_ego_BP.csv")
    write.csv(sc8_ego_BP, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_22/sc8_ego_BP.csv")
    
    #导出KEGG的富集信息
    write.csv(sc0_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc0_kk.csv")
    write.csv(sc1_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc1_kk.csv")
    write.csv(sc2_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc2_kk.csv")
    write.csv(sc3_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc3_kk.csv")
    write.csv(sc4_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc4_kk.csv")
    write.csv(sc5_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc5_kk.csv")
    write.csv(sc6_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc6_kk.csv")
    write.csv(sc7_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc7_kk.csv")
    write.csv(sc8_kk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc8_kk.csv")
    
    #导出MKEGG的富集信息
    write.csv(sc0_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc0_mkk.csv")
    write.csv(sc1_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc1_mkk.csv")
    write.csv(sc2_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc2_mkk.csv")
    write.csv(sc3_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc3_mkk.csv")
    write.csv(sc4_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc4_mkk.csv")
    write.csv(sc5_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc5_mkk.csv")
    write.csv(sc6_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc6_mkk.csv")
    write.csv(sc7_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc7_mkk.csv")
    write.csv(sc8_mkk, file = "/Users/shinianyike/Desktop/zll/data/spleen/results/tables/2018_12_5/sc8_mkk.csv")
    
    

    12月25日
    因需要simplify进行GO精简,之前的不太规范,重新做。

    library(clusterProfiler)
    library(org.Hs.eg.db)
    head(spleen.markers)
                     p_val avg_logFC pct.1 pct.2     p_val_adj cluster     gene
    MS4A1    2.410289e-189  1.592266 0.974 0.258 3.773307e-185       0    MS4A1
    CD79A    1.245084e-177  1.485304 0.971 0.270 1.949180e-173       0    CD79A
    HLA-DRA  1.792014e-151  1.370242 1.000 0.556 2.805397e-147       0  HLA-DRA
    CD79B    2.280181e-149  1.265725 0.938 0.315 3.569624e-145       0    CD79B
    CD74     3.217361e-144  1.256090 0.997 0.871 5.036778e-140       0     CD74
    HLA-DPB1 4.238869e-137  1.154889 0.997 0.616 6.635949e-133       0 HLA-DPB1
    
    spleen_genelist = spleen.markers[,2]
    names(spleen_genelist) = as.character(spleen.markers[,7])
    spleen_genelist = sort(spleen_genelist, decreasing = TRUE)
    head(spleen_genelist)
       IGHG3   S100A8    IGHG1   S100A9    IGHA1    IGHG4 
    5.564483 5.244824 5.221544 5.142906 5.027861 4.971419 
    
    spleen_ENTREZ <- bitr(names(spleen_genelist), fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
    head(spleen_ENTREZ)
      SYMBOL ENTREZID
    1  IGHG3     3502
    2 S100A8     6279
    3  IGHG1     3500
    4 S100A9     6280
    5  IGHA1     3493
    6  IGHG4     3503
    
    spleen_ego <- enrichGO(gene=spleen_ENTREZ$ENTREZID,
                            OrgDb         = org.Hs.eg.db,
                            ont           = "BP",
                            pAdjustMethod = "BH",
                            pvalueCutoff  = 0.05,
                            qvalueCutoff  = 0.2,
                            readable      = TRUE)
    
    spleen.simp<- simplify(spleen_ego) # 去除重复GO
    
    dotplot(spleen.simp, showCategory=10, title="GO_biological process")
    
    
    

    每个cluster的也一样,去除重复GO:

    sc0_ego_BP_simp <- simplify(sc0_ego_BP)
    sc1_ego_BP_simp <- simplify(sc1_ego_BP)
    sc2_ego_BP_simp <- simplify(sc2_ego_BP)
    sc3_ego_BP_simp <- simplify(sc3_ego_BP)
    sc4_ego_BP_simp <- simplify(sc4_ego_BP)
    sc5_ego_BP_simp <- simplify(sc5_ego_BP)
    sc6_ego_BP_simp <- simplify(sc6_ego_BP)
    sc7_ego_BP_simp <- simplify(sc7_ego_BP)
    sc8_ego_BP_simp <- simplify(sc8_ego_BP)
    
    dotplot(sc0_ego_BP_simp, showCategory=10, title="GO biological process of cluster 0")
    dotplot(sc1_ego_BP_simp, showCategory=10, title="GO biological process of cluster 1")
    dotplot(sc2_ego_BP_simp, showCategory=10, title="GO biological process of cluster 2")
    dotplot(sc3_ego_BP_simp, showCategory=10, title="GO biological process of cluster 3")
    dotplot(sc4_ego_BP_simp, showCategory=10, title="GO biological process of cluster 4")
    dotplot(sc5_ego_BP_simp, showCategory=10, title="GO biological process of cluster 5")
    dotplot(sc6_ego_BP_simp, showCategory=10, title="GO biological process of cluster 6")
    dotplot(sc7_ego_BP_simp, showCategory=10, title="GO biological process of cluster 7")
    dotplot(sc8_ego_BP_simp, showCategory=10, title="GO biological process of cluster 8")
    
    spleen.simp_ego.jpeg ego_sc0_simp.jpeg ego_sc1_simp.jpeg ego_sc2_simp.jpeg ego_sc3_simp.jpeg ego_sc4_simp.jpeg ego_sc5_simp.jpeg ego_sc6_simp.jpeg ego_sc7_simp.jpeg ego_sc8_simp.jpeg

    KEGG

    sc0_kk <- enrichKEGG(gene = sc0.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc1_kk <- enrichKEGG(gene = sc1.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc2_kk <- enrichKEGG(gene = sc2.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc3_kk <- enrichKEGG(gene = sc3.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc4_kk <- enrichKEGG(gene = sc4.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc5_kk <- enrichKEGG(gene = sc5.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc6_kk <- enrichKEGG(gene = sc6.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc7_kk <- enrichKEGG(gene = sc7.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    sc8_kk <- enrichKEGG(gene = sc8.EntrezID$ENTREZID, organism = 'hsa', pvalueCutoff = 0.05)
    
    
    
    dotplot(sc0_kk, showCategory=10, title="KEGG enrichment of cluster 0")
    dotplot(sc1_kk, showCategory=10, title="KEGG enrichment of cluster 1")
    dotplot(sc2_kk, showCategory=10, title="KEGG enrichment of cluster 2")
    dotplot(sc3_kk, showCategory=10, title="KEGG enrichment of cluster 3")
    dotplot(sc4_kk, showCategory=10, title="KEGG enrichment of cluster 4")
    dotplot(sc5_kk, showCategory=10, title="KEGG enrichment of cluster 5")
    dotplot(sc6_kk, showCategory=10, title="KEGG enrichment of cluster 6")
    dotplot(sc7_kk, showCategory=10, title="KEGG enrichment of cluster 7")
    dotplot(sc8_kk, showCategory=10, title="KEGG enrichment of cluster 8")
    
    
    

    相关文章

      网友评论

        本文标题:实验记录6:富集分析

        本文链接:https://www.haomeiwen.com/subject/xhbpxqtx.html