clusterProfiler花式用法

作者: 麒麟991 | 来源:发表于2020-02-19 11:54 被阅读0次

    Chapter 3 Universal enrichment analysis

    clusterProfiler supports both hypergeometric test and gene set enrichment analyses of many ontology/pathway, but it’s still not enough for users may want to analyze their data with unsupported organisms, slim version of GO, novel functional annotation (e.g. GO via BlastGO or KEGG via KAAS), unsupported ontologies/pathways or customized annotations.

    clusterProfiler provides enricher function for hypergeometric test and GSEA function for gene set enrichment analysis that are designed to accept user defined annotation. They accept two additional parameters TERM2GENE and TERM2NAME. As indicated in the parameter names, TERM2GENE is a data.frame with first column of term ID and second column of corresponding mapped gene and TERM2NAME is a data.frame with first column of term ID and second column of corresponding term name. TERM2NAME is optional.

    3.1 Input data

    For over representation analysis, all we need is a gene vector, that is a vector of gene IDs. These gene IDs can be obtained by differential expression analysis (e.g. with DESeq2 package).

    For gene set enrichment analysis, we need a ranked list of genes. DOSE provides an example dataset geneList which was derived from R package breastCancerMAINZ that contained 200 samples, including 29 samples in grade I, 136 samples in grade II and 35 samples in grade III. We computed the ratios of geometric means of grade III samples versus geometric means of grade I samples. Logarithm of these ratios (base 2) were stored in geneList dataset.

    The geneList contains three features:

    1. numeric vector: fold change or other type of numerical variable
    2. named vector: every number was named by the corresponding gene ID
    3. sorted vector: number should be sorted in decreasing order

    Suppose you are importing your own data from a csv file and the file contains two columns, one for gene ID (no duplicated allowed) and another one for fold change, you can prepare your own geneList via the following command:

    d <- read.csv(your_csv_file)
    ## assume that 1st column is ID
    ## 2nd column is fold change
    
    ## feature 1: numeric vector
    geneList <- d[,2]
    
    ## feature 2: named vector
    names(geneList) <- as.character(d[,1])
    
    ## feature 3: decreasing order
    geneList <- sort(geneList, decreasing = TRUE)
    

    We can load the sample data into R via:

    data(geneList, package="DOSE")
    head(geneList)
    
    ##     4312     8318    10874    55143    55388      991 
    ## 4.572613 4.514594 4.418218 4.144075 3.876258 3.677857
    

    Suppose we define fold change greater than 2 as DEGs:

    gene <- names(geneList)[abs(geneList) > 2]
    head(gene)
    
    ## [1] "4312"  "8318"  "10874" "55143" "55388" "991"
    

    3.2 WikiPathways analysis

    WikiPathways is a continuously updated pathway database curated by a community of researchers and pathway enthusiasts. WikiPathways produces monthly releases of gmt files for supported organisms at data.wikipathways.org. Download the appropriate gmt file and then generate TERM2GENE and TERM2NAME to use enricher and GSEA functions.

    library(magrittr)
    library(clusterProfiler)
    
    data(geneList, package="DOSE")
    gene <- names(geneList)[abs(geneList) > 2]
    
    wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
    wp2gene <- read.gmt(wpgmtfile)
    wp2gene <- wp2gene %>% tidyr::separate(ont, c("name","version","wpid","org"), "%")
    wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
    wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
    
    ewp <- enricher(gene, TERM2GENE = wpid2gene, TERM2NAME = wpid2name)
    head(ewp)
    
    ##            ID
    ## WP2446 WP2446
    ## WP2361 WP2361
    ## WP179   WP179
    ## WP3942 WP3942
    ## WP4240 WP4240
    ## WP2328 WP2328
    ##                                                                           Description
    ## WP2446                                                  Retinoblastoma (RB) in Cancer
    ## WP2361                                                       Gastric Cancer Network 1
    ## WP179                                                                      Cell Cycle
    ## WP3942                                                         PPAR signaling pathway
    ## WP4240 Regulation of sister chromatid separation at the metaphase-anaphase transition
    ## WP2328                                                            Allograft Rejection
    ##        GeneRatio  BgRatio       pvalue     p.adjust
    ## WP2446     11/95  88/6249 6.801697e-08 1.054263e-05
    ## WP2361      6/95  29/6249 3.772735e-06 2.923870e-04
    ## WP179      10/95 122/6249 1.384549e-05 7.153503e-04
    ## WP3942      7/95  67/6249 6.210513e-05 2.406574e-03
    ## WP4240      4/95  16/6249 7.931988e-05 2.458916e-03
    ## WP2328      7/95  90/6249 4.016758e-04 1.037663e-02
    ##              qvalue
    ## WP2446 9.450779e-06
    ## WP2361 2.621058e-04
    ## WP179  6.412648e-04
    ## WP3942 2.157336e-03
    ## WP4240 2.204258e-03
    ## WP2328 9.301967e-03
    ##                                                       geneID
    ## WP2446 8318/9133/7153/6241/890/983/81620/7272/1111/891/24137
    ## WP2361                       4605/7153/11065/22974/6286/6790
    ## WP179          8318/991/9133/890/983/7272/1111/891/4174/9232
    ## WP3942                    4312/9415/9370/5105/2167/3158/5346
    ## WP4240                                    991/1062/4085/9232
    ## WP2328                   10563/6373/4283/3002/10578/3117/730
    ##        Count
    ## WP2446    11
    ## WP2361     6
    ## WP179     10
    ## WP3942     7
    ## WP4240     4
    ## WP2328     7
    
    ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
    head(ewp2)
    
    ##            ID
    ## WP4172 WP4172
    ## WP3932 WP3932
    ## WP306   WP306
    ## WP236   WP236
    ## WP474   WP474
    ## WP2911 WP2911
    ##                                           Description
    ## WP4172                     PI3K-Akt Signaling Pathway
    ## WP3932 Focal Adhesion-PI3K-Akt-mTOR-signaling pathway
    ## WP306                                  Focal Adhesion
    ## WP236                                    Adipogenesis
    ## WP474                       Endochondral Ossification
    ## WP2911    miRNA targets in ECM and membrane receptors
    ##        setSize enrichmentScore       NES      pvalue
    ## WP4172     311      -0.3406070 -1.467847 0.001347709
    ## WP3932     281      -0.3903410 -1.664489 0.001383126
    ## WP306      188      -0.4230017 -1.731019 0.001449275
    ## WP236      125      -0.4387536 -1.694197 0.001526718
    ## WP474       59      -0.5111776 -1.751890 0.001572327
    ## WP2911      21      -0.6953475 -1.888706 0.001757469
    ##          p.adjust    qvalues rank
    ## WP4172 0.03001658 0.02487562 2133
    ## WP3932 0.03001658 0.02487562 1994
    ## WP306  0.03001658 0.02487562 2221
    ## WP236  0.03001658 0.02487562 2301
    ## WP474  0.03001658 0.02487562 2142
    ## WP2911 0.03001658 0.02487562 2629
    ##                          leading_edge
    ## WP4172 tags=24%, list=17%, signal=21%
    ## WP3932 tags=26%, list=16%, signal=22%
    ## WP306  tags=28%, list=18%, signal=23%
    ## WP236  tags=34%, list=18%, signal=28%
    ## WP474  tags=47%, list=17%, signal=40%
    ## WP2911 tags=76%, list=21%, signal=60%
    ##                                                                                                                                                                                                                                                                                                                                                                                        core_enrichment
    ## WP4172 5228/7424/7157/627/2252/7059/92579/5563/5295/6794/1288/7010/3910/3371/3082/1291/4602/3791/1027/90993/3441/3643/1129/2322/1975/7450/596/3685/1942/2149/1280/4804/3675/595/2261/7248/2246/4803/2259/3912/1902/1278/1277/2846/2057/1293/2247/55970/5618/7058/10161/56034/3693/4254/3480/4908/5159/1292/3908/2690/3909/8817/4915/3551/2791/63923/3913/3667/1287/3679/7060/3479/80310/1311/5105/1101
    ## WP3932           2252/7059/51719/8660/5563/5295/6794/1288/7010/3910/3371/3082/3791/1301/1027/90993/3643/1129/1975/7450/3685/2034/1942/2149/1280/4804/3675/2261/7248/2246/4803/2259/3912/1902/2308/1278/1277/81617/2846/2057/2247/1281/50509/1290/55970/5618/7058/10161/56034/3693/4254/6720/3480/5159/3991/1289/1292/3908/2690/3909/8817/3551/2791/63923/3913/3667/3679/7060/3479/80310/1311/1101/3169
    ## WP306                                                                                                                       10420/5595/5228/7424/1499/4636/83660/7059/5295/1288/23396/3910/3371/3082/394/3791/7450/596/3685/1280/3675/595/2318/3912/1793/1278/5753/1277/10398/55742/50509/1290/2317/7058/25759/56034/3693/3480/5159/857/1292/3908/3909/63923/3913/3679/7060/3479/10451/80310/1311/1101
    ## WP236                                                                                                                                                                                 9611/8321/5925/8609/1499/2908/4088/8660/3399/6778/8648/6258/2662/5914/6776/2034/196/8204/4692/4208/2308/5468/6695/4023/1592/7350/81029/3952/1675/5618/6720/3991/2487/3667/3572/3479/6424/9370/5105/652/5346/2625
    ## WP474                                                                                                                                                                                                                                                    9759/4921/51719/860/5745/1028/1280/2261/4256/7042/4208/4322/8100/7078/2247/11096/85477/3480/2690/8817/5167/2737/5744/2487/5327/1300/1473/3479
    ## WP2911                                                                                                                                                                                                                                                                                                                6383/3672/7057/3915/3910/1291/1278/1293/1281/50509/1290/7058/3693/1289/1292/3913
    

    You may want to convert the gene IDs to gene symbols, which can be done by setReadable function.

    library(org.Hs.eg.db)
    ewp <- setReadable(ewp, org.Hs.eg.db, keyType = "ENTREZID")
    ewp2 <- setReadable(ewp2, org.Hs.eg.db, keyType = "ENTREZID")
    head(ewp)
    
    ##            ID
    ## WP2446 WP2446
    ## WP2361 WP2361
    ## WP179   WP179
    ## WP3942 WP3942
    ## WP4240 WP4240
    ## WP2328 WP2328
    ##                                                                           Description
    ## WP2446                                                  Retinoblastoma (RB) in Cancer
    ## WP2361                                                       Gastric Cancer Network 1
    ## WP179                                                                      Cell Cycle
    ## WP3942                                                         PPAR signaling pathway
    ## WP4240 Regulation of sister chromatid separation at the metaphase-anaphase transition
    ## WP2328                                                            Allograft Rejection
    ##        GeneRatio  BgRatio       pvalue     p.adjust
    ## WP2446     11/95  88/6249 6.801697e-08 1.054263e-05
    ## WP2361      6/95  29/6249 3.772735e-06 2.923870e-04
    ## WP179      10/95 122/6249 1.384549e-05 7.153503e-04
    ## WP3942      7/95  67/6249 6.210513e-05 2.406574e-03
    ## WP4240      4/95  16/6249 7.931988e-05 2.458916e-03
    ## WP2328      7/95  90/6249 4.016758e-04 1.037663e-02
    ##              qvalue
    ## WP2446 9.450779e-06
    ## WP2361 2.621058e-04
    ## WP179  6.412648e-04
    ## WP3942 2.157336e-03
    ## WP4240 2.204258e-03
    ## WP2328 9.301967e-03
    ##                                                              geneID
    ## WP2446 CDC45/CCNB2/TOP2A/RRM2/CCNA2/CDK1/CDT1/TTK/CHEK1/CCNB1/KIF4A
    ## WP2361                           MYBL2/TOP2A/UBE2C/TPX2/S100P/AURKA
    ## WP179       CDC45/CDC20/CCNB2/CCNA2/CDK1/TTK/CHEK1/CCNB1/MCM5/PTTG1
    ## WP3942                    MMP1/FADS2/ADIPOQ/PCK1/FABP4/HMGCS2/PLIN1
    ## WP4240                                     CDC20/CENPE/MAD2L1/PTTG1
    ## WP2328                    CXCL13/CXCL11/CXCL9/GZMB/GNLY/HLA-DQA1/C7
    ##        Count
    ## WP2446    11
    ## WP2361     6
    ## WP179     10
    ## WP3942     7
    ## WP4240     4
    ## WP2328     7
    
    head(ewp2)
    
    ##            ID
    ## WP4172 WP4172
    ## WP3932 WP3932
    ## WP306   WP306
    ## WP236   WP236
    ## WP474   WP474
    ## WP2911 WP2911
    ##                                           Description
    ## WP4172                     PI3K-Akt Signaling Pathway
    ## WP3932 Focal Adhesion-PI3K-Akt-mTOR-signaling pathway
    ## WP306                                  Focal Adhesion
    ## WP236                                    Adipogenesis
    ## WP474                       Endochondral Ossification
    ## WP2911    miRNA targets in ECM and membrane receptors
    ##        setSize enrichmentScore       NES      pvalue
    ## WP4172     311      -0.3406070 -1.467847 0.001347709
    ## WP3932     281      -0.3903410 -1.664489 0.001383126
    ## WP306      188      -0.4230017 -1.731019 0.001449275
    ## WP236      125      -0.4387536 -1.694197 0.001526718
    ## WP474       59      -0.5111776 -1.751890 0.001572327
    ## WP2911      21      -0.6953475 -1.888706 0.001757469
    ##          p.adjust    qvalues rank
    ## WP4172 0.03001658 0.02487562 2133
    ## WP3932 0.03001658 0.02487562 1994
    ## WP306  0.03001658 0.02487562 2221
    ## WP236  0.03001658 0.02487562 2301
    ## WP474  0.03001658 0.02487562 2142
    ## WP2911 0.03001658 0.02487562 2629
    ##                          leading_edge
    ## WP4172 tags=24%, list=17%, signal=21%
    ## WP3932 tags=26%, list=16%, signal=22%
    ## WP306  tags=28%, list=18%, signal=23%
    ## WP236  tags=34%, list=18%, signal=28%
    ## WP474  tags=47%, list=17%, signal=40%
    ## WP2911 tags=76%, list=21%, signal=60%
    ##                                                                                                                                                                                                                                                                                                                                                                                                                                      core_enrichment
    ## WP4172 PGF/VEGFC/TP53/BDNF/FGF7/THBS3/G6PC3/PRKAA2/PIK3R1/STK11/COL4A6/TEK/LAMA4/TNC/HGF/COL6A1/MYB/KDR/CDKN1B/CREB3L1/IFNA4/INSR/CHRM2/FLT3/EIF4B/VWF/BCL2/ITGAV/EFNA1/F2R/COL2A1/NGFR/ITGA3/CCND1/FGFR3/TSC1/FGF1/NGF/FGF14/LAMB1/LPAR1/COL1A2/COL1A1/LPAR4/EPOR/COL6A3/FGF2/GNG12/PRLR/THBS2/LPAR6/PDGFC/ITGB5/KITLG/IGF1R/NTF3/PDGFRB/COL6A2/LAMA2/GHR/LAMA3/FGF18/NTRK2/IKBKB/GNG11/TNN/LAMB2/IRS1/COL4A5/ITGA7/THBS4/IGF1/PDGFD/COMP/PCK1/CHAD
    ## WP3932      FGF7/THBS3/CAB39/IRS2/PRKAA2/PIK3R1/STK11/COL4A6/TEK/LAMA4/TNC/HGF/KDR/COL11A1/CDKN1B/CREB3L1/INSR/CHRM2/EIF4B/VWF/ITGAV/EPAS1/EFNA1/F2R/COL2A1/NGFR/ITGA3/FGFR3/TSC1/FGF1/NGF/FGF14/LAMB1/LPAR1/FOXO1/COL1A2/COL1A1/CAB39L/LPAR4/EPOR/FGF2/COL3A1/COL5A3/COL5A2/GNG12/PRLR/THBS2/LPAR6/PDGFC/ITGB5/KITLG/SREBF1/IGF1R/PDGFRB/LIPE/COL5A1/COL6A2/LAMA2/GHR/LAMA3/FGF18/IKBKB/GNG11/TNN/LAMB2/IRS1/ITGA7/THBS4/IGF1/PDGFD/COMP/CHAD/FOXA1
    ## WP306                                                                                                                                   TESK2/MAPK3/PGF/VEGFC/CTNNB1/MYL5/TLN2/THBS3/PIK3R1/COL4A6/PIP5K1C/LAMA4/TNC/HGF/ARHGAP5/KDR/VWF/BCL2/ITGAV/COL2A1/ITGA3/CCND1/FLNC/LAMB1/DOCK1/COL1A2/PTK6/COL1A1/MYL9/PARVA/COL5A3/COL5A2/FLNB/THBS2/SHC2/PDGFC/ITGB5/IGF1R/PDGFRB/CAV1/COL6A2/LAMA2/LAMA3/TNN/LAMB2/ITGA7/THBS4/IGF1/VAV3/PDGFD/COMP/CHAD
    ## WP236                                                                                                                                                                                                        NCOR1/FZD1/RB1/KLF7/CTNNB1/NR3C1/SMAD3/IRS2/ID3/STAT6/NCOA1/RXRG/GDF10/RARA/STAT5A/EPAS1/AHR/NRIP1/NDN/MEF2C/FOXO1/PPARG/SPOCK1/LPL/CYP26A1/UCP1/WNT5B/LEP/CFD/PRLR/SREBF1/LIPE/FRZB/IRS1/IL6ST/IGF1/SFRP4/ADIPOQ/PCK1/BMP4/PLIN1/GATA3
    ## WP474                                                                                                                                                                                                                                                                              HDAC4/DDR2/CAB39/RUNX2/PTH1R/CDKN1C/COL2A1/FGFR3/MGP/TGFB2/MEF2C/MMP13/IFT88/TIMP3/FGF2/ADAMTS5/SCIN/IGF1R/GHR/FGF18/ENPP1/GLI3/PTHLH/FRZB/PLAT/COL10A1/CST5/IGF1
    ## WP2911                                                                                                                                                                                                                                                                                                                                        SDC2/ITGA1/THBS1/LAMC1/LAMA4/COL6A1/COL1A2/COL6A3/COL3A1/COL5A3/COL5A2/THBS2/ITGB5/COL5A1/COL6A2/LAMB2
    

    As an alternative to manually downloading gmt files, install the rWikiPathways package to gain scripting access to the latest gmt files using the downloadPathwayArchive function.

    3.3 Cell Marker

    cell_markers <- vroom::vroom('http://bio-bigdata.hrbmu.edu.cn/CellMarker/download/Human_cell_markers.txt') %>%
       tidyr::unite("cellMarker", tissueType, cancerType, cellName, sep=", ") %>% 
       dplyr::select(cellMarker, geneID) %>%
       dplyr::mutate(geneID = strsplit(geneID, ', '))
    cell_markers
    
    ## # A tibble: 2,868 x 2
    ##    cellMarker                                       geneID  
    ##    <chr>                                            <list>  
    ##  1 Kidney, Normal, Proximal tubular cell            <chr [1…
    ##  2 Liver, Normal, Ito cell (hepatic stellate cell)  <chr [1…
    ##  3 Endometrium, Normal, Trophoblast cell            <chr [1…
    ##  4 Germ, Normal, Primordial germ cell               <chr [1…
    ##  5 Corneal epithelium, Normal, Epithelial cell      <chr [1…
    ##  6 Placenta, Normal, Cytotrophoblast                <chr [1…
    ##  7 Periosteum, Normal, Periosteum-derived progenit… <chr [4…
    ##  8 Amniotic membrane, Normal, Amnion epithelial ce… <chr [2…
    ##  9 Primitive streak, Normal, Primitive streak cell  <chr [2…
    ## 10 Adipose tissue, Normal, Stromal vascular fracti… <chr [1…
    ## # … with 2,858 more rows
    
    y <- enricher(gene, TERM2GENE=cell_markers, minGSSize=1)
    DT::datatable(as.data.frame(y))
    

    <label style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">Show <select name="DataTables_Table_0_length" aria-controls="DataTables_Table_0" class="" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none; font-family: inherit; font-size: 16px; margin: 0px; text-transform: none;"><option value="10" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">10</option><option value="25" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">25</option><option value="50" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">50</option><option value="100" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">100</option></select> entries</label>

    <label style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none;">Search:<input type="search" class="" placeholder="" aria-controls="DataTables_Table_0" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; -webkit-tap-highlight-color: transparent; text-size-adjust: none; font-family: inherit; font-size: 16px; margin: 0px 0px 0px 0.5em; line-height: normal; -webkit-appearance: textfield;"></label>

    ID Description GeneRatio BgRatio pvalue p.adjust qvalue geneID Count
    Embryonic prefrontal cortex, Normal, Neural progenitor cell Embryonic prefrontal cortex, Normal, Neural progenitor cell Embryonic prefrontal cortex, Normal, Neural progenitor cell 33/153 166/11582 4.43518066168319e-30 6.34230834620697e-28 4.80866955950915e-28 991/9493/9833/9133/10403/7153/259266/6241/9787/11065/55355/55872/51203/83461/22974/10460/79019/55839/890/983/4085/5080/332/3832/9212/51659/9055/891/4174/9232/1602/2018/4137 33
    Large intestine, Normal, MKI67+ progenitor cell Large intestine, Normal, MKI67+ progenitor cell Large intestine, Normal, MKI67+ progenitor cell 20/153 111/11582 1.42606245514518e-17 1.0196346554288e-15 7.7307596252607e-16 991/9133/10403/7153/259266/6241/9787/11065/51203/22974/10460/890/983/332/9212/9055/891/9232/2167/51705 20
    Kidney, Normal, Neutrophil Kidney, Normal, Neutrophil Kidney, Normal, Neutrophil 6/153 42/11582 0.0000170645935795602 0.000813412293959037 0.000616720399542001 6280/6279/597/7850/6286/366 6
    Large intestine, Normal, DCLK1+ progenitor cell Large intestine, Normal, DCLK1+ progenitor cell Large intestine, Normal, DCLK1+ progenitor cell 7/153 99/11582 0.000329875455341899 0.0117930475284729 0.00894136102637251 1307/730/54829/10234/79901/57758/4969 7
    Kidney, Renal Cell Carcinoma, Neutrophil Kidney, Renal Cell Carcinoma, Neutrophil Kidney, Renal Cell Carcinoma, Neutrophil 6/153 77/11582 0.0005276542867749 0.0150909126017621 0.0114417666395399 6280/6279/597/7850/6286/366 6
    Fetal gonad, Normal, Endothelial cell Fetal gonad, Normal, Endothelial cell Fetal gonad, Normal, Endothelial cell 9/153 209/11582 0.00181179940206466 0.0431812190825411 0.0327395330548526 9493/23397/9787/10460/55839/332/7272/4521/9232 9

    Showing 1 to 6 of 6 entries

    Previous1Next

    3.4 MSigDb analysis

    Molecular Signatures Database contains 8 major collections:

    • H: hallmark gene sets
    • C1: positional gene sets
    • C2: curated gene sets
    • C3: motif gene sets
    • C4: computational gene sets
    • C5: GO gene sets
    • C6: oncogenic signatures
    • C7: immunologic signatures

    Users can download GMT files from Broad Institute and use read.gmt to parse the file to be used in enricher() and GSEA().

    There is an R package, msigdbr, that already packed the MSigDB gene sets in tidy data format that can be used directly with clusterProfiler.

    It supports several specices:

    library(msigdbr)
    msigdbr_show_species()
    
    ##  [1] "Bos taurus"               "Caenorhabditis elegans"  
    ##  [3] "Canis lupus familiaris"   "Danio rerio"             
    ##  [5] "Drosophila melanogaster"  "Gallus gallus"           
    ##  [7] "Homo sapiens"             "Mus musculus"            
    ##  [9] "Rattus norvegicus"        "Saccharomyces cerevisiae"
    ## [11] "Sus scrofa"
    

    We can retrieve all human gene sets:

    m_df <- msigdbr(species = "Homo sapiens")
    head(m_df, 2) %>% as.data.frame
    
    ##          gs_name  gs_id gs_cat gs_subcat human_gene_symbol
    ## 1 AAACCAC_MIR140 M12609     C3       MIR             ABCC4
    ## 2 AAACCAC_MIR140 M12609     C3       MIR          ABRAXAS2
    ##   species_name entrez_gene gene_symbol sources
    ## 1 Homo sapiens       10257       ABCC4    <NA>
    ## 2 Homo sapiens       23172    ABRAXAS2    <NA>
    

    Or specific collection. Here we use C6, oncogenic gene sets as an example:

    m_t2g <- msigdbr(species = "Homo sapiens", category = "C6") %>% 
      dplyr::select(gs_name, entrez_gene)
    head(m_t2g)
    
    ## # A tibble: 6 x 2
    ##   gs_name              entrez_gene
    ##   <chr>                      <int>
    ## 1 AKT_UP_MTOR_DN.V1_DN       25864
    ## 2 AKT_UP_MTOR_DN.V1_DN          95
    ## 3 AKT_UP_MTOR_DN.V1_DN      137872
    ## 4 AKT_UP_MTOR_DN.V1_DN         134
    ## 5 AKT_UP_MTOR_DN.V1_DN       55326
    ## 6 AKT_UP_MTOR_DN.V1_DN         216
    
    em <- enricher(gene, TERM2GENE=m_t2g)
    em2 <- GSEA(geneList, TERM2GENE = m_t2g)
    head(em)
    
    ##                                            ID
    ## RPS14_DN.V1_DN                 RPS14_DN.V1_DN
    ## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP
    ## VEGF_A_UP.V1_DN               VEGF_A_UP.V1_DN
    ## PRC2_EZH2_UP.V1_DN         PRC2_EZH2_UP.V1_DN
    ## CSR_LATE_UP.V1_UP           CSR_LATE_UP.V1_UP
    ## ATF2_S_UP.V1_UP               ATF2_S_UP.V1_UP
    ##                                   Description GeneRatio
    ## RPS14_DN.V1_DN                 RPS14_DN.V1_DN    22/185
    ## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP    16/185
    ## VEGF_A_UP.V1_DN               VEGF_A_UP.V1_DN    15/185
    ## PRC2_EZH2_UP.V1_DN         PRC2_EZH2_UP.V1_DN    14/185
    ## CSR_LATE_UP.V1_UP           CSR_LATE_UP.V1_UP    12/185
    ## ATF2_S_UP.V1_UP               ATF2_S_UP.V1_UP    12/185
    ##                          BgRatio       pvalue     p.adjust
    ## RPS14_DN.V1_DN         185/10962 4.855041e-13 8.205020e-11
    ## GCNP_SHH_UP_LATE.V1_UP 186/10962 9.357887e-08 7.907414e-06
    ## VEGF_A_UP.V1_DN        193/10962 8.887046e-07 5.006369e-05
    ## PRC2_EZH2_UP.V1_DN     190/10962 3.907976e-06 1.651120e-04
    ## CSR_LATE_UP.V1_UP      166/10962 2.368079e-05 8.004105e-04
    ## ATF2_S_UP.V1_UP        190/10962 8.879028e-05 2.143651e-03
    ##                              qvalue
    ## RPS14_DN.V1_DN         6.694846e-11
    ## GCNP_SHH_UP_LATE.V1_UP 6.452017e-06
    ## VEGF_A_UP.V1_DN        4.084923e-05
    ## PRC2_EZH2_UP.V1_DN     1.347223e-04
    ## CSR_LATE_UP.V1_UP      6.530911e-04
    ## ATF2_S_UP.V1_UP        1.749102e-03
    ##                                                                                                                                        geneID
    ## RPS14_DN.V1_DN         10874/55388/991/9493/1062/4605/9133/23397/79733/9787/55872/83461/54821/51659/9319/9055/10112/4174/5105/2532/7021/79901
    ## GCNP_SHH_UP_LATE.V1_UP                                      55388/7153/79733/6241/9787/51203/983/9212/1111/9319/9055/3833/6790/4174/3169/1580
    ## VEGF_A_UP.V1_DN                                                   8318/9493/1062/9133/10403/6241/9787/4085/332/3832/7272/891/23362/2167/10234
    ## PRC2_EZH2_UP.V1_DN                                               8318/55388/4605/23397/9787/55355/10460/81620/2146/7272/9212/11182/3887/24137
    ## CSR_LATE_UP.V1_UP                                                               55143/2305/4605/6241/11065/55872/983/332/2146/51659/9319/1580
    ## ATF2_S_UP.V1_UP                                                               4312/6280/2305/9493/7153/2146/10112/3887/8544/22885/10234/10974
    ##                        Count
    ## RPS14_DN.V1_DN            22
    ## GCNP_SHH_UP_LATE.V1_UP    16
    ## VEGF_A_UP.V1_DN           15
    ## PRC2_EZH2_UP.V1_DN        14
    ## CSR_LATE_UP.V1_UP         12
    ## ATF2_S_UP.V1_UP           12
    
    head(em2)
    
    ##                              ID     Description setSize
    ## LEF1_UP.V1_DN     LEF1_UP.V1_DN   LEF1_UP.V1_DN     186
    ## LTE2_UP.V1_UP     LTE2_UP.V1_UP   LTE2_UP.V1_UP     183
    ## RAF_UP.V1_DN       RAF_UP.V1_DN    RAF_UP.V1_DN     189
    ## IL2_UP.V1_DN       IL2_UP.V1_DN    IL2_UP.V1_DN     182
    ## ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN     181
    ## ATF2_UP.V1_DN     ATF2_UP.V1_DN   ATF2_UP.V1_DN     177
    ##                 enrichmentScore       NES      pvalue
    ## LEF1_UP.V1_DN        -0.4857056 -1.987140 0.001381215
    ## LTE2_UP.V1_UP        -0.3938357 -1.608416 0.001383126
    ## RAF_UP.V1_DN         -0.5521243 -2.254389 0.001383126
    ## IL2_UP.V1_DN         -0.4252756 -1.732758 0.001388889
    ## ATF2_S_UP.V1_DN      -0.4460160 -1.815767 0.001394700
    ## ATF2_UP.V1_DN        -0.5008413 -2.032113 0.001402525
    ##                   p.adjust     qvalues rank
    ## LEF1_UP.V1_DN   0.01354839 0.007394831 1705
    ## LTE2_UP.V1_UP   0.01354839 0.007394831 2110
    ## RAF_UP.V1_DN    0.01354839 0.007394831 2035
    ## IL2_UP.V1_DN    0.01354839 0.007394831 2774
    ## ATF2_S_UP.V1_DN 0.01354839 0.007394831 2531
    ## ATF2_UP.V1_DN   0.01354839 0.007394831 2055
    ##                                   leading_edge
    ## LEF1_UP.V1_DN   tags=35%, list=14%, signal=31%
    ## LTE2_UP.V1_UP   tags=33%, list=17%, signal=28%
    ## RAF_UP.V1_DN    tags=43%, list=16%, signal=37%
    ## IL2_UP.V1_DN    tags=39%, list=22%, signal=31%
    ## ATF2_S_UP.V1_DN tags=39%, list=20%, signal=32%
    ## ATF2_UP.V1_DN   tags=38%, list=16%, signal=32%
    ##                                                                                                                                                                                                                                                                                                                                                                                                                                                  core_enrichment
    ## LEF1_UP.V1_DN                                                                        79153/22998/10325/23221/79170/9187/4300/51479/25956/7552/8644/11162/7168/1490/54463/65084/25837/80303/1809/3397/4208/11223/79762/6451/481/54795/51626/8991/5950/5627/64699/6414/10103/221078/9961/5874/56898/90865/83989/5002/6653/56034/55314/85458/956/54502/4832/5783/1363/53832/54985/80221/214/50853/51149/57088/5157/2947/79932/7031/6505/9338/25924/80736/3169/10551
    ## LTE2_UP.V1_UP                                                                                                       23074/64759/1601/6558/7138/11138/10497/81031/665/2878/3554/4925/55005/9252/2034/5283/54869/5025/6019/65124/10979/51435/26999/4212/444/23588/55556/2954/93129/79762/54453/79789/316/79679/151011/51280/9630/57037/5627/9867/55198/6653/10455/5101/65055/8309/9429/23303/9590/253190/4982/4128/4680/1846/51760/80129/4857/5507/3158/5304/10974
    ## RAF_UP.V1_DN    23314/221037/1960/51340/55105/10265/22996/9687/5357/1952/12/4602/9231/596/51454/57419/595/4256/23022/8777/7837/7042/3397/57613/323/1831/6451/54843/7358/2353/8773/6938/8991/64699/57007/23389/10560/7227/3485/79068/10769/4254/26018/3480/2674/23327/857/116039/6542/2690/1955/1363/26353/7033/5376/89927/51363/8821/2239/6947/4886/214/3487/3667/5157/54847/54898/7031/6505/57535/10451/18/771/80129/5174/5507/56521/8839/8614/5241/10551/57758
    ## IL2_UP.V1_DN                                                  51276/10858/5333/2977/79874/10014/7773/64798/26959/6588/579/777/5158/23361/9915/5087/64759/2535/6123/7010/51285/126353/80201/79841/1028/2322/4222/947/9187/225/9976/5592/54996/3202/55779/23148/5334/8605/57326/26115/54453/55766/64927/6591/55204/9940/57608/653483/9886/10279/55805/10628/901/9891/7102/27244/79921/4674/8292/5125/79843/6857/79940/23541/3357/50853/9863/79932/90362/55663/4250
    ## ATF2_S_UP.V1_DN                                                               8714/53616/55667/51279/4642/7130/7139/2/710/80352/4608/11145/881/2252/3399/22846/10687/12/1397/1301/3554/2619/26011/78987/9284/1306/6586/9627/4054/54813/115207/55890/26249/4803/11075/3400/79789/8527/2353/10580/79987/80760/5493/90865/3485/3751/2202/2199/10610/1047/1294/23492/3908/5364/658/5350/2121/2331/4330/9737/4982/57088/8490/9358/6424/1307/3708/125/1311/56521/10351
    ## ATF2_UP.V1_DN                                                                             11145/9738/2252/8660/10687/5357/80201/12/22998/3554/2619/23556/8840/9284/57419/5737/6586/9627/8863/10085/115207/55890/23015/629/3212/3400/50650/8527/2353/79987/80760/9668/1012/10614/5493/8553/23126/90865/3485/3751/2202/57212/2199/10610/3075/6387/22834/1909/10129/57161/5350/2121/2331/4330/2952/4982/57088/9358/6424/6038/51313/1311/56521/10351/5304/4239/79901
    

    We can test with other collections, for example, using C3 to test whether the genes are up/down-regulated by sharing specific motif.

    m_t2g <- msigdbr(species = "Homo sapiens", category = "C3") %>% 
      dplyr::select(gs_name, entrez_gene)
    head(m_t2g)
    
    ## # A tibble: 6 x 2
    ##   gs_name        entrez_gene
    ##   <chr>                <int>
    ## 1 AAACCAC_MIR140       10257
    ## 2 AAACCAC_MIR140       23172
    ## 3 AAACCAC_MIR140          81
    ## 4 AAACCAC_MIR140          90
    ## 5 AAACCAC_MIR140        8754
    ## 6 AAACCAC_MIR140       11096
    
    em3 <- GSEA(geneList, TERM2GENE = m_t2g)
    head(em3)
    
    ##                                                            ID
    ## GGATTA_PITX2_Q2                               GGATTA_PITX2_Q2
    ## TTTGCAC_MIR19A_MIR19B                   TTTGCAC_MIR19A_MIR19B
    ## CAGTATT_MIR200B_MIR200C_MIR429 CAGTATT_MIR200B_MIR200C_MIR429
    ## AAAYRNCTG_UNKNOWN                           AAAYRNCTG_UNKNOWN
    ## TTGCACT_MIR130A_MIR301_MIR130B TTGCACT_MIR130A_MIR301_MIR130B
    ## ATGTACA_MIR493                                 ATGTACA_MIR493
    ##                                                   Description
    ## GGATTA_PITX2_Q2                               GGATTA_PITX2_Q2
    ## TTTGCAC_MIR19A_MIR19B                   TTTGCAC_MIR19A_MIR19B
    ## CAGTATT_MIR200B_MIR200C_MIR429 CAGTATT_MIR200B_MIR200C_MIR429
    ## AAAYRNCTG_UNKNOWN                           AAAYRNCTG_UNKNOWN
    ## TTGCACT_MIR130A_MIR301_MIR130B TTGCACT_MIR130A_MIR301_MIR130B
    ## ATGTACA_MIR493                                 ATGTACA_MIR493
    ##                                setSize enrichmentScore
    ## GGATTA_PITX2_Q2                    455      -0.3365909
    ## TTTGCAC_MIR19A_MIR19B              397      -0.3672122
    ## CAGTATT_MIR200B_MIR200C_MIR429     380      -0.3607152
    ## AAAYRNCTG_UNKNOWN                  290      -0.4355277
    ## TTGCACT_MIR130A_MIR301_MIR130B     313      -0.3890571
    ## ATGTACA_MIR493                     251      -0.3980657
    ##                                      NES      pvalue
    ## GGATTA_PITX2_Q2                -1.495575 0.001230012
    ## TTTGCAC_MIR19A_MIR19B          -1.612649 0.001254705
    ## CAGTATT_MIR200B_MIR200C_MIR429 -1.581860 0.001257862
    ## AAAYRNCTG_UNKNOWN              -1.863237 0.001319261
    ## TTGCACT_MIR130A_MIR301_MIR130B -1.675646 0.001322751
    ## ATGTACA_MIR493                 -1.676300 0.001347709
    ##                                  p.adjust    qvalues rank
    ## GGATTA_PITX2_Q2                0.04123119 0.02860412 2464
    ## TTTGCAC_MIR19A_MIR19B          0.04123119 0.02860412 4078
    ## CAGTATT_MIR200B_MIR200C_MIR429 0.04123119 0.02860412 3186
    ## AAAYRNCTG_UNKNOWN              0.04123119 0.02860412 2952
    ## TTGCACT_MIR130A_MIR301_MIR130B 0.04123119 0.02860412 2861
    ## ATGTACA_MIR493                 0.04123119 0.02860412 3911
    ##                                                  leading_edge
    ## GGATTA_PITX2_Q2                tags=26%, list=20%, signal=22%
    ## TTTGCAC_MIR19A_MIR19B          tags=44%, list=33%, signal=30%
    ## CAGTATT_MIR200B_MIR200C_MIR429 tags=34%, list=25%, signal=26%
    ## AAAYRNCTG_UNKNOWN              tags=38%, list=24%, signal=29%
    ## TTGCACT_MIR130A_MIR301_MIR130B tags=36%, list=23%, signal=28%
    ## ATGTACA_MIR493                 tags=45%, list=31%, signal=32%
    ##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       core_enrichment
    ## GGATTA_PITX2_Q2                                                                                                                                                                                                                                                                                                                               25836/311/3199/4604/57556/8019/3321/57835/8929/6671/1760/10401/4430/9497/5530/6532/54894/23528/5228/5087/2295/6453/56171/3232/4921/3005/221037/56912/10265/10521/10497/23767/8322/37/23429/6561/4303/57082/845/94/51447/9922/6258/23243/2571/10253/4602/29119/55068/126393/7402/56675/6302/5624/7716/7779/596/4300/57804/79923/1280/100506658/9627/3675/26037/2261/2804/65084/11030/3216/57326/140890/23255/5396/9854/1907/54843/57037/862/23452/55152/1848/8835/89795/57007/64221/5136/7227/10324/55273/6310/2202/56034/956/54502/51474/4254/6925/6097/744/443/23446/4306/4915/4487/25803/4081/4223/10472/388677/55107/347902/3708/10742/51313/4857/730/2532/4036/3169
    ## TTTGCAC_MIR19A_MIR19B          22911/1605/659/10318/4299/6249/81609/57194/9686/10425/9839/23013/22936/989/51742/4154/55625/51460/51592/11127/3976/54980/9751/56937/6045/4204/27252/4862/27236/2115/5869/152006/57698/57018/23032/56145/10150/55054/4090/25852/10618/2060/7107/10771/7071/9044/10904/9752/6431/8473/83452/392/10395/6777/51230/22889/23236/55652/158471/6595/905/56146/4026/79665/5932/115/8325/546/10154/9655/23499/54521/26959/23612/3321/56929/54778/26060/56848/9098/107/10420/23001/29922/9759/29116/23405/94134/2295/6453/83660/51719/8654/11138/2186/23341/8038/860/2004/9256/6548/23621/23503/29/394/220594/831/92689/55082/54014/1901/23111/7552/27303/1490/51454/57419/66008/2318/6252/7248/4929/23041/9900/55727/91694/22863/1389/54796/3400/9649/23253/54453/7048/22862/22905/23365/79899/5412/60481/54838/89795/5793/23389/7227/6310/3625/26018/23492/2099/23116/744/775/90355/388/23414/11069/6857/26960/55638/10472/2152/2697/3131/7786/55107/9863/23261/3479/3708/10742/2018/4036/2066
    ## CAGTATT_MIR200B_MIR200C_MIR429                                                                                                                                                                                                                                                    7071/31/23291/5179/57092/9969/9865/8473/9644/10395/6383/5569/1392/4008/9647/8613/6405/4884/8491/9706/2263/6567/53353/64766/5310/546/5128/55145/54469/200576/8434/7003/687/3915/1456/3321/10499/23189/26060/55578/1998/51421/22864/10484/10140/23001/4661/253782/9807/83660/95681/1960/2908/51719/2186/22846/108/9710/80205/2104/9522/7320/23243/23258/4602/129642/2669/51439/7716/54014/23047/5099/2823/3340/1942/7552/1843/100506658/8204/10979/79618/23015/8848/4692/5334/57515/26249/6563/23122/6196/1902/4077/2494/10370/3953/51339/79365/54838/89795/8076/5577/3751/6310/1009/29994/10769/1501/57509/3899/2273/23116/4908/27244/775/5783/65055/4035/23414/2737/5744/6857/55800/10186/3131/9828/6935/4982/219654/1602/3708/4857
    ## AAAYRNCTG_UNKNOWN                                                                                                                                                                                                                                                                                                                                                                                                        598/1392/4008/11080/79893/4724/8515/8613/81847/4297/3218/27314/7025/6595/8516/29966/11334/90/54329/89797/120/783/4750/8289/5071/3386/8929/2776/5530/23564/25959/4776/607/8038/5926/1288/2104/4303/845/51447/23243/22902/3082/54879/5745/4211/9024/23047/1942/79625/27303/4628/55911/51585/3202/6252/23338/64112/7482/444/55128/26249/64236/3397/64393/23635/23253/4013/1278/5252/323/1831/10370/1384/5549/26575/1272/89795/81603/55799/8470/7227/54875/761/477/29951/10468/6444/4254/7102/744/2045/4915/658/23414/11069/55800/10186/10472/2205/1287/7122/3479/1846/2018/2066/3169/8614/79901
    ## TTGCACT_MIR130A_MIR301_MIR130B                                                                                                                                                                                                                                                                                                                                                   9044/1808/196441/10904/27430/192670/8473/392/2909/83482/4762/4297/23236/27314/5932/4884/9706/90/546/9655/84196/5562/8626/757/26959/26060/56848/9098/23328/113791/107/10420/23001/29922/4661/23405/94134/2295/60468/55914/11138/2186/23341/6500/108/8038/51257/860/8648/80205/2004/9256/23243/23503/23469/4602/220594/8452/51111/55082/54014/1901/23111/7552/57419/389136/171586/3213/3202/6252/7248/23181/7482/55727/5797/91694/22863/10612/23253/8829/7048/753/23365/79365/79899/5468/5793/7227/6310/3625/26018/2099/23116/744/90355/23446/4306/23414/11069/64061/90627/26960/4223/55638/10472/2697/7786/23261/3708/10742/2018/2066
    ## ATGTACA_MIR493                                                                                         
    

    相关文章

      网友评论

        本文标题:clusterProfiler花式用法

        本文链接:https://www.haomeiwen.com/subject/dsffoctx.html