【R>>Shushen series: msig】A tour of MSigDB

Author: 高大石头 | Published: 2021-07-04 18:40

    Bioinformatics analyses such as GSEA, GSVA, and ssGSEA all rely on the MSigDB gene set collections. At the time of writing, MSigDB has been updated to v7.4.


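    In practice MSigDB data gets queried again and again, so several R packages have been written to access it, for example msigdbr and msig.
    Below we look at msig, the R package written by Shushen (树神). The author's own mind map of the package (source: WeChat official account 一棵树zj) summarizes it as follows.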

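    There are two main ways to explore: online and local.

    What you look up: gene set names and genes.

    Online exploration comes in two forms, registered and unregistered search; registered search requires an email address. For local search, msig_update() refreshes the local copy of the database.

    Useful functions:

    • msig_gene()

    • msig_geneSymbol()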

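    1. Installing the R package

    rm(list = ls())            # clear the workspace (optional)
    #install.packages("msig")  # uncomment on first use
    library(msig)
    library(magrittr)          # provides %>% used below, in case msig does not already supply it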
    

    2. Core functions

    2.1 browse_msig

    Takes two arguments: geneSetName and collection.

    browse_msig("immune","c2") # search collection C2 for gene sets whose names contain "immune"
    
    ##  [1] "galindo_immune_response_to_enterotoxin"                                      
    ##  [2] "goldrath_immune_memory"                                                      
    ##  [3] "jinesh_blebbishield_to_immune_cell_fusion_pbshms_dn"                         
    ##  [4] "jinesh_blebbishield_to_immune_cell_fusion_pbshms_up"                         
    ##  [5] "kegg_autoimmune_thyroid_disease"                                             
    ##  [6] "kegg_intestinal_immune_network_for_iga_production"                           
    ##  [7] "lin_tumor_escape_from_immune_attack"                                         
    ##  [8] "reactome_adaptive_immune_system"                                             
    ##  [9] "reactome_cytokine_signaling_in_immune_system"                                
    ## [10] "reactome_diseases_of_immune_system"                                          
    ## [11] "reactome_innate_immune_system"                                               
    ## [12] "reactome_modulation_by_mtb_of_host_immune_system"                            
    ## [13] "reactome_regulation_of_innate_immune_responses_to_cytosolic_dna"             
    ## [14] "reactome_runx3_regulates_immune_response_and_cell_migration"                 
    ## [15] "reactome_sting_mediated_induction_of_host_immune_responses"                  
    ## [16] "reactome_sumoylation_of_immune_response_proteins"                            
    ## [17] "wp_control_of_immune_tolerance_by_vasoactive_intestinal_peptide"             
    ## [18] "wp_interactions_between_immune_cells_and_micrornas_in_tumor_microenvironment"
    ## [19] "wp_mirnas_involvement_in_the_immune_response_in_sepsis"                      
    ## [20] "wp_pathways_of_nucleic_acid_metabolism_and_innate_immune_sensing"            
    ## [21] "wp_sarscov2_innate_immunity_evasion_and_cellspecific_immune_response"        
    ## [22] "wp_the_human_immune_response_to_tuberculosis"                                
    ## attr(,"browse_msig")
    ## [1] "immune"
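
    To make the search semantics concrete, here is a minimal base-R sketch of a keyword search over gene set names. It is not how msig is implemented (msig queries MSigDB itself); search_names() and the small set_names vector are hypothetical, with names borrowed from outputs in this post.

```r
# Hypothetical helper: keep the names that contain a keyword.
# Matching is literal (fixed = TRUE) and case-insensitive.
search_names <- function(set_names, keyword) {
  hits <- grepl(tolower(keyword), tolower(set_names), fixed = TRUE)
  set_names[hits]
}

# A hand-written stand-in for a collection's gene set names.
set_names <- c(
  "goldrath_immune_memory",
  "kegg_peroxisome",
  "reactome_innate_immune_system",
  "kegg_abc_transporters"
)

immune_sets <- search_names(set_names, "IMMUNE")
print(immune_sets)
```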
    

    2.2 browse_show_collection

    Lists all collections available in MSigDB.

    browse_show_collection()
    
    ##  [1] "H"                "C1"               "C1&chromosome=1"  "C1&chromosome=2" 
    ##  [5] "C1&chromosome=3"  "C1&chromosome=4"  "C1&chromosome=5"  "C1&chromosome=6" 
    ##  [9] "C1&chromosome=7"  "C1&chromosome=8"  "C1&chromosome=9"  "C1&chromosome=10"
    ## [13] "C1&chromosome=11" "C1&chromosome=12" "C1&chromosome=13" "C1&chromosome=14"
    ## [17] "C1&chromosome=15" "C1&chromosome=16" "C1&chromosome=17" "C1&chromosome=18"
    ## [21] "C1&chromosome=19" "C1&chromosome=20" "C1&chromosome=21" "C1&chromosome=22"
    ## [25] "C1&chromosome=x"  "C1&chromosome=y"  "C1&chromosome=mt" "C2"              
    ## [29] "CGP"              "CP"               "CP:BIOCARTA"      "CP:KEGG"         
    ## [33] "CP:PID"           "CP:REACTOME"      "CP:WIKIPATHWAYS"  "C3"              
    ## [37] "MIR"              "MIR:MIR_Legacy"   "MIR:MIRDB"        "TFT"             
    ## [41] "TFT:GTRD"         "TFT:TFT_Legacy"   "C4"               "CGN"             
    ## [45] "CM"               "C5"               "GO"               "GO:BP"           
    ## [49] "GO:CC"            "GO:MF"            "HPO"              "C6"              
    ## [53] "C7"               "IMMUNESIGDB"      "VAX"              "C8"
    

    2.3 msig_filt

    Works much like filter() in the dplyr package.

    browse_msig("immune") %>% 
      msig_filt("response") %>% 
      head(10)
    
    ##  [1] "galindo_immune_response_to_enterotoxin"                                                                                        
    ##  [2] "gobp_activation_of_immune_response"                                                                                            
    ##  [3] "gobp_activation_of_innate_immune_response"                                                                                     
    ##  [4] "gobp_adaptive_immune_response"                                                                                                 
    ##  [5] "gobp_adaptive_immune_response_based_on_somatic_recombination_of_immune_receptors_built_from_immunoglobulin_superfamily_domains"
    ##  [6] "gobp_antifungal_innate_immune_response"                                                                                        
    ##  [7] "gobp_antimicrobial_humoral_immune_response_mediated_by_antimicrobial_peptide"                                                  
    ##  [8] "gobp_antiviral_innate_immune_response"                                                                                         
    ##  [9] "gobp_b_cell_activation_involved_in_immune_response"                                                                            
    ## [10] "gobp_b_cell_proliferation_involved_in_immune_response"
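
    Chaining one filter after another amounts to keeping only the names that match every keyword. The sketch below shows that idea in base R; filter_all() is a hypothetical helper and says nothing about how msig_filt is written internally.

```r
# Hypothetical helper: apply one keyword filter after another,
# so the result contains only names matching all keywords.
filter_all <- function(set_names, keywords) {
  Reduce(
    function(current, kw) current[grepl(kw, current, fixed = TRUE)],
    keywords,
    set_names
  )
}

candidates <- c(
  "galindo_immune_response_to_enterotoxin",
  "goldrath_immune_memory",
  "gobp_adaptive_immune_response",
  "gobp_response_to_drug"
)

filtered <- filter_all(candidates, c("immune", "response"))
print(filtered)
```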
    

    2.4 The msig_gene family

    2.4.1 msig_gene

    Retrieves the gene information for a gene set.

    genes <- msig_gene("hallmark_peroxisome")
    
    ## HALLMARK_PEROXISOME  105 members mapped to 104 genes
    
    genes %>% 
      msig_view()
    
    OriginalMember  NCBI(Entrez)GeneId  GeneSymbol  GeneDescription
    hallmark_peroxisome
    ABCB1           5243                ABCB1       ATP binding cassette subfamily B member …
    ABCB4           5244                ABCB4       ATP binding cassette subfamily B member …
    ABCB9           23457               ABCB9       ATP binding cassette subfamily B member …
    ABCC5           10057               ABCC5       ATP binding cassette subfamily C member …
    ABCC8           6833                ABCC8       ATP binding cassette subfamily C member …
    ABCD1           215                 ABCD1       ATP binding cassette subfamily D member …
    ABCD2           225                 ABCD2       ATP binding cassette subfamily D member …
    ABCD3           5825                ABCD3       ATP binding cassette subfamily D member …

    2.4.2 msig_geneSymbol

    Extracts only the GeneSymbol column of a gene set.

    genes <- msig_geneSymbol("hallmark_peroxisome")
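
    Conceptually this is a single column pulled out of the gene table. The sketch below rebuilds the first three rows of the hallmark_peroxisome table shown in 2.4.1 by hand and extracts the symbols with base R. The column name EntrezGeneId is a simplification, and whether msig_geneSymbol returns a plain vector or removes duplicates is an assumption, not something stated in the source.

```r
# First rows of the hallmark_peroxisome table, rebuilt by hand.
# Column names are simplified for this illustration.
peroxisome <- data.frame(
  OriginalMember = c("ABCB1", "ABCB4", "ABCB9"),
  EntrezGeneId   = c(5243L, 5244L, 23457L),
  GeneSymbol     = c("ABCB1", "ABCB4", "ABCB9"),
  stringsAsFactors = FALSE
)

# Pull out one column as a character vector and drop any duplicates.
symbols <- unique(peroxisome$GeneSymbol)
print(symbols)
```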
    

    2.5 related_geneset

    related_geneset("hallmark_peroxisome")
    
    ## $`28 founder gene sets for this hallmark gene set`
    ##  [1] "chr11p"                                                                       
    ##  [2] "chr15q"                                                                       
    ##  [3] "gobp_bile_acid_metabolic_process"                                             
    ##  [4] "gobp_hormone_metabolic_process"                                               
    ##  [5] "gobp_peroxisome_organization"                                                 
    ##  [6] "gobp_response_to_drug"                                                        
    ##  [7] "gobp_steroid_biosynthetic_process"                                            
    ##  [8] "gobp_steroid_metabolic_process"                                               
    ##  [9] "gocc_microbody"                                                               
    ## [10] "gocc_microbody_membrane"                                                      
    ## [11] "gomf_nucleobase_containing_compound_transmembrane_transporter_activity"       
    ## [12] "gomf_protein_c_terminus_binding"                                              
    ## [13] "kegg_abc_transporters"                                                        
    ## [14] "kegg_peroxisome"                                                              
    ## [15] "kegg_primary_bile_acid_biosynthesis"                                          
    ## [16] "microbody_part"                                                               
    ## [17] "module_404"                                                                   
    ## [18] "peroxisomal_membrane"                                                         
    ## [19] "peroxisomal_part"                                                             
    ## [20] "peroxisome"                                                                   
    ## [21] "reactome_abc_family_proteins_mediated_transport"                              
    ## [22] "reactome_abc_transporters_in_lipid_homeostasis"                               
    ## [23] "reactome_alpha_linolenic_acid_ala_metabolism"                                 
    ## [24] "reactome_bile_acid_and_bile_salt_metabolism"                                  
    ## [25] "reactome_peroxisomal_lipid_metabolism"                                        
    ## [26] "reactome_synthesis_of_bile_acids_and_bile_salts"                              
    ## [27] "reactome_synthesis_of_bile_acids_and_bile_salts_via_24_hydroxycholesterol"    
    ## [28] "reactome_synthesis_of_bile_acids_and_bile_salts_via_7alpha_hydroxycholesterol"
    ## 
    ## attr(,"related_geneset")
    ## [1] "hallmark_peroxisome"
    

    2.6 similarity_geneset

    x <- similarity_geneset('REACTOME_DEGRADATION_OF_AXIN')
    x
    
    ##    External_ID
    ## 1 R-HSA-169911
    ## 2 R-HSA-180585
    ## 3 R-HSA-211733
    ## 4  R-HSA-69601
    ## 5  R-HSA-69610
    ## 6  R-HSA-69613
    ## 7  R-HSA-75815
    ##                                                         External_Name
    ## 1                                             Regulation of Apoptosis
    ## 2                                Vif-mediated degradation of APOBEC3G
    ## 3 Regulation of activated PAK-2p34 by proteasome mediated degradation
    ## 4             Ubiquitin Mediated Degradation of Phosphorylated Cdc25A
    ## 5                                 p53-Independent DNA Damage Response
    ## 6                          p53-Independent G1/S DNA damage checkpoint
    ## 7                         Ubiquitin-dependent degradation of Cyclin D
    ##                                                   link
    ## 1 https://www.reactome.org/content/detail/R-HSA-169911
    ## 2 https://www.reactome.org/content/detail/R-HSA-180585
    ## 3 https://www.reactome.org/content/detail/R-HSA-211733
    ## 4  https://www.reactome.org/content/detail/R-HSA-69601
    ## 5  https://www.reactome.org/content/detail/R-HSA-69610
    ## 6  https://www.reactome.org/content/detail/R-HSA-69613
    ## 7  https://www.reactome.org/content/detail/R-HSA-75815
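
    In the output above, each link is simply the External_ID appended to a fixed Reactome URL prefix. A small sketch that rebuilds the links from IDs follows; reactome_link() is a hypothetical helper, and the prefix is copied from the output rather than from any documented API.

```r
# Hypothetical helper: build a Reactome detail-page URL from an external ID.
reactome_link <- function(external_id) {
  paste0("https://www.reactome.org/content/detail/", external_id)
}

ids <- c("R-HSA-169911", "R-HSA-69601")
links <- reactome_link(ids)
print(links)
```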
    

    Overall, msig and msigdbr offer similar functionality, but msigdbr seems easier to remember: essentially everything goes through a single function, msigdbr(species = "Homo sapiens", category = "C2", subcategory = "KEGG").

    library(msigdbr)
    msigdbr(species = "Homo sapiens", category = "C2", subcategory = "KEGG") %>% 
      head(5)
    
    ## # A tibble: 5 x 15
    ##   gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene human_gene_symb~
    ##   <chr>  <chr>     <chr>   <chr>             <int> <chr>        <chr>           
    ## 1 C2     CP:KEGG   KEGG_A~ ABCA1                19 ENSG0000016~ ABCA1           
    ## 2 C2     CP:KEGG   KEGG_A~ ABCA10            10349 ENSG0000015~ ABCA10          
    ## 3 C2     CP:KEGG   KEGG_A~ ABCA12            26154 ENSG0000014~ ABCA12          
    ## 4 C2     CP:KEGG   KEGG_A~ ABCA13           154664 ENSG0000017~ ABCA13          
    ## 5 C2     CP:KEGG   KEGG_A~ ABCA2                20 ENSG0000010~ ABCA2           
    ## # ... with 8 more variables: human_entrez_gene <int>, human_ensembl_gene <chr>,
    ## #   gs_id <chr>, gs_pmid <chr>, gs_geoid <chr>, gs_exact_source <chr>,
    ## #   gs_url <chr>, gs_description <chr>
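
    A long table like this (one row per gene set and gene pair) usually has to be reshaped before analysis, since tools such as GSVA take gene sets as a named list of gene vectors. The sketch below does this with base R split() on a small hand-made table; SET_A and SET_B are placeholder names, not real MSigDB gene sets, and only the gs_name and gene_symbol columns are assumed.

```r
# Hypothetical long-format table: one row per (gene set, gene) pair.
# SET_A and SET_B are placeholders, not real MSigDB gene set names.
long_tbl <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gene_symbol = c("ABCA1", "ABCA2", "ABCA1", "ABCA10", "ABCA12"),
  stringsAsFactors = FALSE
)

# split() groups the gene symbols by gene set name, giving a named list.
gene_sets <- split(long_tbl$gene_symbol, long_tbl$gs_name)
str(gene_sets)
```

    With a real msigdbr() result, the same split() call on its gene_symbol and gs_name columns should give one vector of symbols per gene set.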
    

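    Note: this content is for study and discussion only and must not be used commercially. If it infringes any rights, please get in touch to have it removed.

    References:

    msig: An R Package for Exploring Molecular Signatures Database

    msigdbr: MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format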
