[R >> 树神 (Shushen) series: msig] A tour of MSigDB

Author: 高大石头 | Published: 2021-07-04 18:40

Bioinformatics analyses such as GSEA, GSVA, and ssGSEA all depend on the gene sets in MSigDB. At the time of writing, MSigDB has been updated to v7.4.


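In practice you end up pulling MSigDB data again and again, so several R packages have been written to access it, for example msigdbr and msig.
Below we walk through msig, the R package written by 树神 (Shushen). The package author also drew a mind map of it, published on the WeChat official account 一棵树zj.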

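msig offers two main ways to explore MSigDB: online and local.

The inputs you need are: geneset name, gene

Online exploration can be done with or without registration; a registered search requires an email address. For local search, msig_update() updates the local database content.

Useful functions:

  • msig_gene()

  • msig_geneSymbol()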

1. Installing the R package

rm(list = ls())
#install.packages("msig")
library(msig)

2. Core functions

2.1 browse_msig

Takes two arguments: geneSetName and collection

browse_msig("immune","c2") # search collection c2 for gene sets whose names contain "immune"
##  [1] "galindo_immune_response_to_enterotoxin"                                      
##  [2] "goldrath_immune_memory"                                                      
##  [3] "jinesh_blebbishield_to_immune_cell_fusion_pbshms_dn"                         
##  [4] "jinesh_blebbishield_to_immune_cell_fusion_pbshms_up"                         
##  [5] "kegg_autoimmune_thyroid_disease"                                             
##  [6] "kegg_intestinal_immune_network_for_iga_production"                           
##  [7] "lin_tumor_escape_from_immune_attack"                                         
##  [8] "reactome_adaptive_immune_system"                                             
##  [9] "reactome_cytokine_signaling_in_immune_system"                                
## [10] "reactome_diseases_of_immune_system"                                          
## [11] "reactome_innate_immune_system"                                               
## [12] "reactome_modulation_by_mtb_of_host_immune_system"                            
## [13] "reactome_regulation_of_innate_immune_responses_to_cytosolic_dna"             
## [14] "reactome_runx3_regulates_immune_response_and_cell_migration"                 
## [15] "reactome_sting_mediated_induction_of_host_immune_responses"                  
## [16] "reactome_sumoylation_of_immune_response_proteins"                            
## [17] "wp_control_of_immune_tolerance_by_vasoactive_intestinal_peptide"             
## [18] "wp_interactions_between_immune_cells_and_micrornas_in_tumor_microenvironment"
## [19] "wp_mirnas_involvement_in_the_immune_response_in_sepsis"                      
## [20] "wp_pathways_of_nucleic_acid_metabolism_and_innate_immune_sensing"            
## [21] "wp_sarscov2_innate_immunity_evasion_and_cellspecific_immune_response"        
## [22] "wp_the_human_immune_response_to_tuberculosis"                                
## attr(,"browse_msig")
## [1] "immune"

2.2 browse_show_collection

Lists all collections in the MSigDB database

browse_show_collection()
##  [1] "H"                "C1"               "C1&chromosome=1"  "C1&chromosome=2" 
##  [5] "C1&chromosome=3"  "C1&chromosome=4"  "C1&chromosome=5"  "C1&chromosome=6" 
##  [9] "C1&chromosome=7"  "C1&chromosome=8"  "C1&chromosome=9"  "C1&chromosome=10"
## [13] "C1&chromosome=11" "C1&chromosome=12" "C1&chromosome=13" "C1&chromosome=14"
## [17] "C1&chromosome=15" "C1&chromosome=16" "C1&chromosome=17" "C1&chromosome=18"
## [21] "C1&chromosome=19" "C1&chromosome=20" "C1&chromosome=21" "C1&chromosome=22"
## [25] "C1&chromosome=x"  "C1&chromosome=y"  "C1&chromosome=mt" "C2"              
## [29] "CGP"              "CP"               "CP:BIOCARTA"      "CP:KEGG"         
## [33] "CP:PID"           "CP:REACTOME"      "CP:WIKIPATHWAYS"  "C3"              
## [37] "MIR"              "MIR:MIR_Legacy"   "MIR:MIRDB"        "TFT"             
## [41] "TFT:GTRD"         "TFT:TFT_Legacy"   "C4"               "CGN"             
## [45] "CM"               "C5"               "GO"               "GO:BP"           
## [49] "GO:CC"            "GO:MF"            "HPO"              "C6"              
## [53] "C7"               "IMMUNESIGDB"      "VAX"              "C8"

2.3 msig_filt

Works like the filter function in the dplyr package

browse_msig("immune") %>% 
  msig_filt("response") %>% 
  head(10)
##  [1] "galindo_immune_response_to_enterotoxin"                                                                                        
##  [2] "gobp_activation_of_immune_response"                                                                                            
##  [3] "gobp_activation_of_innate_immune_response"                                                                                     
##  [4] "gobp_adaptive_immune_response"                                                                                                 
##  [5] "gobp_adaptive_immune_response_based_on_somatic_recombination_of_immune_receptors_built_from_immunoglobulin_superfamily_domains"
##  [6] "gobp_antifungal_innate_immune_response"                                                                                        
##  [7] "gobp_antimicrobial_humoral_immune_response_mediated_by_antimicrobial_peptide"                                                  
##  [8] "gobp_antiviral_innate_immune_response"                                                                                         
##  [9] "gobp_b_cell_activation_involved_in_immune_response"                                                                            
## [10] "gobp_b_cell_proliferation_involved_in_immune_response"
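
To get a feel for what this kind of chained filtering does, here is a minimal base-R sketch on a toy vector of names copied from the outputs above. The helper `filt` is a hypothetical stand-in written for this illustration; it is not part of msig and is not necessarily how msig_filt is implemented.

```r
# Toy vector of gene set names, copied from the outputs shown in this post.
names_immune <- c(
  "galindo_immune_response_to_enterotoxin",
  "goldrath_immune_memory",
  "reactome_innate_immune_system",
  "gobp_adaptive_immune_response"
)

# Hypothetical stand-in for a name filter (NOT msig's own code):
# keep the elements of x that match the pattern, ignoring case.
filt <- function(x, pattern) {
  grep(pattern, x, value = TRUE, ignore.case = TRUE)
}

# Narrow the list step by step, as a pipe of filters would.
step1 <- filt(names_immune, "response")  # names mentioning "response"
step2 <- filt(step1, "^gobp_")           # of those, only GO:BP-style names

print(step1)
print(step2)
```

Each step only ever shrinks the vector, which is why chaining several filters is a convenient way to home in on a handful of gene sets.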

2.4 The msig_gene family

2.4.1 msig_gene

Extracts the gene information for a gene set

genes <- msig_gene("hallmark_peroxisome")
## HALLMARK_PEROXISOME  105 members mapped to 104 genes
genes %>% 
  msig_view()
hallmark_peroxisome
OriginalMember  NCBI (Entrez) Gene Id  GeneSymbol  GeneDescription
ABCB1           5243                   ABCB1       ATP binding cassette subfamily B member …
ABCB4           5244                   ABCB4       ATP binding cassette subfamily B member …
ABCB9           23457                  ABCB9       ATP binding cassette subfamily B member …
ABCC5           10057                  ABCC5       ATP binding cassette subfamily C member …
ABCC8           6833                   ABCC8       ATP binding cassette subfamily C member …
ABCD1           215                    ABCD1       ATP binding cassette subfamily D member …
ABCD2           225                    ABCD2       ATP binding cassette subfamily D member …
ABCD3           5825                   ABCD3       ATP binding cassette subfamily D member …

2.4.2 msig_geneSymbol

Extracts the GeneSymbol column of a gene set

genes <- msig_geneSymbol("hallmark_peroxisome")
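
Once you have the symbols, a typical next step is to check which of your own genes fall into the gene set. The sketch below is base R on a toy table that mirrors the first rows of the hallmark_peroxisome table shown earlier; the column name `EntrezId` and the vector `my_genes` are made up for this illustration, and the real object returned by msig may be shaped differently.

```r
# Toy table mirroring the first rows of the hallmark_peroxisome output above.
# (Column names are chosen for this sketch; msig's real object may differ.)
peroxisome <- data.frame(
  OriginalMember = c("ABCB1", "ABCB4", "ABCB9", "ABCC5"),
  EntrezId       = c(5243L, 5244L, 23457L, 10057L),
  GeneSymbol     = c("ABCB1", "ABCB4", "ABCB9", "ABCC5"),
  stringsAsFactors = FALSE
)

# Pull out the symbol column and drop any duplicates.
symbols <- unique(peroxisome$GeneSymbol)

# Hypothetical genes of interest from your own experiment.
my_genes <- c("ABCB4", "TP53", "ABCC5")

# Which of my genes belong to the gene set?
overlap <- intersect(my_genes, symbols)
print(overlap)
```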

2.5 related_geneset

related_geneset("hallmark_peroxisome")
## $`28 founder gene sets for this hallmark gene set`
##  [1] "chr11p"                                                                       
##  [2] "chr15q"                                                                       
##  [3] "gobp_bile_acid_metabolic_process"                                             
##  [4] "gobp_hormone_metabolic_process"                                               
##  [5] "gobp_peroxisome_organization"                                                 
##  [6] "gobp_response_to_drug"                                                        
##  [7] "gobp_steroid_biosynthetic_process"                                            
##  [8] "gobp_steroid_metabolic_process"                                               
##  [9] "gocc_microbody"                                                               
## [10] "gocc_microbody_membrane"                                                      
## [11] "gomf_nucleobase_containing_compound_transmembrane_transporter_activity"       
## [12] "gomf_protein_c_terminus_binding"                                              
## [13] "kegg_abc_transporters"                                                        
## [14] "kegg_peroxisome"                                                              
## [15] "kegg_primary_bile_acid_biosynthesis"                                          
## [16] "microbody_part"                                                               
## [17] "module_404"                                                                   
## [18] "peroxisomal_membrane"                                                         
## [19] "peroxisomal_part"                                                             
## [20] "peroxisome"                                                                   
## [21] "reactome_abc_family_proteins_mediated_transport"                              
## [22] "reactome_abc_transporters_in_lipid_homeostasis"                               
## [23] "reactome_alpha_linolenic_acid_ala_metabolism"                                 
## [24] "reactome_bile_acid_and_bile_salt_metabolism"                                  
## [25] "reactome_peroxisomal_lipid_metabolism"                                        
## [26] "reactome_synthesis_of_bile_acids_and_bile_salts"                              
## [27] "reactome_synthesis_of_bile_acids_and_bile_salts_via_24_hydroxycholesterol"    
## [28] "reactome_synthesis_of_bile_acids_and_bile_salts_via_7alpha_hydroxycholesterol"
## 
## attr(,"related_geneset")
## [1] "hallmark_peroxisome"

2.6 similarity_geneset

x <- similarity_geneset('REACTOME_DEGRADATION_OF_AXIN')
x
##    External_ID
## 1 R-HSA-169911
## 2 R-HSA-180585
## 3 R-HSA-211733
## 4  R-HSA-69601
## 5  R-HSA-69610
## 6  R-HSA-69613
## 7  R-HSA-75815
##                                                         External_Name
## 1                                             Regulation of Apoptosis
## 2                                Vif-mediated degradation of APOBEC3G
## 3 Regulation of activated PAK-2p34 by proteasome mediated degradation
## 4             Ubiquitin Mediated Degradation of Phosphorylated Cdc25A
## 5                                 p53-Independent DNA Damage Response
## 6                          p53-Independent G1/S DNA damage checkpoint
## 7                         Ubiquitin-dependent degradation of Cyclin D
##                                                   link
## 1 https://www.reactome.org/content/detail/R-HSA-169911
## 2 https://www.reactome.org/content/detail/R-HSA-180585
## 3 https://www.reactome.org/content/detail/R-HSA-211733
## 4  https://www.reactome.org/content/detail/R-HSA-69601
## 5  https://www.reactome.org/content/detail/R-HSA-69610
## 6  https://www.reactome.org/content/detail/R-HSA-69613
## 7  https://www.reactome.org/content/detail/R-HSA-75815

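Overall, msig and msigdbr offer fairly similar functionality, but msigdbr feels a little easier to remember, since essentially everything goes through a single function: msigdbr(species = "Homo sapiens", category = "C2", subcategory = "KEGG")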

library(msigdbr)
msigdbr(species = "Homo sapiens", category = "C2", subcategory = "KEGG") %>% 
  head(5)
## # A tibble: 5 x 15
##   gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene human_gene_symb~
##   <chr>  <chr>     <chr>   <chr>             <int> <chr>        <chr>           
## 1 C2     CP:KEGG   KEGG_A~ ABCA1                19 ENSG0000016~ ABCA1           
## 2 C2     CP:KEGG   KEGG_A~ ABCA10            10349 ENSG0000015~ ABCA10          
## 3 C2     CP:KEGG   KEGG_A~ ABCA12            26154 ENSG0000014~ ABCA12          
## 4 C2     CP:KEGG   KEGG_A~ ABCA13           154664 ENSG0000017~ ABCA13          
## 5 C2     CP:KEGG   KEGG_A~ ABCA2                20 ENSG0000010~ ABCA2           
## # ... with 8 more variables: human_entrez_gene <int>, human_ensembl_gene <chr>,
## #   gs_id <chr>, gs_pmid <chr>, gs_geoid <chr>, gs_exact_source <chr>,
## #   gs_url <chr>, gs_description <chr>
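
The tibble above is in long format, one row per gene per gene set. Many enrichment tools instead take a named list with one character vector of genes per gene set. A minimal base-R sketch of that conversion follows, using a toy data frame with the same `gs_name` and `gene_symbol` columns; the set names `SET_A` and `SET_B` are placeholders invented for this example, not real MSigDB names.

```r
# Toy long-format table with the same two columns as the msigdbr output.
# SET_A / SET_B are placeholder names, not real MSigDB gene sets.
long <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B"),
  gene_symbol = c("ABCA1", "ABCA10", "ABCA12", "ABCA13", "ABCA2"),
  stringsAsFactors = FALSE
)

# split() groups the gene symbols by gene set name and returns a named list.
gene_sets <- split(long$gene_symbol, long$gs_name)

str(gene_sets)
```

The same `split()` call should work on the real msigdbr result, since it has both columns; a named list like this is the shape that packages such as fgsea and GSVA generally expect for their gene set argument.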

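Note: this content is for study and discussion only and must not be used commercially. If it infringes on your rights, please get in touch to have it removed.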

References:

msig: An R Package for Exploring Molecular Signatures Database

msigdbr: MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format
