Bioinformatics analyses such as GSEA, GSVA, and ssGSEA all depend on MSigDB gene sets. At the time of writing, MSigDB had been updated to v7.4.
![](https://img.haomeiwen.com/i17982813/66a386e78e71edae.png)
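In practice, MSigDB data has to be queried again and again, so several R packages have been written to interface with MSigDB, such as msigdbr and msig. Below we walk through msig, an R package written by 树神 (WeChat public account: 一棵树zj). The mind map below was drawn by the author himself: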
![](https://img.haomeiwen.com/i17982813/df8cd1fd6c588a27.png)
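The package offers two ways to explore MSigDB: online search and local search.
The information of interest is the gene set name and the genes it contains.
Online search comes in a registered and an unregistered mode; the registered mode requires an email address. For local search, the local database can be refreshed with msig_update().
Useful functions:
- msig_gene()
- msig_geneSymbol()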
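1. Installing the R package
rm(list = ls())            # clear the workspace (optional)
# install.packages("msig") # run once if msig is not installed yet
library(msig)
library(magrittr)          # provides %>% used below, in case it is not already available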
2. Core functions
2.1 browse_msig
Takes two arguments: geneSetName and collection.
browse_msig("immune","c2") # search collection c2 for gene sets whose names contain "immune"
## [1] "galindo_immune_response_to_enterotoxin"
## [2] "goldrath_immune_memory"
## [3] "jinesh_blebbishield_to_immune_cell_fusion_pbshms_dn"
## [4] "jinesh_blebbishield_to_immune_cell_fusion_pbshms_up"
## [5] "kegg_autoimmune_thyroid_disease"
## [6] "kegg_intestinal_immune_network_for_iga_production"
## [7] "lin_tumor_escape_from_immune_attack"
## [8] "reactome_adaptive_immune_system"
## [9] "reactome_cytokine_signaling_in_immune_system"
## [10] "reactome_diseases_of_immune_system"
## [11] "reactome_innate_immune_system"
## [12] "reactome_modulation_by_mtb_of_host_immune_system"
## [13] "reactome_regulation_of_innate_immune_responses_to_cytosolic_dna"
## [14] "reactome_runx3_regulates_immune_response_and_cell_migration"
## [15] "reactome_sting_mediated_induction_of_host_immune_responses"
## [16] "reactome_sumoylation_of_immune_response_proteins"
## [17] "wp_control_of_immune_tolerance_by_vasoactive_intestinal_peptide"
## [18] "wp_interactions_between_immune_cells_and_micrornas_in_tumor_microenvironment"
## [19] "wp_mirnas_involvement_in_the_immune_response_in_sepsis"
## [20] "wp_pathways_of_nucleic_acid_metabolism_and_innate_immune_sensing"
## [21] "wp_sarscov2_innate_immunity_evasion_and_cellspecific_immune_response"
## [22] "wp_the_human_immune_response_to_tuberculosis"
## attr(,"browse_msig")
## [1] "immune"
2.2 browse_show_collection
Lists all collections available in the MSigDB database.
browse_show_collection()
## [1] "H" "C1" "C1&chromosome=1" "C1&chromosome=2"
## [5] "C1&chromosome=3" "C1&chromosome=4" "C1&chromosome=5" "C1&chromosome=6"
## [9] "C1&chromosome=7" "C1&chromosome=8" "C1&chromosome=9" "C1&chromosome=10"
## [13] "C1&chromosome=11" "C1&chromosome=12" "C1&chromosome=13" "C1&chromosome=14"
## [17] "C1&chromosome=15" "C1&chromosome=16" "C1&chromosome=17" "C1&chromosome=18"
## [21] "C1&chromosome=19" "C1&chromosome=20" "C1&chromosome=21" "C1&chromosome=22"
## [25] "C1&chromosome=x" "C1&chromosome=y" "C1&chromosome=mt" "C2"
## [29] "CGP" "CP" "CP:BIOCARTA" "CP:KEGG"
## [33] "CP:PID" "CP:REACTOME" "CP:WIKIPATHWAYS" "C3"
## [37] "MIR" "MIR:MIR_Legacy" "MIR:MIRDB" "TFT"
## [41] "TFT:GTRD" "TFT:TFT_Legacy" "C4" "CGN"
## [45] "CM" "C5" "GO" "GO:BP"
## [49] "GO:CC" "GO:MF" "HPO" "C6"
## [53] "C7" "IMMUNESIGDB" "VAX" "C8"
2.3 msig_filt
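Works much like filter() in the dplyr package. Here it narrows the "immune" search results down to names that also contain "response":
browse_msig("immune") %>%
  msig_filt("response") %>%
  head(10)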
## [1] "galindo_immune_response_to_enterotoxin"
## [2] "gobp_activation_of_immune_response"
## [3] "gobp_activation_of_innate_immune_response"
## [4] "gobp_adaptive_immune_response"
## [5] "gobp_adaptive_immune_response_based_on_somatic_recombination_of_immune_receptors_built_from_immunoglobulin_superfamily_domains"
## [6] "gobp_antifungal_innate_immune_response"
## [7] "gobp_antimicrobial_humoral_immune_response_mediated_by_antimicrobial_peptide"
## [8] "gobp_antiviral_innate_immune_response"
## [9] "gobp_b_cell_activation_involved_in_immune_response"
## [10] "gobp_b_cell_proliferation_involved_in_immune_response"
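As a rough illustration of what such a chained search amounts to, the sketch below filters a character vector of gene set names in base R. It does not call msig: the helper `filter_names` is hypothetical, and treating the match as a case-insensitive substring match is an assumption rather than documented package behaviour. The names are taken from the outputs shown in this post.

```r
# Hypothetical helper (not part of msig): keep the names that contain a keyword.
# Assumption: a plain, case-insensitive substring match.
filter_names <- function(names, keyword) {
  names[grepl(tolower(keyword), tolower(names), fixed = TRUE)]
}

# A few gene set names copied from the outputs in this post.
gene_sets <- c(
  "galindo_immune_response_to_enterotoxin",
  "goldrath_immune_memory",
  "kegg_autoimmune_thyroid_disease",
  "reactome_innate_immune_system",
  "kegg_peroxisome"
)

# First keyword, then a second keyword applied to the first result.
immune_sets   <- filter_names(gene_sets, "immune")
response_sets <- filter_names(immune_sets, "response")

print(immune_sets)
print(response_sets)
```

Applying the second keyword to the result of the first mirrors the piped form above: each step only ever shrinks the candidate list.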
2.4 The msig_gene family
2.4.1 msig_gene
Extracts the gene information of a gene set.
genes <- msig_gene("hallmark_peroxisome")
## HALLMARK_PEROXISOME 105 members mapped to 104 genes
genes %>%
msig_view()
| OriginalMember | NCBI(Entrez)GeneId | GeneSymbol | GeneDescription |
|---|---|---|---|
| hallmark_peroxisome | | | |
| ABCB1 | 5243 | ABCB1 | ATP binding cassette subfamily B member … |
| ABCB4 | 5244 | ABCB4 | ATP binding cassette subfamily B member … |
| ABCB9 | 23457 | ABCB9 | ATP binding cassette subfamily B member … |
| ABCC5 | 10057 | ABCC5 | ATP binding cassette subfamily C member … |
| ABCC8 | 6833 | ABCC8 | ATP binding cassette subfamily C member … |
| ABCD1 | 215 | ABCD1 | ATP binding cassette subfamily D member … |
| ABCD2 | 225 | ABCD2 | ATP binding cassette subfamily D member … |
| ABCD3 | 5825 | ABCD3 | ATP binding cassette subfamily D member … |
2.4.2 msig_geneSymbol
Extracts only the GeneSymbol column of a gene set.
genes <- msig_geneSymbol("hallmark_peroxisome")
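Once a vector of gene symbols is in hand, a typical next step is to check which symbols are actually measured in an expression matrix and to wrap them in a named list, which is the general shape that gene set scoring tools such as GSVA or fgsea take as input. The sketch below uses only base R so it runs without msig. The symbols are copied from the table above; the expression values, sample names, and the extra TP53 row are made up, and the mean-based score is purely illustrative, not ssGSEA.

```r
# Gene symbols hard-coded from the table above (stand-in for the msig result).
peroxisome_genes <- c("ABCB1", "ABCB4", "ABCB9", "ABCC5")

# Toy expression matrix: rows are genes, columns are samples (values are made up).
set.seed(1)
expr <- matrix(
  rnorm(12),
  nrow = 4,
  dimnames = list(c("ABCB1", "ABCB9", "ABCC5", "TP53"), c("s1", "s2", "s3"))
)

# Keep only the symbols that are present in the matrix.
measured <- intersect(peroxisome_genes, rownames(expr))

# Named list of gene sets: one character vector of symbols per set.
gene_set_list <- list(hallmark_peroxisome = measured)

# Naive per-sample score: mean expression of the measured genes (illustration only).
scores <- colMeans(expr[measured, , drop = FALSE])

print(gene_set_list)
print(scores)
```

Intersecting with the matrix row names first avoids subscript errors for genes that were never measured, here ABCB4.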
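2.5 related_geneset
Lists the gene sets related to a given gene set. For a hallmark gene set, as in the output below, these are its founder gene sets.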
related_geneset("hallmark_peroxisome")
## $`28 founder gene sets for this hallmark gene set`
## [1] "chr11p"
## [2] "chr15q"
## [3] "gobp_bile_acid_metabolic_process"
## [4] "gobp_hormone_metabolic_process"
## [5] "gobp_peroxisome_organization"
## [6] "gobp_response_to_drug"
## [7] "gobp_steroid_biosynthetic_process"
## [8] "gobp_steroid_metabolic_process"
## [9] "gocc_microbody"
## [10] "gocc_microbody_membrane"
## [11] "gomf_nucleobase_containing_compound_transmembrane_transporter_activity"
## [12] "gomf_protein_c_terminus_binding"
## [13] "kegg_abc_transporters"
## [14] "kegg_peroxisome"
## [15] "kegg_primary_bile_acid_biosynthesis"
## [16] "microbody_part"
## [17] "module_404"
## [18] "peroxisomal_membrane"
## [19] "peroxisomal_part"
## [20] "peroxisome"
## [21] "reactome_abc_family_proteins_mediated_transport"
## [22] "reactome_abc_transporters_in_lipid_homeostasis"
## [23] "reactome_alpha_linolenic_acid_ala_metabolism"
## [24] "reactome_bile_acid_and_bile_salt_metabolism"
## [25] "reactome_peroxisomal_lipid_metabolism"
## [26] "reactome_synthesis_of_bile_acids_and_bile_salts"
## [27] "reactome_synthesis_of_bile_acids_and_bile_salts_via_24_hydroxycholesterol"
## [28] "reactome_synthesis_of_bile_acids_and_bile_salts_via_7alpha_hydroxycholesterol"
##
## attr(,"related_geneset")
## [1] "hallmark_peroxisome"
2.6 similarity_geneset
x <- similarity_geneset('REACTOME_DEGRADATION_OF_AXIN')
x
## External_ID
## 1 R-HSA-169911
## 2 R-HSA-180585
## 3 R-HSA-211733
## 4 R-HSA-69601
## 5 R-HSA-69610
## 6 R-HSA-69613
## 7 R-HSA-75815
## External_Name
## 1 Regulation of Apoptosis
## 2 Vif-mediated degradation of APOBEC3G
## 3 Regulation of activated PAK-2p34 by proteasome mediated degradation
## 4 Ubiquitin Mediated Degradation of Phosphorylated Cdc25A
## 5 p53-Independent DNA Damage Response
## 6 p53-Independent G1/S DNA damage checkpoint
## 7 Ubiquitin-dependent degradation of Cyclin D
## link
## 1 https://www.reactome.org/content/detail/R-HSA-169911
## 2 https://www.reactome.org/content/detail/R-HSA-180585
## 3 https://www.reactome.org/content/detail/R-HSA-211733
## 4 https://www.reactome.org/content/detail/R-HSA-69601
## 5 https://www.reactome.org/content/detail/R-HSA-69610
## 6 https://www.reactome.org/content/detail/R-HSA-69613
## 7 https://www.reactome.org/content/detail/R-HSA-75815
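Overall, msig and msigdbr offer broadly similar functionality, but msigdbr seems a little easier to remember: essentially everything goes through a single function, msigdbr(species = "Homo sapiens", category = "C2", subcategory = "KEGG").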
library(msigdbr)
msigdbr(species = "Homo sapiens", category = "C2", subcategory = "KEGG") %>%
head(5)
## # A tibble: 5 x 15
## gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene human_gene_symb~
## <chr> <chr> <chr> <chr> <int> <chr> <chr>
## 1 C2 CP:KEGG KEGG_A~ ABCA1 19 ENSG0000016~ ABCA1
## 2 C2 CP:KEGG KEGG_A~ ABCA10 10349 ENSG0000015~ ABCA10
## 3 C2 CP:KEGG KEGG_A~ ABCA12 26154 ENSG0000014~ ABCA12
## 4 C2 CP:KEGG KEGG_A~ ABCA13 154664 ENSG0000017~ ABCA13
## 5 C2 CP:KEGG KEGG_A~ ABCA2 20 ENSG0000010~ ABCA2
## # ... with 8 more variables: human_entrez_gene <int>, human_ensembl_gene <chr>,
## # gs_id <chr>, gs_pmid <chr>, gs_geoid <chr>, gs_exact_source <chr>,
## # gs_url <chr>, gs_description <chr>
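A tidy table with one row per gene set and gene pair usually has to be turned into a named list before it can be passed to enrichment or scoring functions. The sketch below shows that conversion with base R's split() on a small hand-made data frame, so it runs without msigdbr installed. The column names gs_name and gene_symbol follow the tibble printed above; the rows themselves are illustrative toy data, not real msigdbr output.

```r
# Toy stand-in for a tidy gene set table: one row per (gene set, gene) pair.
# Column names follow the tibble above; the rows are illustrative only.
gene_sets_df <- data.frame(
  gs_name = c(
    "KEGG_ABC_TRANSPORTERS", "KEGG_ABC_TRANSPORTERS",
    "KEGG_PEROXISOME", "KEGG_PEROXISOME", "KEGG_PEROXISOME"
  ),
  gene_symbol = c("ABCA1", "ABCA10", "ABCD1", "ABCD2", "ABCD3"),
  stringsAsFactors = FALSE
)

# Group the gene symbols by gene set name: one list element per gene set.
gene_set_list <- split(gene_sets_df$gene_symbol, gene_sets_df$gs_name)

# Number of genes in each set.
set_sizes <- lengths(gene_set_list)
print(set_sizes)
```

The same split() call should work on a real msigdbr result, since it only relies on the two columns used here.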
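Note: this content is for study and exchange only and must not be used commercially. If it infringes any rights, please get in touch and it will be removed.
References: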
msig:An R Package for Exploring Molecular Signatures Database
msigdbr: MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format