Overview
Use biomaRt to retrieve the chromosomal positions of genes for the species in the Ensembl database.
R package
biomaRt:
https://m.ensembl.org/info/data/biomart/biomart_r_package.html
Installation
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")
Listing the available databases
library(biomaRt)
listMarts()
Listing the available genomes
mart = useMart('ensembl')
list_db = listDatasets(mart)
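The list includes the human dataset, hsapiens_gene_ensembl (abbreviated hsp below); 202 genome datasets were listed in total.
Retrieving gene position (and hence length) information for human genes
If the connection fails with the message "Ensembl site unresponsive, trying uswest mirror", choosing a different mirror (for example USA or China Shenzhen) resolves it.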
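To locate one species among the datasets, the table can be filtered by name. The sketch below assumes the table has the `dataset` and `description` columns that `listDatasets()` returns; a small hand-made data frame stands in for the real table so the example runs offline, and `find_dataset` is a hypothetical helper name, not part of biomaRt.

```r
# Stand-in for the table returned by listDatasets(mart); rows are illustrative only.
list_db_demo <- data.frame(
  dataset = c("hsapiens_gene_ensembl", "mmusculus_gene_ensembl", "drerio_gene_ensembl"),
  description = c("Human genes", "Mouse genes", "Zebrafish genes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose dataset name or description matches a pattern.
find_dataset <- function(db, pattern) {
  hit <- grepl(pattern, db$dataset, ignore.case = TRUE) |
    grepl(pattern, db$description, ignore.case = TRUE)
  db[hit, , drop = FALSE]
}

human <- find_dataset(list_db_demo, "hsapiens")
print(human$dataset)
```

With a live connection, the same helper could be applied to `list_db`; biomaRt also offers `searchDatasets(mart, pattern = "hsapiens")` for this purpose.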
# Connect to the full Ensembl gene annotation for the human (hsp) genome
hsp <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                        dataset = "hsapiens_gene_ensembl",
                        host = "www.ensembl.org")
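As an alternative to a fixed host, several mirrors can be tried in turn when the main site does not respond. This is only a sketch: `connect_with_fallback` and `fake_connect` are hypothetical names, the default connector assumes `biomaRt::useEnsembl()` with its `mirror` argument, and the mirror names accepted ("useast", "asia" here) may differ between biomaRt versions. The demonstration uses a fake connector so that it runs without network access.

```r
# Hypothetical helper: try each mirror until one connection succeeds.
connect_with_fallback <- function(
    mirrors = c("useast", "asia"),
    connect = function(m) {
      # Assumed default: connect to the human gene dataset on the given mirror.
      biomaRt::useEnsembl(biomart = "ensembl",
                          dataset = "hsapiens_gene_ensembl",
                          mirror = m)
    }) {
  for (m in mirrors) {
    # A failed connection raises an error; turn it into NULL and move on.
    mart <- tryCatch(connect(m), error = function(e) NULL)
    if (!is.null(mart)) {
      message("Connected via mirror: ", m)
      return(mart)
    }
  }
  stop("None of the mirrors responded: ", paste(mirrors, collapse = ", "))
}

# Offline demonstration: a fake connector that only "succeeds" for "asia".
fake_connect <- function(m) {
  if (m == "asia") "fake-mart" else stop("unreachable")
}
demo_mart <- connect_with_fallback(connect = fake_connect)
```

With a live connection, `hsp <- connect_with_fallback()` would use the default connector instead of the fake one.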
# Attributes to retrieve
attributes <- c(
  "ensembl_gene_id",
  "hgnc_symbol",
  "chromosome_name",
  "start_position",
  "end_position"
)
# Retrieve the attributes for every gene
hsp_info <- biomaRt::getBM(attributes = attributes, mart = hsp)
head(hsp_info)
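The retrieved table holds start and end positions rather than a length column, so the length has to be derived. A minimal sketch, assuming the column names requested above: Ensembl coordinates are 1-based and inclusive, so the genomic span is end minus start plus one. The data frame below uses made-up gene IDs and coordinates so the example runs offline.

```r
# Made-up rows with the same column names as hsp_info; not real annotation.
hsp_info_demo <- data.frame(
  ensembl_gene_id = c("GENE_A", "GENE_B", "GENE_C"),
  chromosome_name = c("1", "X", "MT"),
  start_position = c(1000, 5000, 300),
  end_position = c(1999, 5000, 1299),
  stringsAsFactors = FALSE
)

# Coordinates are 1-based and inclusive, so length = end - start + 1.
hsp_info_demo$gene_length <-
  hsp_info_demo$end_position - hsp_info_demo$start_position + 1

print(hsp_info_demo[, c("ensembl_gene_id", "gene_length")])
```

Note that this is the genomic span of the gene, introns included, not the length of any spliced transcript.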
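Number of genes per chromosome
Besides the 22 autosomes, X, Y, and MT, chromosome_name contains many other identifiers, such as unplaced scaffolds and alternate-sequence patches.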
data.frame(table(hsp_info$chromosome_name))
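If only the primary chromosomes are of interest, the counts can be restricted to 1 to 22, X, Y, and MT. The sketch below assumes chromosome names are stored as character strings in that form; the input vector is a small illustrative stand-in for `hsp_info$chromosome_name`, and its two non-standard names are examples of the scaffold and patch style of identifier.

```r
# Illustrative stand-in for hsp_info$chromosome_name.
chrom_demo <- c("1", "1", "2", "X", "MT",
                "KI270711.1", "CHR_HSCHR6_MHC_APD_CTG1")

# Primary chromosomes in their usual order.
standard <- c(as.character(1:22), "X", "Y", "MT")
is_standard <- chrom_demo %in% standard

# Using a factor with fixed levels keeps the order and reports zero counts too.
counts <- table(factor(chrom_demo[is_standard], levels = standard))
counts_df <- data.frame(chromosome = names(counts), genes = as.vector(counts))

print(counts_df[counts_df$genes > 0, ])

# Number of entries that sit on non-standard sequences.
n_other <- sum(!is_standard)
print(n_other)
```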