Normalizing Gene Aliases in R [Repost]

Author: 超级可爱的懂事长鸭 | Published 2022-02-02 23:13

I recently discovered that the well-known R package limma has a function called alias2Symbol. According to its description, its job is to "Convert Gene Aliases to Official Gene Symbols".

I have not used it yet, but I have long wanted this kind of functionality. The help page covers the following three functions:

alias2Symbol(alias, species = "Hs", expand.symbols = FALSE)
alias2SymbolTable(alias, species = "Hs")
alias2SymbolUsingNCBI(alias, gene.info.file,
                      required.columns = c("GeneID","Symbol","description"))

Details
Aliases are mapped via NCBI Entrez Gene identity numbers using Bioconductor organism packages.
alias2Symbol maps a set of aliases to a set of symbols, without necessarily preserving order. The output vector may be longer or shorter than the original vector, because some aliases might not be found and some aliases may map to more than one symbol.
alias2SymbolTable returns a vector of the same length as the vector of aliases. If an alias maps to more than one symbol, then the one with the lowest Entrez ID number is returned. If an alias can't be mapped, then NA is returned.
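To make the difference between the two return shapes concrete, here is a minimal sketch. It assumes limma and org.Hs.eg.db are installed from Bioconductor; "NOT_A_GENE" is a made-up placeholder added only to show the unmapped case, and the variable names are illustrative.

```r
# Assumption: limma and org.Hs.eg.db are installed from Bioconductor.
# "NOT_A_GENE" is a made-up placeholder that should not match anything.
aliases <- c("PUMA", "NOXA", "BIM", "NOT_A_GENE")

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # Set-like result: length and order need not match the input.
  symbols <- limma::alias2Symbol(aliases, species = "Hs")

  # Aligned result: one entry per input alias, NA when unmapped.
  symbol_table <- limma::alias2SymbolTable(aliases, species = "Hs")

  # Because the table version is aligned with the input, it can be
  # placed side by side with the original names.
  mapping <- data.frame(alias = aliases, symbol = symbol_table,
                        stringsAsFactors = FALSE)
  print(symbols)
  print(mapping)
}
```

The aligned version is usually the one to reach for when the aliases are a column in an existing table, since rows stay matched to their inputs.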

The examples given are:

alias2Symbol(c("PUMA","NOXA","BIM"), species="Hs")
alias2Symbol("RS1", expand=TRUE)

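This is genuinely useful, especially when a bioinformatician works with wet-lab scientists, who tend to use gene names they consider perfectly normal, such as PD1 and PDL1. We then have to convert them, as shown below: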

> alias2Symbol("PD1", expand=TRUE)
[1] "PDCD1"  "SNCA"   "SPATA2"
> alias2Symbol("PDL1", expand=TRUE)
[1] "CD274"
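Since one alias can map to several symbols, it may help to flag ambiguous names before going further. The sketch below is one possible way to do that; count_alias_matches is a hypothetical helper (not part of limma), and it assumes limma and org.Hs.eg.db are installed.

```r
# Hypothetical helper: report how many official symbols each alias maps to.
# Assumes limma and the matching org.*.eg.db package are installed.
count_alias_matches <- function(aliases, species = "Hs") {
  # Query each alias on its own so the matches stay tied to their input.
  matches <- lapply(aliases, limma::alias2Symbol, species = species)
  data.frame(
    alias     = aliases,
    n_symbols = lengths(matches),
    symbols   = vapply(matches, paste, character(1), collapse = ";"),
    stringsAsFactors = FALSE
  )
}

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  report <- count_alias_matches(c("PD1", "PDL1"))
  # Aliases with n_symbols > 1 need a human decision before analysis.
  print(report[report$n_symbols > 1, ])
}
```

Going by the console output above, PD1 would be flagged and PDL1 would not, although the exact result depends on the annotation version installed.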

Of course, it supports more than just human. The function mainly pulls its information from the org series of annotation packages, including:

Package            Species
org.Ag.eg.db       Anopheles
org.Bt.eg.db       Bovine
org.Ce.eg.db       Worm
org.Cf.eg.db       Canine
org.Dm.eg.db       Fly
org.Dr.eg.db       Zebrafish
org.EcK12.eg.db    E coli strain K12
org.EcSakai.eg.db  E coli strain Sakai
org.Gg.eg.db       Chicken
org.Hs.eg.db       Human
org.Mm.eg.db       Mouse
org.Mmu.eg.db      Rhesus
org.Pt.eg.db       Chimp
org.Rn.eg.db       Rat
org.Ss.eg.db       Pig
org.Xl.eg.db       Xenopus
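Judging by the package names in this table, the species argument (for example "Hs") appears to be the middle part of the package name. The sketch below builds the package name from a species code and checks whether it is installed locally; org_package_for is a hypothetical helper written for illustration, and the three codes checked are just examples.

```r
# Hypothetical helper: build the organism package name for a species code,
# following the "org.<code>.eg.db" pattern seen in the table above.
org_package_for <- function(species = "Hs") {
  paste0("org.", species, ".eg.db")
}

# Check which of a few species codes have their package installed locally.
# system.file() returns "" for a package that is not installed.
species_codes <- c("Hs", "Mm", "Rn")
installed <- vapply(
  species_codes,
  function(code) nzchar(system.file(package = org_package_for(code))),
  logical(1)
)
print(installed)
```

If the package for a species is missing, it would need to be installed from Bioconductor before calling alias2Symbol with that species code.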

Ultimately, all of this traces back to the files under ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO, for example:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz

Quite convenient, isn't it!
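To illustrate what an alias lookup against a gene_info file amounts to, here is a rough base-R sketch. The small table is hand-written in the GeneID / Symbol / Synonyms column layout purely for illustration (it is not an excerpt of the real file), and lookup_alias is a hypothetical helper, not limma's implementation.

```r
# Toy table in the NCBI gene_info column layout (hand-written for
# illustration, not an excerpt of the real file).
gene_info <- data.frame(
  GeneID   = c("5133", "29126"),
  Symbol   = c("PDCD1", "CD274"),
  Synonyms = c("CD279|PD-1|PD1", "B7-H|PD-L1|PDL1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up one alias, first as an official symbol,
# then among the "|"-separated synonyms.
lookup_alias <- function(alias, info) {
  if (alias %in% info$Symbol) return(alias)
  synonyms <- strsplit(info$Synonyms, split = "|", fixed = TRUE)
  hit <- vapply(synonyms, function(s) alias %in% s, logical(1))
  info$Symbol[hit]
}

print(lookup_alias("PD1", gene_info))
print(lookup_alias("PDL1", gene_info))
```

With a real downloaded file, the alias2SymbolUsingNCBI function listed earlier takes the file through its gene.info.file argument, so a hand-rolled lookup like this should not be needed in practice.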

Original source: https://mp.weixin.qq.com/s/a-At-yDJBkw_EaAYj0sA7A
