Parameter explanations
![](https://img.haomeiwen.com/i27731909/b2abf608ad27f1fb.png)
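columns() shows which types of data can be returned from an AnnotationDb object.
keytypes() lets the user discover which keytypes can be passed to select() or keys() through the keytype argument.
keys() returns the keys of the database contained in the AnnotationDb object.
![](https://img.haomeiwen.com/i27731909/949a0077a26fde97.png)
select() retrieves data as a data.frame based on the keys, columns, and keytype arguments. Note that if you call select() and request columns that have multiple matches for your keys, select() returns a data.frame with one row for each possible match. The effect is that if you request several columns, and some of them have a many-to-one relationship to the keys (several values per key), the rows keep multiplying accordingly. Requesting a large number of columns is therefore not a good idea unless you know that what you request has a one-to-one relationship with the initial set of keys. In general, if you need a column that has a many-to-one relationship to the original keys (such as GO), it is most useful to extract that column separately.
mapIds() gets the mapped IDs (a column) for a set of keys of a given keytype. The result is usually returned as a named character vector.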
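The row multiplication described for select() can be imitated with plain base R. The sketch below is only an analogy: it uses merge() on small made-up tables (the probe IDs, symbols, GO terms, and pathways are all hypothetical) and does not show how AnnotationDbi is implemented internally. It illustrates why each extra column with several values per key multiplies the number of rows.

```r
# Toy lookup tables with made-up values, one row per key/value pair
keys_df <- data.frame(PROBEID = c("p1", "p2"))
symbol_map <- data.frame(
  PROBEID = c("p1", "p2"),
  SYMBOL  = c("GENE_A", "GENE_B")
)
go_map <- data.frame(
  PROBEID = c("p1", "p1", "p1", "p2"),
  GO      = c("GO:1", "GO:2", "GO:3", "GO:4")
)
path_map <- data.frame(
  PROBEID = c("p1", "p1", "p2"),
  PATH    = c("path1", "path2", "path3")
)

# One value per key: the row count stays at the number of keys
one_to_one <- merge(keys_df, symbol_map, by = "PROBEID")

# Several GO values per key: one row per (key, GO) pair
with_go <- merge(one_to_one, go_map, by = "PROBEID")

# A second multi-valued column multiplies the rows again
with_go_path <- merge(with_go, path_map, by = "PROBEID")

nrow(one_to_one)    # 2
nrow(with_go)       # 4
nrow(with_go_path)  # 7 (p1: 3 GO x 2 PATH = 6, p2: 1 x 1 = 1)
```

Extracting the multi-valued column on its own, as the text recommends, corresponds to keeping go_map separate instead of joining it with every other column.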
Functions involved
![](https://img.haomeiwen.com/i27731909/614986dea0739abb.png)
Examples
require(hugene10sttranscriptcluster.db)
class(hugene10sttranscriptcluster.db)
# An AnnotationDb object, provided by an R package for probe annotation
[1] "ChipDb"
attr(,"package")
[1] "AnnotationDbi"
columns
This function returns the columns that can be retrieved from the database, as a character vector.
columns(hugene10sttranscriptcluster.db)
[1] "ACCNUM" "ALIAS"
[3] "ENSEMBL" "ENSEMBLPROT"
[5] "ENSEMBLTRANS" "ENTREZID"
[7] "ENZYME" "EVIDENCE"
[9] "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO"
[13] "GOALL" "IPI"
[15] "MAP" "OMIM"
[17] "ONTOLOGY" "ONTOLOGYALL"
[19] "PATH" "PFAM"
[21] "PMID" "PROBEID"
[23] "PROSITE" "REFSEQ"
[25] "SYMBOL" "UCSCKG"
[27] "UNIPROT"
keys
Returns all possible keys, as a character vector.
Here the keys returned are probe IDs.
keys <- head( keys(hugene10sttranscriptcluster.db) )
keys
[1] "7892501" "7892502" "7892503" "7892504"
[5] "7892505" "7892506"
keytypes
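Lets the user discover which keytypes can be passed to select() or keys() through the keytype argument.
Put simply, it returns the gene ID types that the package supports.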
keytypes(hugene10sttranscriptcluster.db)
[1] "ACCNUM" "ALIAS"
[3] "ENSEMBL" "ENSEMBLPROT"
[5] "ENSEMBLTRANS" "ENTREZID"
[7] "ENZYME" "EVIDENCE"
[9] "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO"
[13] "GOALL" "IPI"
[15] "MAP" "OMIM"
[17] "ONTOLOGY" "ONTOLOGYALL"
[19] "PATH" "PFAM"
[21] "PMID" "PROBEID"
[23] "PROSITE" "REFSEQ"
[25] "SYMBOL" "UCSCKG"
[27] "UNIPROT"
select
Look up the gene symbol and gene type for each probe
select(hugene10sttranscriptcluster.db, keys=keys, columns=c("SYMBOL","GENETYPE"), keytype="PROBEID")
'select()' returned 1:1 mapping between
keys and columns
PROBEID SYMBOL GENETYPE
1 7892501 <NA> <NA>
2 7892502 <NA> <NA>
3 7892503 <NA> <NA>
4 7892504 <NA> <NA>
5 7892505 <NA> <NA>
6 7892506 <NA> <NA>
### Look at the first few NCBI reference sequence (RefSeq) IDs in this annotation package
keyref <- head( keys(hugene10sttranscriptcluster.db, keytype="REFSEQ") )
keyref
[1] "NM_130786" "NP_570602"
[3] "NM_000014" "NM_001347423"
[5] "NM_001347424" "NM_001347425"
mapIds
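Map probes to gene symbols
# Probes without a match come back as NA, so the NA values are dropped here and the first 5 mapped probes are returned with their gene symbols
# All probe IDs are passed in here; the 6 probes stored in keys above have no symbol
na.omit(mapIds(hugene10sttranscriptcluster.db, keys=keys(hugene10sttranscriptcluster.db), column='SYMBOL', keytype='PROBEID'))[1:5]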
'select()' returned 1:many mapping
between keys and columns
7896740 7896742 7896744
"OR4F4" "PCMTD2" "OR4F29"
7896746 7896754
"MTND1P23" "SEPTIN7P13"
### Collect the result into a data frame for easier handling
# Run mapIds() once over all probe IDs and reuse the result
sym <- na.omit(mapIds(hugene10sttranscriptcluster.db, keys=keys(hugene10sttranscriptcluster.db), column='SYMBOL', keytype='PROBEID'))
tt <- data.frame(probe_id=names(sym), symbol=sym)
### Look at the first few rows of the annotation data frame
head(tt)
probe_id symbol
7896740 7896740 OR4F4
7896742 7896742 PCMTD2
7896744 7896744 OR4F29
7896746 7896746 MTND1P23
7896754 7896754 SEPTIN7P13
7896756 7896756 FAM87B
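The "1:many mapping" message above means that some probes match more than one symbol, while the named vector holds a single value per probe. The base R sketch below, on a made-up table shaped like select() output (all IDs and symbols are hypothetical), illustrates the two usual ways to reduce such a table: keep the first match per key, or keep every match in a list. It is an illustration of the idea, not the code that mapIds() runs.

```r
# Toy long-format table with made-up values, shaped like select() output
long <- data.frame(
  PROBEID = c("p1", "p1", "p2", "p3"),
  SYMBOL  = c("GENE_A", "GENE_B", "GENE_C", NA),
  stringsAsFactors = FALSE
)

# Keep only the first match per key, as a named character vector
first_rows <- long[!duplicated(long$PROBEID), ]
first_hit <- setNames(first_rows$SYMBOL, first_rows$PROBEID)

# Keep every match per key, as a named list
all_hits <- split(long$SYMBOL, long$PROBEID)

first_hit         # p1 = "GENE_A", p2 = "GENE_C", p3 = NA
all_hits[["p1"]]  # "GENE_A" "GENE_B"
```

In AnnotationDbi itself, the multiVals argument of mapIds() controls this choice: the default "first" keeps one value per key, and "list" returns all of them.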
That covers how this package is used for probe annotation.