Annotating genes in R by gene ID, accession number, etc. (model organisms)
Download the chimp database
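# Install BiocManager if needed, then the chimpanzee annotation package
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("org.Pt.eg.db")
library(org.Pt.eg.db)
# List the annotation columns and key types the database supports
columns(org.Pt.eg.db)
keytypes(org.Pt.eg.db)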
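# Annotation columns to retrieve for each RefSeq ID
cols <- c("SYMBOL", "GENENAME", "GO", "PATH")
filename <- c("AA_q0.01", "AA_q0.05")
for (singlename in filename) {
    data <- read.table(singlename, header = FALSE)
    # AnnotationDbi:: avoids clashes with other select() functions (e.g. dplyr)
    result <- AnnotationDbi::select(org.Pt.eg.db, keytype = "REFSEQ", columns = cols, keys = as.character(data$V1))
    outname <- paste0(singlename, ".annotated")
    # Keep the ID, the AA column, and the p- and q-values
    data2 <- subset(data, select = c(V1, V6, V11, V12))
    colnames(data2) <- c("REFSEQ", "AA", "p_value", "q_value")
    data3 <- merge(result, data2, by = "REFSEQ")
    # Write the merged table (annotation plus statistics)
    write.table(data3, outname, quote = FALSE, sep = ",")
}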
# Same procedure for the BB files; the BB value is in column V7
filename <- c("BB_q0.01", "BB_q0.05")
for (singlename in filename) {
    data <- read.table(singlename, header = FALSE)
    result <- AnnotationDbi::select(org.Pt.eg.db, keytype = "REFSEQ", columns = cols, keys = as.character(data$V1))
    outname <- paste0(singlename, ".annotated")
    data2 <- subset(data, select = c(V1, V7, V11, V12))
    colnames(data2) <- c("REFSEQ", "BB", "p_value", "q_value")
    data3 <- merge(result, data2, by = "REFSEQ")
    # Write the merged table (annotation plus statistics)
    write.table(data3, outname, quote = FALSE, sep = ",")
}
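The merge step in the loops above can be tried out on toy data without installing any annotation package. This is only a sketch: the IDs, symbols, and q-values below are made up, and the `annotation` table merely imitates the shape of a `select()` result, where one key can occupy several rows (for example, one row per GO term).

```r
# Toy stand-in for a select() result: one row per (REFSEQ, GO) pair,
# so a single key can appear on several rows. All values are made up.
annotation <- data.frame(
  REFSEQ = c("NM_A", "NM_A", "NM_B"),
  SYMBOL = c("GENE1", "GENE1", "GENE2"),
  GO     = c("GO:0000001", "GO:0000002", "GO:0000003"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the statistics table read from an input file.
stats <- data.frame(
  REFSEQ  = c("NM_A", "NM_B", "NM_C"),
  q_value = c(0.001, 0.02, 0.04),
  stringsAsFactors = FALSE
)

# Default merge is an inner join: keys missing from either table are dropped.
merged <- merge(annotation, stats, by = "REFSEQ")

# all.y = TRUE keeps keys without annotation, filling annotation columns with NA.
merged_all <- merge(annotation, stats, by = "REFSEQ", all.y = TRUE)

print(merged)
print(merged_all)
```

With the default inner join, a gene that has several GO terms contributes several rows to the output, and a gene with no annotation row disappears; `all.y = TRUE` is one way to keep such genes in the table.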
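In short, this uses the org.Pt.eg.db package, which is also a database, to annotate chimpanzee gene information. For human, the equivalent should be org.Hs.eg.db. Whether a corresponding R package exists for another model organism is something you need to look up.
Many annotation workflows in R follow this pattern, for example Y叔's clusterProfiler (if I remember correctly).
Reference: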
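One way to look up which organism packages exist is to search the Bioconductor package list by name. The sketch below assumes that the packages of interest follow the "org.<organism>.<source>.db" naming scheme seen in org.Pt.eg.db and org.Hs.eg.db; the variable names `orgdb_pattern` and `orgdb_pkgs` are made up for this example, and the query needs network access.

```r
# Regular expression for the assumed naming scheme "org.<organism>.<source>.db".
orgdb_pattern <- "^org\\..+\\.db$"

# Query the repositories only when BiocManager is installed.
if (requireNamespace("BiocManager", quietly = TRUE)) {
  orgdb_pkgs <- tryCatch(
    BiocManager::available(orgdb_pattern),
    error = function(e) {
      # Most likely cause: no network connection to the repositories.
      message("Could not query Bioconductor: ", conditionMessage(e))
      character(0)
    }
  )
  print(orgdb_pkgs)
}
```

Any package name returned can then be installed with `BiocManager::install()` and used in place of org.Pt.eg.db, after checking `keytypes()` to confirm that the ID type you have is supported.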
https://www.bioconductor.org/help/course-materials/2015/UseBioconductorFeb2015/A01.5_Annotation.html