构建自己物种的orgDb

作者: MLD_TRNA | 来源:发表于2021-06-09 13:38 被阅读0次

    1.首先安装eggnog-mapper软件

    注释所需要的物种数据库网址如下,同时也可以用里面的脚本download_eggnog_data.py下载你所需要的数据库:

    http://eggnogdb.embl.de/download/
    
    
    python download_eggnog_data.py euk 下载euk数据库
    
    

    eggnog-mapper有两种比对方式(直接调用emapper.py脚本即可):

    1. 基于hmmer的比对:建议序列少于1000条
    $python /data1/spider/ytbiosoft/soft/eggnog-mapper-1.0.3/emapper.py -m hmmer -i test.fasta -d euk -o test_euk(输出文件前缀)
    
    
    1. 基于diamond的比对:序列大于1000条(不需要指定数据库)
    $python /data1/spider/ytbiosoft/soft/eggnog-mapper-1.0.3/emapper.py -m diamond -i 你的物种所有蛋白序列 -o sesame(输出文件前缀)
    
    

    2. 对生成的文件修改

    结果会生成一个sesame.emapper.annotations的文件。查看文件会发现有许多以#开头的行,要删掉这些没用的行。注意别删掉表头。

    image

    所以需要删掉#开头的行以及表头的#,但不要删表头

    $sed -i 's/#//' sesame.emapper.annotations  -i就在源文件修改 s替换 /空字符
    
    

    此时的sesame.emapper.annotations就可以拿来构建orgDb了。

    3. 根据eggnog-mapper注释结果构建orgDb

    1. 安装R包
    library(tidyverse)
    library(stringr)
    library(KEGGREST)
    library(AnnotationForge)
    
    

    除了KEGGREST以外的三个都可以用install.packages()安装

    >if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    
    >BiocManager::install("KEGGREST")
    
    

    安装好之后就可以构建自己的orgDb了

    1. 构建orgDb
    library(tidyverse)
    library(stringr)
    library(KEGGREST)
    library(AnnotationForge)
    
    #' Title
    #'
    #' @param f_emapper_anno eggnog-mapper annotation result
    #' @param author Who is the creator of this package? like "xxx <xxx@xxx.xx>"
    #' @param tax_id The Taxonomy ID that represents your organism. (NCBI has a nice online browser for finding the one you need)
    #' @param genus Single string indicating the genus
    #' @param species Single string indicating the species
    #'
    #' @return OrgDb name
    #' @export
    #'
    #' @examples
    makeOrgPackageFromEmapper <- function(f_emapper_anno,
                                          author,
                                          tax_id = "0",
                                          genus = "default",
                                          species = "default") {
    
      # read emapper result
      emapper <- read_delim(f_emapper_anno,
                            "\t", escape_double = FALSE, trim_ws = TRUE)
    
      # extract gene name from emapper
      gene_info <- emapper %>%
        dplyr::select(GID = query_name, GENENAME = `eggNOG annot`) %>%
        na.omit()
    
      # extract go annotation from emapper
      gos <- emapper %>%
        dplyr::select(query_name, GO_terms) %>%
        na.omit()
    
      gene2go = data.frame(GID = character(),
                           GO = character(),
                           EVIDENCE = character())
    
      for (row in 1:nrow(gos)) {
        the_gid <- gos[row, "query_name"][[1]]
        the_gos <- str_split(gos[row,"GO_terms"], ",", simplify = FALSE)[[1]]
    
        df_temp <- data_frame(GID = rep(the_gid, length(the_gos)),
                              GO = the_gos,
                              EVIDENCE = rep("IEA", length(the_gos)))
        gene2go <- rbind(gene2go, df_temp)
      }
    
      # extract kegg pathway annotation from emapper
      gene2ko <- emapper %>%
        dplyr::select(GID = query_name, Ko = KEGG_KOs) %>%
        na.omit()
    
      load(file = "kegg_info.RData")
      gene2pathway <- gene2ko %>% left_join(ko2pathway, by = "Ko") %>%
        dplyr::select(GID, Pathway) %>%
        na.omit()
    
      # make OrgDb
      makeOrgPackage(gene_info=gene_info,
                     go=gene2go,
                     ko=gene2ko,
                     pathway=gene2pathway,
                     # gene2pathway=gene2pathway,
                     version="0.0.2",
                     maintainer=author,
                     author=author,
                     outputDir = ".",
                     tax_id=tax_id,
                     genus=genus,
                     species=species,
                     goTable="go")
    
      my_orgdb <- str_c("org.", str_to_upper(str_sub(genus, 1, 1)) , species, ".eg.db", sep = "")
      return(my_orgdb)
    }
    
    my_orgdb <- makeOrgPackageFromEmapper("input/sesame.emapper.annotations",
                                          "zhangxudong <zhangxudong@genek.tv>",
                                          tax_id = "4182",
                                          genus = "Sesamum",
                                          species = "indicum")
    
    

    跑完代码就会生成一个org.Sindicum.eg.db的文件夹。此时就可以在Rstiduo里面安装这个包了。

    更多精彩内容,就在简书APP

    相关文章

      网友评论

        本文标题:构建自己物种的orgDb

        本文链接:https://www.haomeiwen.com/subject/kwznjltx.html