busco

Author: 就是大饼 | Published: 2022-05-18 16:28

    BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool for assessing the completeness of genome assemblies and annotations.

    Its workflow depends on the input type:
    Genome assembly | tBLASTn --> Augustus --> HMMER3
    Transcriptome | Find ORF --> HMMER3
    Gene set | HMMER3

    Download and installation

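    # Create a Python 3 conda environment
    conda create --name busco-py3.7 python=3.7
    # Activate it
    conda activate busco-py3.7
    # Install BUSCO (it is distributed through the bioconda channel)
    conda install -c conda-forge -c bioconda busco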
    

    Usage

    The help text is as follows:

    usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
    -i FASTA FILE, --in FASTA FILE  # input sequence file (FASTA format): an assembled genome, a transcriptome, or a proteome
    -c N, --cpu N  # number of threads
    -o OUTPUT, --out OUTPUT  # name of the output, without a path
    --out_path OUTPUT_PATH  # directory for the output (default: current directory)
    -e N, --evalue N  # E-value cutoff for BLAST (format: 0.001 or 1e-03; default 1e-03)
    -m MODE, --mode MODE  # geno/genome; tran/transcriptome; prot/proteins
    -l LINEAGE, --lineage_dataset LINEAGE  # BUSCO lineage to use (the dataset folder)
    -f, --force  # force overwriting of existing files; use when the output name already exists
    -r, --restart  # resume a partially completed run
    --limit REGION_LIMIT  # number of candidate regions (contig or transcript) considered per BUSCO (default 3)
    --augustus_species AUGUSTUS_SPECIES  # species to use for Augustus training
    --auto-lineage  # run auto-lineage to find a suitable lineage path
    --offline  # indicate that BUSCO cannot attempt to download files
    --config CONFIG_FILE  # provide a config file
    -v, --version  # show the version
    -h, --help  # show the help message
    --list-datasets  # print the available BUSCO datasets
    

    Example invocation:

    busco -i test.fa -c 8 -o test -m genome -l eudicots_odb10 > output.txt
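
When many assemblies have to be checked, it can help to build the same command programmatically. The following is a minimal sketch, not part of BUSCO itself: `build_busco_command` is a hypothetical helper, and it only uses the flags listed in the help text above (`-i`, `-c`, `-o`, `-m`, `-l`).

```python
import shlex

# Mode names accepted by -m according to the help text above.
ALLOWED_MODES = {"geno", "genome", "tran", "transcriptome", "prot", "proteins"}


def build_busco_command(fasta, lineage, out_name, mode="genome", cpus=1):
    """Return a BUSCO invocation as an argument list (hypothetical helper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown BUSCO mode: {mode!r}")
    return [
        "busco",
        "-i", str(fasta),
        "-c", str(cpus),
        "-o", out_name,
        "-m", mode,
        "-l", lineage,
    ]


cmd = build_busco_command("test.fa", "eudicots_odb10", "test", cpus=8)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run, assuming `busco` is on the `PATH` of the active conda environment.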
    

    The result looks like this:
    C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326
    2280 Complete BUSCOs (C)
    2211 Complete and single-copy BUSCOs (S)
    69 Complete and duplicated BUSCOs (D)
    14 Fragmented BUSCOs (F)
    32 Missing BUSCOs (M)
    2326 Total BUSCO groups searched
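
In this output the counts add up: S + D = C (2211 + 69 = 2280) and C + F + M = n (2280 + 14 + 32 = 2326). To compare many runs, the one-line score can be parsed into numbers. The sketch below assumes the summary line always has the layout shown above; `parse_busco_summary` is a hypothetical helper name, not a BUSCO function.

```python
import re

# Matches a line such as: C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse the one-line BUSCO score into a dict of percentages plus n."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores


scores = parse_busco_summary("C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326")
print(scores)
```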

    Plotting

    The results can be plotted with generate_plot.py (most useful when comparing several species).
    Help text:

    usage: python3 generate_plot.py -wd [WORKING_DIRECTORY] [OTHER OPTIONS]
    
    BUSCO plot generation tool.
    Place all BUSCO short summary files (short_summary.[generic|specific].dataset.label.txt) in a single folder. It will be your working directory, in which the generated plot files will be written
    See also the user guide for additional information
    
    required arguments:
      -wd PATH, --working_directory PATH
                            Define the location of your working directory
    
    optional arguments:
      -rt RUN_TYPE, --run_type RUN_TYPE
                            type of summary to use, `generic` or `specific`
      --no_r                To avoid to run R. It will just create the R script file in the working directory
      -q, --quiet           Disable the info logs, displays only errors
      -h, --help            Show this help message and exit
    

    All BUSCO short summary files must first be collected into a single folder:

    mkdir my_summaries
    cp run_SPEC1/short_summary_SPEC1.txt my_summaries/.
    cp run_SPEC2/short_summary_SPEC2.txt my_summaries/.
    cp run_SPEC3/short_summary_SPEC3.txt my_summaries/.
    cp run_SPEC4/short_summary_SPEC4.txt my_summaries/.
    cp run_SPEC5/short_summary_SPEC5.txt my_summaries/.
    python scripts/generate_plot.py -wd my_summaries
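
With more than a handful of species, the copy step can be scripted. This is a minimal sketch that assumes each run lives in a `run_*` directory and its summary file is named `short_summary*.txt`, as in the commands above; `collect_summaries` is a hypothetical helper, and the glob pattern may need adjusting to the file names your BUSCO version writes.

```python
import shutil
from pathlib import Path


def collect_summaries(runs_root, dest):
    """Copy every run_*/short_summary*.txt under runs_root into dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # Sort so the order of copied files is reproducible.
    for summary in sorted(Path(runs_root).glob("run_*/short_summary*.txt")):
        target = dest / summary.name
        shutil.copy(summary, target)
        copied.append(target)
    return copied
```

Calling `collect_summaries(".", "my_summaries")` from the directory that holds the run folders fills `my_summaries`, which can then be passed to `generate_plot.py` with `-wd`.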
    
