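BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool that assesses the completeness of genome assemblies and annotations by searching for universal single-copy orthologs.
Its workflow depends on the type of input:
Genome assembly | tBLASTn --> Augustus --> HMMER3
Transcriptome | Find ORF --> HMMER3
Gene set | HMMER3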
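Download and installation
# Create a conda environment with Python 3
conda create --name busco-py3.7 python=3.7
# Then activate it
conda activate busco-py3.7
# Install BUSCO (it is distributed through the bioconda channel; if that channel
# is not already configured, add: -c conda-forge -c bioconda)
conda install busco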
Usage
The help text is as follows:
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
-i FASTA FILE, --in FASTA FILE # input sequence file (FASTA format): an assembled genome, a transcriptome, or a proteome
-c N, --cpu N # number of threads
-o OUTPUT, --out OUTPUT # name of the output, without a path
--out_path OUTPUT_PATH # path for the output (default: current directory)
-e N, --evalue N # E-value cutoff for BLAST (format: 0.001 or 1e-03; default 1e-03)
-m MODE, --mode MODE # analysis mode: geno/genome; tran/transcriptome; prot/proteins
-l LINEAGE, --lineage_dataset LINEAGE # BUSCO lineage to use (dataset folder)
-f, --force # force overwriting of existing files; use when the output name already exists
-r, --restart # continue a run that was only partially completed
--limit REGION_LIMIT # number of candidate regions (contig or transcript) to consider per BUSCO (default 3)
--augustus_species AUGUSTUS_SPECIES # species to use for Augustus training
--auto-lineage # run auto-lineage to find the most suitable lineage path
--offline # indicate that BUSCO cannot attempt to download files
--config CONFIG_FILE # provide a config file
-v, --version # show the version
-h, --help # show the help message
--list-datasets # print the available BUSCO datasets
Example:
busco -i test.fa -c 8 -o test -m genome -l eudicots_odb10 > output.txt
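When several assemblies need the same treatment, the command above can be assembled programmatically. The following is only a minimal sketch: build_busco_command and the sample names and file names are hypothetical, and the only flags used are those listed in the help text above (-i, -l, -o, -m, -c, -f). It prints the commands rather than running them.

```python
import shlex


def build_busco_command(fasta, lineage, out_name, mode="genome", cpus=1, force=False):
    """Return the busco argument list for one input file (hypothetical helper)."""
    if mode not in {"genome", "transcriptome", "proteins"}:
        raise ValueError("mode must be genome, transcriptome or proteins")
    cmd = ["busco", "-i", fasta, "-l", lineage, "-o", out_name,
           "-m", mode, "-c", str(cpus)]
    if force:
        cmd.append("-f")  # overwrite an existing output of the same name
    return cmd


if __name__ == "__main__":
    # Hypothetical sample names and FASTA files, for illustration only.
    samples = {"SPEC1": "spec1.fa", "SPEC2": "spec2.fa"}
    for name, fasta in samples.items():
        cmd = build_busco_command(fasta, "eudicots_odb10", name, cpus=8)
        # Print a shell-safe command line; pass cmd to subprocess.run to execute it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string keeps file names with spaces intact if the command is later handed to subprocess.run.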
A run produces a summary such as:
C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326
2280 Complete BUSCOs (C)
2211 Complete and single-copy BUSCOs (S)
69 Complete and duplicated BUSCOs (D)
14 Fragmented BUSCOs (F)
32 Missing BUSCOs (M)
2326 Total BUSCO groups searched
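To compare many runs it can help to read the one-line summary into numbers. The sketch below assumes the line has exactly the layout shown above; parse_busco_summary and counts_are_consistent are hypothetical helper names, not part of BUSCO.

```python
import re

# Pattern for the one-line summary, e.g.
# C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages (floats) and n (int) found in a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a BUSCO summary line: %r" % line)
    result = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


def counts_are_consistent(complete, single, duplicated, fragmented, missing, total):
    """Check that S + D = C and that C + F + M = total."""
    return complete == single + duplicated and complete + fragmented + missing == total


if __name__ == "__main__":
    summary = parse_busco_summary("C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326")
    print(summary)
    # Counts taken from the table above.
    print(counts_are_consistent(2280, 2211, 69, 14, 32, 2326))
```

In the example above, the single-copy and duplicated counts (2211 + 69) add up to the 2280 complete BUSCOs, and complete, fragmented and missing (2280 + 14 + 32) add up to the 2326 groups searched.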
Plotting
generate_plot.py can be used to draw a chart of the results (most useful when comparing several species).
Help text:
usage: python3 generate_plot.py -wd [WORKING_DIRECTORY] [OTHER OPTIONS]
BUSCO plot generation tool.
Place all BUSCO short summary files (short_summary.[generic|specific].dataset.label.txt) in a single folder. It will be your working directory, in which the generated plot files will be written
See also the user guide for additional information
required arguments:
-wd PATH, --working_directory PATH
Define the location of your working directory
optional arguments:
-rt RUN_TYPE, --run_type RUN_TYPE
type of summary to use, `generic` or `specific`
--no_r To avoid to run R. It will just create the R script file in the working directory
-q, --quiet Disable the info logs, displays only errors
-h, --help Show this help message and exit
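All BUSCO short summary files must first be gathered into a single folder (the exact file names depend on the BUSCO version; the help text above expects short_summary.[generic|specific].dataset.label.txt):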
mkdir my_summaries
cp run_SPEC1/short_summary_SPEC1.txt my_summaries/.
cp run_SPEC2/short_summary_SPEC2.txt my_summaries/.
cp run_SPEC3/short_summary_SPEC3.txt my_summaries/.
cp run_SPEC4/short_summary_SPEC4.txt my_summaries/.
cp run_SPEC5/short_summary_SPEC5.txt my_summaries/.
python scripts/generate_plot.py -wd my_summaries
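With many runs, the copy step can be scripted instead of typed one line at a time. This is a minimal sketch under the assumption that every summary file name starts with short_summary and ends in .txt; collect_summaries is a hypothetical helper, not part of BUSCO.

```python
import shutil
from pathlib import Path


def collect_summaries(search_root, dest_dir):
    """Copy every short_summary*.txt found under search_root into dest_dir.

    Returns the copied file names in sorted path order.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materialises the search before any file is copied.
    for src in sorted(Path(search_root).resolve().rglob("short_summary*.txt")):
        if dest in src.parents:
            continue  # skip summaries that already sit in the destination
        shutil.copy2(src, dest / src.name)
        copied.append(src.name)
    return copied


if __name__ == "__main__":
    # Search the current directory and gather the summaries for generate_plot.py.
    for name in collect_summaries(".", "my_summaries"):
        print("copied", name)
```

Files with the same name in different run folders would overwrite each other in the destination, so each run needs a distinct output name.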