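QUAST is a widely used tool for assessing genome assembly quality. Without a reference it computes basic contig statistics such as N50; with a reference genome it aligns the contigs and reports genome fraction, duplication, misassemblies, unaligned sequence, mismatches and related metrics. MetaQUAST, released later, evaluates metagenome assemblies by comparing them against close references.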
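Since N50 is the headline statistic here, a minimal sketch of how it is commonly computed may help: sort the contig lengths in descending order and report the length at which the running total first reaches half of the total assembly length. The function name `n50` is made up for illustration, and this is not QUAST's own code.

```shell
#!/usr/bin/env bash
# n50: read one contig length per line on stdin and print the N50.
# Hypothetical helper for illustration; not part of QUAST.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # First contig at which the cumulative length reaches half the total
                if (running * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Example: lengths 80, 70, 50, 40, 30, 20 (total 290, half is 145)
printf '%s\n' 80 70 50 40 30 20 | n50    # prints 70
```

In the example, 80 alone is below 145, while 80 + 70 = 150 reaches it, so the N50 is 70.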
Papers:
Paper 1: QUAST: quality assessment tool for genome assemblies. Bioinformatics 2013
Citations: 3510
Paper 2: MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 2016
Citations: 233
Methods:
Homepage: http://bioinf.spbau.ru/quast
github: https://github.com/ablab/quast
sourceforge: http://quast.sourceforge.net/
sourceforge: http://quast.sourceforge.net/quast
quast5 update: QUAST v.5.1.0 release notes (public version)
quast5 GitHub download: quast_5.1.0rc1
quast5 manual: http://quast.sourceforge.net/docs/manual.html
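metaquast homepage: http://bioinf.spbau.ru/metaquast
metaquast sourceforge: http://quast.sourceforge.net/metaquast
Download and install:
QUAST runs straight from the unpacked archive, so there is no installation step, which is a nice touch.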
wget -c https://github.com/ablab/quast/releases/download/quast_5.1.0rc1/quast-5.1.0rc1.tar.gz
tar -zxvf quast-5.1.0rc1.tar.gz
cd quast-5.1.0rc1
python quast.py --help
python quast.py --version
# QUAST v5.1.0rc1, 6260eff0
Run:
# Use the bundled test data
python quast.py test_data/contigs_1.fasta \
test_data/contigs_2.fasta \
-r test_data/reference.fasta.gz \
-g test_data/genes.txt \
-1 test_data/reads1.fastq.gz -2 test_data/reads2.fastq.gz \
-o quast_test_output
# Real-world use: QUAST with a reference
quast_route="/software/quast-5.1.0rc1"
python $quast_route/quast.py AF04-12.fna \
-r ../Prokka/bgi/AF04-12/AF04-12.fna \
-g ../Prokka/bgi/AF04-12/AF04-12.gff \
--fragmented \
-t 4 -o ./AF04-12/
# A small homemade loop: batch QUAST, BGI vs Illumina
for i in `cat 76_strain_id.list`;
do
python $quast_route/quast.py Prokka/illumina/$i/$i.fna \
-r Prokka/bgi/$i/$i.fna \
-g Prokka/bgi/$i/$i.gff \
--fragmented --silent \
-t 2 -o QUAST/illumina/$i/
echo -e "\033[32m $i done...\033[0m"
done
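positional arguments: input contig FASTA file(s) to evaluate
-r: reference genome FASTA
-g: genomic feature annotations for the reference (here a GFF file)
--fragmented: the reference is fragmented; detect misassemblies caused by the fragmentation and mark them fake
-1/-2: forward/reverse reads
-t: number of threads
--silent: do not print detailed per-step information to stdout
-o: output dir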
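Before launching the batch loop it can help to confirm that every strain has all of its input files, since a missing file otherwise only surfaces when QUAST reaches that strain. A minimal sketch, assuming the same `Prokka/<platform>/<id>/<id>.fna` and `.gff` layout used in the commands above; the function name `check_inputs` is made up for illustration.

```shell
#!/usr/bin/env bash
# check_inputs: print every expected input file that is missing or empty.
# Hypothetical helper; assumes the Prokka/<platform>/<id>/<id>.{fna,gff} layout.
# Usage: check_inputs <strain_id_list> [base_dir]
# Returns 0 if everything is present, 1 otherwise.
check_inputs() {
    local list=$1 base=${2:-Prokka} id f missing=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue          # skip blank lines
        for f in "$base/illumina/$id/$id.fna" \
                 "$base/bgi/$id/$id.fna" \
                 "$base/bgi/$id/$id.gff"; do
            if [ ! -s "$f" ]; then        # absent or zero-length
                echo "missing: $f"
                missing=1
            fi
        done
    done < "$list"
    return "$missing"
}
```

A typical call would be `check_inputs 76_strain_id.list && echo "all inputs present"`, run from the directory that contains `Prokka/`.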
The Python used here does not have the matplotlib module installed, so no PDF is produced. That does not matter: report.txt is all we need.
Output files:
report.txt        summary table
report.tsv        tab-separated version, for parsing, or for spreadsheets (Google Docs, Excel, etc)
report.tex        LaTeX version
report.pdf        PDF version, includes all tables and plots for some statistics
report.html       everything in an interactive HTML file
icarus.html       Icarus main menu with links to interactive viewers
contigs_reports/  [only if a reference genome is provided]
  misassemblies_report  detailed report on misassemblies
  unaligned_report      detailed report on unaligned and partially unaligned contigs
k_mer_stats/      [only if --k-mer-stats is specified]
  kmers_report          detailed report on k-mer-based metrics
reads_stats/      [only if reads are provided]
  reads_report          detailed report on mapped reads statistics
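Because report.tsv is tab-separated with one metric per row and one assembly per column, a single value can be pulled out with awk. A minimal sketch; `quast_metric` is a hypothetical helper name, and the metric label has to match the first column exactly as it appears in the file.

```shell
#!/usr/bin/env bash
# quast_metric: print the value(s) of one metric from a QUAST report.tsv.
# Hypothetical helper. Usage: quast_metric <report.tsv> <metric name>
# Exits non-zero if the metric is not present.
quast_metric() {
    awk -F '\t' -v metric="$2" '
        $1 == metric {
            # Columns 2..NF hold one value per assembly
            for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? "\t" : "\n")
            found = 1
        }
        END { exit !found }' "$1"
}
```

For the single-assembly run above, `quast_metric AF04-12/report.tsv N50` would print one value; with several assemblies the values come out tab-separated in column order.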
Output file: report.html
See the manual for more detail: http://quast.sourceforge.net/docs/manual.html
- Genome fraction (%) is the percentage of aligned bases in the reference genome.
- N's per 100 kbp is the average number of uncalled bases (N's) per 100000 assembly bases.
- mismatches per 100 kbp is the average number of mismatches per 100000 aligned bases. True SNPs and sequencing errors are not distinguished and are counted equally.
- indels per 100 kbp is the average number of indels per 100000 aligned bases. Several consecutive single nucleotide indels are counted as one indel.
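The metrics above are simple rates: a raw count divided by the relevant number of bases, scaled to 100000 (or to a percentage for genome fraction). A worked sketch with made-up numbers; the helper names `per_100kbp` and `genome_fraction` and all counts are illustrative and not taken from any report.

```shell
#!/usr/bin/env bash
# per_100kbp: scale a raw count to "events per 100000 bases".
# Hypothetical helper. Usage: per_100kbp <count> <bases>
per_100kbp() {
    awk -v n="$1" -v bases="$2" 'BEGIN {
        if (bases <= 0) { print "NA"; exit }
        printf "%.2f\n", n / bases * 100000
    }'
}

# genome_fraction: percentage of reference bases covered by alignments.
# Hypothetical helper. Usage: genome_fraction <aligned_ref_bases> <ref_length>
genome_fraction() {
    awk -v a="$1" -v r="$2" 'BEGIN {
        if (r <= 0) { print "NA"; exit }
        printf "%.3f\n", a / r * 100
    }'
}

# 125 mismatches over 2500000 aligned bases
per_100kbp 125 2500000            # prints 5.00
# 4500000 of 5000000 reference bases aligned
genome_fraction 4500000 5000000   # prints 90.000
```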
Batch processing: collecting the results
## QUAST result summary
task="bgi"
# Write the header once (row 1 of a transposed report); ">" overwrites output from earlier runs
sed -n '1p' QUAST/${task}/AF04-12/transposed_report.tsv > QUAST/${task}_quast.txt
for i in `cat 76_strain_id.list`;
do
# Append the data row (row 2) of each strain
sed -n '2p' QUAST/${task}/$i/transposed_report.tsv >> QUAST/${task}_quast.txt
echo -e "\033[32m $i done... \033[0m"
done
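Once the merged table exists, individual columns can be selected by header name rather than by position, which is safer because the set of columns depends on the options used. A minimal sketch; `pick_column` is a hypothetical helper, and the header text must match the merged file exactly.

```shell
#!/usr/bin/env bash
# pick_column: print the first column (assembly name) plus one named column
# from a merged transposed_report table. Hypothetical helper.
# Usage: pick_column <merged.tsv> <column header>
pick_column() {
    awk -F '\t' -v want="$2" '
        NR == 1 {
            # Locate the requested header in the first row
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            if (!col) { print "column not found: " want > "/dev/stderr"; exit 1 }
        }
        { print $1 "\t" $col }' "$1"
}
```

For example, `pick_column QUAST/bgi_quast.txt N50` would list each assembly next to its N50, with the header row kept on top.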