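We classify assembly errors into two categories: small-scale errors (< 50 bp) and structural errors (≥ 50 bp). Small-scale errors comprise three types: base substitution, small collapse, and small expansion. They can be inferred directly from the read-to-contig alignments and are filtered by the number of reads that support the error ("Methods"). We also define four types of structural assembly errors: expansion, collapse, haplotype switch, and inversion. Collapses and expansions tend to occur within repetitive regions, because repeat units usually create branching paths in the assembly graph that are difficult to resolve. Haplotype switches occur at the breakpoints of heterozygous structural variants (SVs), where the two haplotypes differ. There the assembler fails to reconstruct either haplotype and instead produces a sequence intermediate between the two. In such cases, reads from one haplotype show a "collapse" pattern, while reads from the other haplotype show an "expansion" pattern. An inversion occurs when part of the sequence is inverted in the assembly.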
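The size-based split described above can be illustrated with a minimal Python sketch. Only the 50 bp threshold and the type names come from the text; the function names and the `min_supporting_reads` parameter are hypothetical, and the actual read-support cutoff is the one given in "Methods", not a value assumed here.

```python
# Threshold stated in the text: errors of >= 50 bp are structural.
STRUCTURAL_MIN_BP = 50

# Error types named in the text, grouped by scale.
SMALL_SCALE_TYPES = ("base substitution", "small collapse", "small expansion")
STRUCTURAL_TYPES = ("expansion", "collapse", "haplotype switch", "inversion")


def error_scale(size_bp: int) -> str:
    """Return 'small-scale' for errors < 50 bp and 'structural' otherwise."""
    if size_bp < 1:
        raise ValueError("error size must be at least 1 bp")
    return "structural" if size_bp >= STRUCTURAL_MIN_BP else "small-scale"


def passes_read_filter(supporting_reads: int, min_supporting_reads: int) -> bool:
    """Keep a small-scale error only if enough reads support it.

    min_supporting_reads is a hypothetical parameter; the cutoff actually
    used is described in the paper's Methods section.
    """
    return supporting_reads >= min_supporting_reads
```

For example, `error_scale(49)` falls in the small-scale category and `error_scale(50)` in the structural category, which matches the boundary as stated.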

Figure S1 IGV views of examples of small-scale assembly errors. There are discrepancies between the contig and the majority of reads in base substitution (a), small expansion (b), and small collapse (c).

Figure S2 Examples of structural assembly errors. a An insertion-like pattern in the read alignments represents a collapse error, as this part of the sequence is collapsed in the contig. b A deletion-like pattern in the read alignments represents an expansion error, as these sequences are expanded in the contig and not present in the reads. c An insertion-like pattern in half of the reads and a deletion-like pattern in the other half represents a haplotype switch, as the contig differs from both haplotypes in this heterozygous region. d Inverted alignment within reads represents an inversion error.
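
The correspondence between read-alignment patterns and structural error types in Figure S2 can be summarized as a small lookup, sketched below in Python. This is an illustration of the caption only, not the tool's implementation: the function name and the three boolean flags are hypothetical, and how those patterns are detected from real alignments (for example, what fraction of reads must show a pattern) is not specified here.

```python
from typing import Optional


def structural_error_type(
    has_insertion_like: bool,
    has_deletion_like: bool,
    has_inverted: bool,
) -> Optional[str]:
    """Map alignment patterns at a locus to a structural error type.

    The flags are hypothetical inputs that say whether reads at the locus
    show each pattern relative to the contig.
    """
    if has_inverted:
        # Panel d: inverted alignment within reads.
        return "inversion"
    if has_insertion_like and has_deletion_like:
        # Panel c: one haplotype looks collapsed, the other expanded.
        return "haplotype switch"
    if has_insertion_like:
        # Panel a: reads carry sequence that is missing from the contig.
        return "collapse"
    if has_deletion_like:
        # Panel b: contig carries sequence that is missing from the reads.
        return "expansion"
    return None  # no structural error pattern observed
```

Checking the mixed insertion-plus-deletion case before the single-pattern cases matters, since a haplotype switch would otherwise be reported as a plain collapse.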