Software: ngmlr
Version: ngmlr 0.2.6
1. Overview
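NextGenMap-LR (ngmlr) aligns long reads from third-generation sequencing platforms (PacBio, Oxford Nanopore) to a reference genome. Such reads have two main characteristics: (1) they are long (about 10 kb on average); (2) they have a high error rate (10% to 15% for PacBio, and 5% to 20% for Oxford Nanopore sequencing). NGMLR is a fast and highly accurate aligner designed for long reads. It is built on NGM (a seed-and-extend short-read aligner) and extends the segmented convex gap-cost scoring model to cope with the high error rate of long reads.
Website: https://github.com/philres/ngmlr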
2. Method

- Identify initial anchors
- Verify anchors with a vectorized Smith-Waterman algorithm (scores only)
- Filter anchors and find candidate regions for the alignments
- Compute the full alignment between the read and the respective candidate reference regions
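To make the first step more concrete, here is a minimal sketch of k-mer based anchor finding. It is a toy illustration of the general seed-and-extend idea, not NGMLR's actual implementation: the function names, the `bin_width` parameter, and the reading of "skip" as "index every (skip + 1)-th position" are assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_kmer_table(reference, k=13, skip=2):
    """Map sampled reference k-mers to their start positions.

    Assumption: after each indexed k-mer, `skip` start positions are
    skipped, so only every (skip + 1)-th position is stored.
    """
    table = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, skip + 1):
        table[reference[pos:pos + k]].append(pos)
    return table


def find_anchor(fragment, table, k=13, bin_width=16):
    """Return (approximate reference offset, number of k-mer hits).

    Every k-mer hit votes for the reference offset it implies; offsets
    are grouped into bins of `bin_width` bases (a hypothetical value)
    so that small indels do not scatter the votes.
    """
    votes = Counter()
    for i in range(len(fragment) - k + 1):
        for ref_pos in table.get(fragment[i:i + k], ()):
            votes[(ref_pos - i) // bin_width] += 1
    if not votes:
        return None
    best_bin, hits = votes.most_common(1)[0]
    return best_bin * bin_width, hits
```

A real aligner would then verify the best-scoring anchors and compute full alignments only for the surviving candidate regions, as in the remaining steps of the list.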
3. Usage
3.1 Example
ngmlr -r ucsc.hg19.fasta -q XXX.fastq -o YYY.bam
3.2 Options used in the example
-r  reference genome
-q  long-read sequencing data to align
-o  output file
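When ngmlr is called from a pipeline, it can help to assemble the command line programmatically. The sketch below only uses options listed in this document (`-r`, `-q`, `-o`, `-t`, `-x`); the helper name `ngmlr_command` is hypothetical and the file names are the placeholders from the example above.

```python
import shlex


def ngmlr_command(reference, reads, output, threads=1, preset="pacbio"):
    """Build the argument list for an ngmlr run (does not execute it)."""
    return [
        "ngmlr",
        "-r", reference,     # reference genome
        "-q", reads,         # long reads to align
        "-o", output,        # output file
        "-t", str(threads),  # number of threads
        "-x", preset,        # parameter preset
    ]


cmd = ngmlr_command("ucsc.hg19.fasta", "XXX.fastq", "YYY.bam", threads=8)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where ngmlr is installed.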
3.3 Parameter reference
Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]
Input/Output:
    -r, --reference (the directory containing the reference must be writable)
        (required) Path to the reference genome (FASTA/Q, can be gzipped)
    -q, --query
        Path to the read file (FASTA/Q) [/dev/stdin]
    -o, --output
        Path to output file [stdout]
    --skip-write
        Don't write reference index to disk [false]
    --bam-fix
        Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with BAM format [false]
General:
    -t, --threads
        Number of threads [1]
    -x, --presets
        Parameter presets for different sequencing technologies [pacbio]
    -i <0-1>, --min-identity <0-1>
        Alignments with an identity lower than this threshold will be discarded [0.65]
    -R, --min-residues
        Alignments containing fewer than a given number of residues, or fewer than (a given fraction * read length) residues, will be discarded [0.25]
    --no-smallinv
        Don't detect small inversions [false]
    --no-lowqualitysplit
        Split alignments with poor quality [false]
    --verbose
        Debug output [false]
    --no-progress
        Don't print progress info while mapping [false]
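The two filtering thresholds in this group (`--min-identity` and `--min-residues`) can be pictured with a small sketch. This is an illustration of the rule as described in the help text, not code from ngmlr; in particular, treating a `--min-residues` value below 1 as a fraction of the read length and any other value as an absolute count is an assumption, and both function names are hypothetical.

```python
def min_residues_threshold(min_residues, read_length):
    """Turn the --min-residues value into an absolute residue count.

    Assumption: values below 1 are fractions of the read length,
    anything else is already an absolute count.
    """
    if min_residues < 1:
        return min_residues * read_length
    return min_residues


def keep_alignment(identity, aligned_residues, read_length,
                   min_identity=0.65, min_residues=0.25):
    """Return True if the alignment passes both thresholds."""
    if identity < min_identity:
        return False  # identity lower than the threshold: discard
    needed = min_residues_threshold(min_residues, read_length)
    return aligned_residues >= needed
```

With the defaults, a 10 kb read would need at least 2,500 aligned residues at 65% identity or better to be kept under these assumptions.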
Advanced:
    --match
        Match score [2]
    --mismatch
        Mismatch score [-5]
    --gap-open
        Gap open score [-5]
    --gap-extend-max
        Gap open extend max [-5]
    --gap-extend-min
        Gap open extend min [-1]
    --gap-decay
        Gap extend decay [0.15]
    -k <10-15>, --kmer-length <10-15>
        K-mer length in bases [13]
    --kmer-skip
        Number of k-mers to skip when building the lookup table from the reference [2]
    --bin-size
        Sets the size of the grid used during candidate search [4]
    --max-segments
        Max number of segments allowed for a read per kb [1]
    --subread-length
        Length of fragments reads are split into [256]
    --subread-corridor
        Length of corridor sub-reads are aligned with [40]
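As a rough illustration of how the gap parameters above could interact, the sketch below models a convex gap cost in which each additional gap base is penalized a little less than the previous one. The formula is an assumption made for this example (extension score = min(gap-extend-max + gap-decay * position, gap-extend-min), summed over the gap and added to gap-open); it is not taken from the NGMLR source, and the function names are hypothetical.

```python
def gap_extend_score(position, extend_max=-5.0, extend_min=-1.0, decay=0.15):
    """Score for the gap base at 0-based `position` inside a gap.

    Starts at `extend_max`, improves by `decay` per base, and is
    capped at `extend_min`.
    """
    return min(extend_max + decay * position, extend_min)


def gap_score(length, gap_open=-5.0, **extend_params):
    """Total score of a gap of `length` bases under the assumed model."""
    if length <= 0:
        return 0.0
    return gap_open + sum(
        gap_extend_score(i, **extend_params) for i in range(length)
    )


for length in (1, 10, 50, 200):
    print(length, round(gap_score(length), 2))
```

Under this model one long gap costs less than several short gaps of the same total length, which is the property that lets an aligner represent a large insertion or deletion as a single event instead of scattering it across many small indels.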
3.4 Output
Results are reported in SAM format.
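For a quick look at the output, the eleven mandatory SAM columns can be read with a few lines of Python. This is a minimal sketch that follows the SAM specification's column order and flag bits; the record in the usage lines is made up for illustration and is not real ngmlr output.

```python
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = cols[11:]                                 # optional fields
    record["is_unmapped"] = bool(record["flag"] & 0x4)         # read unmapped
    record["is_supplementary"] = bool(record["flag"] & 0x800)  # split-read part
    return record


# Made-up example record, for illustration only.
example = "read1\t2048\tchr1\t1000\t60\t4M\t*\t0\t0\tACGT\t*\tNM:i:0"
record = parse_sam_line(example)
print(record["rname"], record["pos"], record["is_supplementary"])
```

For anything beyond inspection, an established SAM/BAM library is a better choice than hand-written parsing.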

4. Resource usage

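5. Notes
1. The software builds an index in the directory that contains the reference genome, so that directory must be writable (alternatively, pass --skip-write to explicitly keep the index from being written to disk).
2. Building the index for a reference genome takes a long time. It is advisable to build the index once in the database directory (the folder holding the reference genome) and reuse it on every run.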
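A wrapper script can check the first note before launching a run. The sketch below tests whether the reference directory is writable and, if it is not, returns `--skip-write` so that ngmlr does not try to store the index there. The helper name `index_options` and the injectable `can_write` argument are hypothetical conveniences for this example.

```python
import os


def index_options(reference_path, can_write=os.access):
    """Return extra ngmlr options depending on directory permissions.

    If the directory holding the reference is writable, no extra option
    is needed and the index can be stored next to the reference.
    Otherwise `--skip-write` is returned so the index stays off disk.
    """
    ref_dir = os.path.dirname(os.path.abspath(reference_path))
    if can_write(ref_dir, os.W_OK):
        return []
    return ["--skip-write"]
```

Note that with `--skip-write` the index is not stored, so the long indexing step from the second note is repeated on every run.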
6. References
Accurate detection of complex structural variations using single molecule sequencing
Fritz J Sedlazeck, Philipp Rescheneder, Moritz Smolka, Han Fang, Maria Nattestad, Arndt von Haeseler, Michael Schatz. bioRxiv 169557; doi: https://doi.org/10.1101/169557
7. FAQs
Poster & Talks:
Accurate and fast detection of complex and nested structural variations using long read technologies Biological Data Science, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 26 - 29.10.2016
NGMLR: Highly accurate read mapping of third generation sequencing reads for improved structural variation analysis Genome Informatics 2016, Wellcome Genome Campus Conference Centre, Hinxton, Cambridge, UK, 19.09.-2.09.2016