Third-generation sequencing aligner: ngmlr

Author: bioYIYI | Published 2020-04-19 23:10

Software name: ngmlr

Version: ngmlr 0.2.6

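1. Overview

NextGenMap-LR (ngmlr) aligns long reads from third-generation sequencing platforms (PacBio, Oxford Nanopore) to a reference genome. Such reads have two defining characteristics: (1) they are very long (about 10 kb on average); (2) they have high error rates (10% to 15% for PacBio, and 5% to 20% for Oxford Nanopore sequencing). NGMLR (https://github.com/philres/ngmlr) is a fast and accurate aligner designed for long reads. It builds on NGM, a seed-and-extend short-read aligner, and extends it with a segmented convex gap-cost scoring model suited to aligning long reads with high error rates.

Website: https://github.com/philres/ngmlr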

2. How the analysis works

  1. Identify initial anchors
  2. Verify anchors with a vectorized Smith-Waterman algorithm (scores only)
  3. Filter anchors and find candidate regions for the alignments
  4. Compute the full alignment between the read and the respective candidate reference regions
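
Steps 1 and 3 above can be illustrated with a toy sketch. It is not NGMLR's implementation: the function name find_candidate, the example sequences, and the simple offset-voting scheme are all assumptions made for illustration. Exact k-mer matches serve as anchors, and the reference offset supported by the most anchors is reported as the candidate region.

```shell
#!/usr/bin/env bash
# Toy anchor search (illustrative only, not NGMLR's actual code).
# find_candidate REF READ K
# Prints "<1-based start of the best candidate region> <supporting k-mers>".
find_candidate() {
  awk -v ref="$1" -v read="$2" -v k="$3" 'BEGIN {
    # Index every k-mer of the reference by its 1-based start positions.
    for (i = 1; i + k - 1 <= length(ref); i++) {
      kmer = substr(ref, i, k)
      pos[kmer] = pos[kmer] " " i
    }
    # Each read k-mer found in the index is an anchor. Anchors that agree on
    # the same offset (reference position minus read position) vote together.
    for (j = 1; j + k - 1 <= length(read); j++) {
      kmer = substr(read, j, k)
      if (kmer in pos) {
        n = split(pos[kmer], hits, " ")
        for (h = 1; h <= n; h++) votes[hits[h] - j]++
      }
    }
    # Keep the offset with the most anchors as the candidate region.
    best = ""; top = 0
    for (d in votes) if (votes[d] > top) { top = votes[d]; best = d }
    if (top > 0) print best + 1, top; else print "none 0"
  }'
}

# The read is an exact copy of reference positions 9 to 19, so all seven
# of its 5-mers agree on the same offset.
find_candidate "ACGTACGTTAGCCGATAGGCTAACGTTAGC" "TAGCCGATAGG" 5   # prints: 9 7
```

A real aligner would then score each candidate region (step 2) and compute the full alignment only for the regions that survive filtering (step 4).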

3. Usage

3.1 Usage example

  ngmlr -r ucsc.hg19.fasta -q XXX.fastq -o YYY.sam

3.2 Program description

-r  Reference genome

-q  Third-generation sequencing reads to align

-o  Output file
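
A slightly fuller invocation might look like the sketch below. The -t, -x, -r and -q options come from the parameter list in this article, and leaving out -o sends the alignments to stdout. The thread count, the output file name, and the use of samtools (a separate tool that this article does not otherwise cover) to sort and index the result are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

REF="ucsc.hg19.fasta"     # reference genome, as in the example above
READS="XXX.fastq"         # long reads to align
THREADS=4                 # assumed thread count
OUT="YYY.sorted.bam"      # assumed name for the sorted BAM file

# Assemble the ngmlr command as an array so paths with spaces stay intact.
# Without -o, ngmlr writes its output to stdout.
ngmlr_cmd=(ngmlr -t "$THREADS" -x pacbio -r "$REF" -q "$READS")

if command -v ngmlr >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
   && [ -r "$REF" ] && [ -r "$READS" ]; then
  # Stream the alignments straight into samtools, then index the sorted BAM.
  "${ngmlr_cmd[@]}" | samtools sort -@ "$THREADS" -o "$OUT" -
  samtools index "$OUT"
else
  # Tools or inputs are missing: only show what would have been run.
  echo "dry run: ${ngmlr_cmd[*]} | samtools sort -@ $THREADS -o $OUT -"
fi
```

Piping into samtools avoids keeping a large uncompressed intermediate file on disk.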

3.3 Software parameters

Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]

Input/Output:

  • -r, --reference (the directory containing the reference must be writable)
       (required) Path to the reference genome (FASTA/Q, can be gzipped)

  • -q, --query
       Path to the read file (FASTA/Q) [/dev/stdin]

  • -o, --output
       Path to output file [stdout]

  • --skip-write
       Don't write reference index to disk [false]

  • --bam-fix
       Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with BAM format [false]


General:

  • -t, --threads
       Number of threads [1]

  • -x, --presets
       Parameter presets for different sequencing technologies [pacbio]

  • -i <0-1>, --min-identity <0-1>
       Alignments with an identity lower than this threshold will be discarded [0.65]

  • -R, --min-residues
       Alignments containing fewer than <int> residues, or fewer than (<float> * read length) residues, will be discarded [0.25]

  • --no-smallinv
       Don't detect small inversions [false]

  • --no-lowqualitysplit
       Split alignments with poor quality [false]

  • --verbose
       Debug output [false]

  • --no-progress
       Don't print progress info while mapping [false]


Advanced:

  • --match
       Match score [2]

  • --mismatch
       Mismatch score [-5]

  • --gap-open
       Gap open score [-5]

  • --gap-extend-max
       Gap open extend max [-5]

  • --gap-extend-min
       Gap open extend min [-1]

  • --gap-decay
       Gap extend decay [0.15]

  • -k <10-15>, --kmer-length <10-15>
       K-mer length in bases [13]

  • --kmer-skip
       Number of k-mers to skip when building the lookup table from the reference [2]

  • --bin-size
       Sets the size of the grid used during candidate search [4]

  • --max-segments
       Max number of segments allowed for a read per kb [1]

  • --subread-length
       Length of fragments reads are split into [256]

  • --subread-corridor
       Length of corridor sub-reads are aligned with [40]
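
To get a feel for how the four gap parameters could interact under a convex gap-cost model, the sketch below computes the score of a gap of a given length. The formula is an assumption made for illustration and is not taken from the ngmlr source code: each extended base i (counting from 0) scores min(gap-extend-min, gap-extend-max + i * gap-decay), and the gap as a whole scores gap-open plus the sum of those values. The function name gap_score is hypothetical. The defaults are the ones listed above.

```shell
#!/usr/bin/env bash
# gap_score LENGTH [GAP_OPEN EXT_MAX EXT_MIN DECAY]
# Assumed model (an illustration, not ngmlr's verified implementation):
#   extended base i (i = 0, 1, ...) scores min(EXT_MIN, EXT_MAX + i * DECAY);
#   the gap scores GAP_OPEN plus the sum over all of its bases.
gap_score() {
  awk -v len="$1" -v gopen="${2:--5}" -v emax="${3:--5}" \
      -v emin="${4:--1}" -v decay="${5:-0.15}" 'BEGIN {
    total = gopen
    for (i = 0; i < len; i++) {
      ext = emax + i * decay        # penalty shrinks as the gap grows
      if (ext > emin) ext = emin    # but never gets milder than EXT_MIN
      total += ext
    }
    printf "%.2f\n", total
  }'
}

# Print gap length and score for a short, a medium and a long gap.
for len in 1 10 100; do
  printf '%s\t%s\n' "$len" "$(gap_score "$len")"
done
```

Under this assumed model a 10-base gap scores -48.25, compared with -55 if every base cost the full -5, so long insertions and deletions are penalized less per base than short ones.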
    

3.4 Results and explanation

Results are reported in SAM format:

[Figure: example SAM output]
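
A quick way to inspect SAM output is to tally records by their FLAG field. The sketch below uses the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name summarize_sam and the four toy records are invented for this example and are not real ngmlr output.

```shell
#!/usr/bin/env bash
# summarize_sam: read SAM text on stdin and count records by FLAG category.
# FLAG bits (SAM specification): 0x4 unmapped, 0x100 secondary, 0x800 supplementary.
summarize_sam() {
  awk -F '\t' '
    /^@/ { next }                               # skip header lines
    {
      flag = $2
      if (int(flag / 4) % 2)         unmapped++
      else if (int(flag / 2048) % 2) supplementary++
      else if (int(flag / 256) % 2)  secondary++
      else                           primary++
    }
    END {
      printf "primary=%d secondary=%d supplementary=%d unmapped=%d\n",
             primary, secondary, supplementary, unmapped
    }'
}

# Toy records invented for this example: read1 is split into a primary and a
# supplementary alignment, read2 maps to the reverse strand, read3 is unmapped.
toy_sam() {
  printf '@SQ\tSN:chr1\tLN:10000\n'
  printf 'read1\t0\tchr1\t100\t60\t10M5S\t*\t0\t0\tACGTACGTACGGATC\t*\n'
  printf 'read1\t2048\tchr1\t5000\t60\t10H5M\t*\t0\t0\tGGATC\t*\n'
  printf 'read2\t16\tchr1\t200\t60\t8M\t*\t0\t0\tACGTACGT\t*\n'
  printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n'
}

toy_sam | summarize_sam   # prints: primary=2 secondary=0 supplementary=1 unmapped=1
```

Supplementary records are worth watching in long-read data, because a read that spans a structural variation is typically reported as several split alignments.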

4. Resource usage

[Figure: resource usage]

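5. Notes

1. The software builds an index in the directory that contains the reference genome, so that directory must be writable (alternatively, pass --skip-write to explicitly keep the index from being written to disk);
2. Building the index for a reference genome takes a long time, so it is best to build one index in the database (the folder that holds the reference genome) and reuse it on every run.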
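
The first note can be turned into a small pre-flight check. The sketch below is only one possible approach: the helper name index_flag is hypothetical, and it simply tests whether the reference's directory is writable and suggests --skip-write when it is not.

```shell
#!/usr/bin/env bash
# index_flag REFERENCE
# Prints "--skip-write" when the directory holding REFERENCE is not writable
# (so ngmlr cannot store its index there) and an empty line otherwise.
index_flag() {
  local ref_dir
  ref_dir=$(dirname -- "$1")
  if [ -w "$ref_dir" ]; then
    echo ""
  else
    echo "--skip-write"
  fi
}

REF="ucsc.hg19.fasta"
echo "extra ngmlr option for $REF: '$(index_flag "$REF")'"

# Possible use (left unquoted on purpose so an empty result adds no argument):
#   ngmlr $(index_flag "$REF") -r "$REF" -q XXX.fastq -o YYY.sam
```

With --skip-write the index is not kept on disk, so per the second note the time-consuming index build would be repeated on every run.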

6. Citation

Accurate detection of complex structural variations using single molecule sequencing

Fritz J Sedlazeck, Philipp Rescheneder, Moritz Smolka, Han Fang, Maria Nattestad, Arndt von Haeseler, Michael Schatz. bioRxiv 169557; doi: https://doi.org/10.1101/169557

7. FAQs

Poster & Talks:

Accurate and fast detection of complex and nested structural variations using long read technologies. Biological Data Science, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 26 - 29.10.2016

NGMLR: Highly accurate read mapping of third generation sequencing reads for improved structural variation analysis. Genome Informatics 2016, Wellcome Genome Campus Conference Centre, Hinxton, Cambridge, UK, 19.09.-2.09.2016
