NGMLR, a Genome Alignment Tool, and Sniffles, a Structural Variant Caller

Author: Boer223 | Published: 2019-06-05 16:47

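    Preface

    Genomic structural variation is an important contributing factor in many cancers and inherited diseases. Detecting structural variants with second-generation (short-read) sequencing has major limitations, while third-generation (long-read) sequencing suffers from a relatively high error rate, among other problems, and most software performs poorly on complex structural variants in particular. To address this, researchers developed the genome aligner NGMLR and the structural variant caller Sniffles, which their authors describe as detecting variants with unprecedented sensitivity and precision. Both tools automatically filter out spurious events and can work with low-coverage data, which lowers sequencing cost.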

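    Overview

    NGMLR and Sniffles are structural variant detection tools designed for long-read sequencing. NGMLR is a genome aligner that builds on short-read alignment methods and accounts for the characteristics of data produced by the PacBio and Oxford Nanopore platforms. Sniffles scans the resulting alignments to detect structural variants precisely.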


    Figure: Main steps of NGMLR (left) and Sniffles (right)

    NGMLR

    Installation

    Installing with conda is recommended (NGMLR is distributed through the Bioconda channel):

    conda install -c bioconda ngmlr
    

    Usage

    For PacBio data:

    ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam
    

    For Oxford Nanopore data:

    ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam -x ont
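NGMLR writes SAM, whereas the Sniffles command later in this post takes a sorted BAM file. The sketch below shows one possible way to bridge the two. It assumes that samtools (version 1.3 or later) is installed, which the original post does not mention; the function name `align_and_sort`, the helper `run`, and the file names are illustrative only. By default the script only prints the commands it would run; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: align long reads with NGMLR, then sort and index with samtools.
# Assumption: samtools >= 1.3 is on PATH (not part of the original post).

# Print the command when DRY_RUN is 1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# align_and_sort <reference.fasta> <reads.fastq> <pacbio|ont> <threads> <out_prefix>
align_and_sort() {
    local ref="$1" reads="$2" preset="$3" threads="$4" prefix="$5"

    # 1) Align; -x selects the platform preset (pacbio is NGMLR's default).
    run ngmlr -t "$threads" -x "$preset" -r "$ref" -q "$reads" -o "$prefix.sam" || return 1

    # 2) Coordinate-sort; samtools picks BAM output from the .bam extension.
    run samtools sort -@ "$threads" -o "$prefix.sort.bam" "$prefix.sam" || return 1

    # 3) Index the sorted BAM.
    run samtools index "$prefix.sort.bam"
}

align_and_sort reference.fasta reads.fastq ont 4 mapped
```

With the arguments shown, the final file is `mapped.sort.bam`, matching the input name used in the Sniffles example.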
    

    Parameter reference

    Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]

    Input/output options:
        -r <file>,  --reference <file>
            (required)  Path to the reference genome (FASTA/Q, can be gzipped)
        -q <file>,  --query <file>
            Path to the read file (FASTA/Q) [/dev/stdin]
        -o <string>,  --output <string>
            Path to output file [stdout]
        --skip-write
            Don't write reference index to disk [false]
        --bam-fix
            Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with the BAM format [false]
        --rg-id <string>
            Adds RG:Z:<string> to all alignments in SAM/BAM [none]
        --rg-sm <string>
            RG header: Sample [none]
        --rg-lb <string>
            RG header: Library [none]
        --rg-pl <string>
            RG header: Platform [none]
        --rg-ds <string>
            RG header: Description [none]
        --rg-dt <string>
            RG header: Date (format: YYYY-MM-DD) [none]
        --rg-pu <string>
            RG header: Platform unit [none]
        --rg-pi <string>
            RG header: Median insert size [none]
        --rg-pg <string>
            RG header: Programs [none]
        --rg-cn <string>
            RG header: sequencing center [none]
        --rg-fo <string>
            RG header: Flow order [none]
        --rg-ks <string>
            RG header: Key sequence [none]
    
    General options:
        -t <int>,  --threads <int>
            Number of threads [1]
        -x <pacbio, ont>,  --presets <pacbio, ont>
            Parameter presets for different sequencing technologies [pacbio]
        -i <0-1>,  --min-identity <0-1>
            Alignments with an identity lower than this threshold will be discarded [0.65]
        -R <int/float>,  --min-residues <int/float>
            Alignments containing less than <int> or (<float> * read length) residues will be discarded [0.25]
        --no-smallinv
            Don't detect small inversions [false]
        --no-lowqualitysplit
            Split alignments with poor quality [false]
        --verbose
            Debug output [false]
        --no-progress
            Don't print progress info while mapping [false]
    
    Advanced options:
        --match <float>
            Match score [2]
        --mismatch <float>
            Mismatch score [-5]
        --gap-open <float>
            Gap open score [-5]
        --gap-extend-max <float>
            Gap open extend max [-5]
        --gap-extend-min <float>
            Gap open extend min [-1]
        --gap-decay <float>
            Gap extend decay [0.15]
        -k <10-15>,  --kmer-length <10-15>
            K-mer length in bases [13]
        --kmer-skip <int>
            Number of k-mers to skip when building the lookup table from the reference [2]
        --bin-size <int>
            Sets the size of the grid used during candidate search [4]
        --max-segments <int>
            Max number of segments allowed for a read per kb [1]
        --subread-length <int>
            Length of fragments reads are split into [256]
        --subread-corridor <int>
            Length of corridor sub-reads are aligned with [40]
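To illustrate how the options listed above combine, the sketch below assembles an NGMLR command for Nanopore reads that raises the identity threshold, attaches read-group information, and enables `--bam-fix` so the output stays compatible with the BAM format. The function name `print_ngmlr_cmd`, the sample and read-group names, and the 0.80 threshold are placeholders chosen for illustration, not recommendations. The function only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: build an NGMLR command line from options in the parameter reference.
# All names and values below are hypothetical placeholders.

# print_ngmlr_cmd <sample_name> <read_group_id>
print_ngmlr_cmd() {
    local sample="$1" rg_id="$2"

    # Collect the arguments in the function's positional parameters:
    #   -x ont      Oxford Nanopore preset
    #   -i 0.80     discard alignments below 80% identity (placeholder value)
    #   --bam-fix   report reads with > 64k CIGAR operations as unmapped
    #   --rg-*      read-group ID, sample, and platform
    set -- ngmlr -t 8 -x ont \
        -i 0.80 \
        --bam-fix \
        --rg-id "$rg_id" --rg-sm "$sample" --rg-pl ONT \
        -r reference.fasta -q reads.fastq -o "$sample.sam"

    # Print for review; replace echo "$*" with "$@" to execute instead.
    echo "$*"
}

print_ngmlr_cmd sample1 run1
```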
    

    Sniffles

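    Installation

    Installing with conda is recommended (Sniffles is distributed through the Bioconda channel):

    conda install -c bioconda sniffles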
    

    Usage

    sniffles -m mapped.sort.bam -v output.vcf
    

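    mapped.sort.bam can come from either NGMLR or BWA. If it comes from BWA (bwa mem), run the aligner with the -M option so that shorter split hits are marked as secondary alignments. Note that the command above uses the Sniffles 1.x command line that was current when this post was written; Sniffles 2 uses different option names, so check sniffles --help for the installed version.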
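Once Sniffles has written its VCF, a quick tally of calls by type is a common first sanity check. The sketch below is one way to do this with standard Unix tools. It assumes that each record carries an `SVTYPE=` key in the INFO column, which is the usual VCF convention for structural variants; the function name `count_sv_types` is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: count structural variant calls per type in a VCF file.
# Assumption: records carry an SVTYPE=<type> key in the INFO column (column 8).

# count_sv_types <file.vcf>  ->  prints "<type><TAB><count>", sorted by type
count_sv_types() {
    awk -F'\t' '
        !/^#/ {                               # skip header lines
            n = split($8, kv, ";")            # split INFO into key=value pairs
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^SVTYPE=/) {
                    sub(/^SVTYPE=/, "", kv[i])
                    count[kv[i]]++
                }
            }
        }
        END { for (t in count) print t "\t" count[t] }
    ' "$1" | sort
}

# Summarize the Sniffles output if it is present in the working directory.
if [ -f output.vcf ]; then
    count_sv_types output.vcf
fi
```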

    References

    • Sedlazeck FJ, Rescheneder P, Smolka M, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nature Methods, 2018.
    • Sniffles
    • NGMLR
