Preface
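Genomic structural variants (SVs) are an important cause of many diseases, including cancers and genetic disorders. Detecting SVs from second-generation (short-read) sequencing data has serious limitations. Third-generation (long-read) sequencing has its own problems, such as relatively high error rates, and most existing tools are poor at identifying complex SVs in particular. To address this, researchers developed the read aligner NGMLR and the SV caller Sniffles, which their authors report give unprecedented sensitivity and precision for SV detection. NGMLR and Sniffles also automatically filter out spurious events and can work with low-coverage data, which lowers sequencing costs.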
Overview
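NGMLR and Sniffles are tools for detecting SVs from long-read sequencing data. The aligner NGMLR builds on methods developed for short-read alignment and adapts them to the kinds of data produced by the PacBio and Oxford Nanopore platforms. Sniffles is an SV caller: it scans the alignments produced by the aligner and detects SVs precisely.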
Figure: Main steps of NGMLR (left) and Sniffles (right)
NGMLR
Installation
Installation with conda (via the bioconda channel) is recommended:
conda install -c bioconda ngmlr
Usage
For PacBio data:
ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam
For Oxford Nanopore data:
ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam -x ont
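The commands above write an unsorted SAM file, whereas the Sniffles step later in this post takes a sorted BAM (mapped.sort.bam). A minimal sketch of the whole alignment step follows, assuming ngmlr and samtools are on PATH. The function names align_and_sort and run and the DRY_RUN switch (which only prints the commands) are hypothetical helpers for illustration, not part of NGMLR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align long reads with NGMLR, then sort and index with samtools.
# Assumes ngmlr and samtools are on PATH; set DRY_RUN=1 to print commands instead.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

align_and_sort() {
  local platform="$1" ref="$2" reads="$3" prefix="$4" threads="${5:-4}"
  local preset=()
  case "$platform" in
    pacbio) ;;                   # pacbio is NGMLR's default preset, no -x needed
    ont)    preset=(-x ont) ;;   # Oxford Nanopore preset
    *)      echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  run ngmlr -t "$threads" -r "$ref" -q "$reads" -o "${prefix}.sam" "${preset[@]}"
  run samtools sort -@ "$threads" -o "${prefix}.sort.bam" "${prefix}.sam"
  run samtools index "${prefix}.sort.bam"
}
```

For example, align_and_sort ont reference.fasta reads.fastq mapped 8 would leave mapped.sort.bam (plus its .bai index) ready to pass to Sniffles.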
Parameters
Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]
Input/output options:
-r <file>, --reference <file>
(required) Path to the reference genome (FASTA/Q, can be gzipped)
-q <file>, --query <file>
Path to the read file (FASTA/Q) [/dev/stdin]
-o <string>, --output <string>
Path to output file [stdout]
--skip-write
Don't write reference index to disk [false]
--bam-fix
Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with the BAM format [false]
--rg-id <string>
Adds RG:Z:<string> to all alignments in SAM/BAM [none]
--rg-sm <string>
RG header: Sample [none]
--rg-lb <string>
RG header: Library [none]
--rg-pl <string>
RG header: Platform [none]
--rg-ds <string>
RG header: Description [none]
--rg-dt <string>
RG header: Date (format: YYYY-MM-DD) [none]
--rg-pu <string>
RG header: Platform unit [none]
--rg-pi <string>
RG header: Median insert size [none]
--rg-pg <string>
RG header: Programs [none]
--rg-cn <string>
RG header: sequencing center [none]
--rg-fo <string>
RG header: Flow order [none]
--rg-ks <string>
RG header: Key sequence [none]
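Read-group tags matter when alignments from several samples or runs are merged later, because they let downstream tools tell the reads apart. The sketch below passes some of the --rg-* options listed above; the wrapper name ngmlr_with_rg and the sample and file names are hypothetical, and with DRY_RUN=1 it only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run NGMLR with read-group (RG) header fields filled in.
# --rg-id also adds RG:Z:<id> to every alignment; --rg-sm sets the sample name,
# --rg-pl the platform (PACBIO or ONT in SAM-spec terms).
# Set DRY_RUN=1 to print the command instead of running it.
ngmlr_with_rg() {
  local sample="$1" ref="$2" reads="$3" platform="${4:-PACBIO}"
  local cmd=(ngmlr -t 4 -r "$ref" -q "$reads" -o "${sample}.sam"
    --rg-id "$sample" --rg-sm "$sample" --rg-pl "$platform")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For Nanopore data you would pass ONT as the fourth argument and add -x ont to the command.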
General options:
-t <int>, --threads <int>
Number of threads [1]
-x <pacbio, ont>, --presets <pacbio, ont>
Parameter presets for different sequencing technologies [pacbio]
-i <0-1>, --min-identity <0-1>
Alignments with an identity lower than this threshold will be discarded [0.65]
-R <int/float>, --min-residues <int/float>
Alignments containing less than <int> or (<float> * read length) residues will be discarded [0.25]
--no-smallinv
Don't detect small inversions [false]
--no-lowqualitysplit
Split alignments with poor quality [false]
--verbose
Debug output [false]
--no-progress
Don't print progress info while mapping [false]
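To make the -R threshold concrete: with the default 0.25, a 10,000 bp read needs at least 0.25 × 10,000 = 2,500 aligned residues for an alignment to be kept. The helper below reproduces that arithmetic. It assumes, which the help text above does not state explicitly, that values below 1 are read as a fraction of the read length and larger values as an absolute count; min_residues is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum number of residues an alignment needs to pass -R.
# Assumption: -R < 1 is a fraction of read length, -R >= 1 an absolute count.
min_residues() {
  local r="$1" read_len="$2"
  awk -v r="$r" -v len="$read_len" 'BEGIN {
    if (r < 1) printf "%d\n", r * len   # fraction of the read length (truncated)
    else       printf "%d\n", r         # absolute number of residues
  }'
}

min_residues 0.25 10000   # prints 2500
min_residues 500 10000    # prints 500
```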
Advanced options:
--match <float>
Match score [2]
--mismatch <float>
Mismatch score [-5]
--gap-open <float>
Gap open score [-5]
--gap-extend-max <float>
Gap open extend max [-5]
--gap-extend-min <float>
Gap open extend min [-1]
--gap-decay <float>
Gap extend decay [0.15]
-k <10-15>, --kmer-length <10-15>
K-mer length in bases [13]
--kmer-skip <int>
Number of k-mers to skip when building the lookup table from the reference [2]
--bin-size <int>
Sets the size of the grid used during candidate search [4]
--max-segments <int>
Max number of segments allowed for a read per kb [1]
--subread-length <int>
Length of fragments reads are split into [256]
--subread-corridor <int>
Length of corridor sub-reads are aligned with [40]
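Judging from the option names, the gap settings above describe a convex gap model: the per-base extension score starts near --gap-extend-max and moves toward --gap-extend-min as a gap grows, at a rate set by --gap-decay, so long gaps (the signature of insertions and deletions) cost less than under a plain affine model. The exact formula NGMLR uses is not given in this post; the sketch below assumes a simple geometric decay, e_i = min + (max - min)(1 - decay)^i, purely to illustrate the effect with the default values. gap_scores is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustration only: compare an affine gap score with a convex one whose
# per-base extension score decays geometrically from emax toward emin.
# The decay formula is an assumption, not NGMLR's documented implementation.
gap_scores() {
  local len="$1"
  awk -v len="$len" -v open=-5 -v emax=-5 -v emin=-1 -v decay=0.15 'BEGIN {
    affine = open; convex = open
    for (i = 0; i < len; i++) {
      affine += emax                                    # constant extension
      convex += emin + (emax - emin) * (1 - decay) ^ i  # decaying extension
    }
    printf "len=%d affine=%.1f convex=%.1f\n", len, affine, convex
  }'
}

for n in 1 10 100 1000; do gap_scores "$n"; done
```

Both models score a 1 bp gap identically, but the gap between them widens quickly with length, which is why a convex model is friendlier to SV-sized indels.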
Sniffles
Installation
Installation with conda (via the bioconda channel) is recommended:
conda install -c bioconda sniffles
Usage
sniffles -m mapped.sort.bam -v output.vcf
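Sniffles writes each call as a VCF record whose INFO column carries an SVTYPE tag (for example DEL, INS, DUP, INV). A quick sanity check of output.vcf is to count the calls per type. The awk-based sketch below assumes an uncompressed VCF; count_sv_types is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SV calls per SVTYPE in an uncompressed VCF.
count_sv_types() {
  awk -F'\t' '!/^#/ {                      # skip header lines
    n = split($8, info, ";")               # column 8 is INFO
    for (i = 1; i <= n; i++)
      if (info[i] ~ /^SVTYPE=/) {
        sub(/^SVTYPE=/, "", info[i])
        count[info[i]]++
      }
  } END { for (t in count) print t "\t" count[t] }' "$1" | sort
}
```

Usage: count_sv_types output.vcf prints one line per SV type with its count.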
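mapped.sort.bam must be a sorted BAM file and can come from ngmlr or from bwa (bwa mem). If it comes from bwa, run bwa mem with the -M option, which marks shorter split hits as secondary so that primary and secondary alignments are distinguished. Note that the -m/-v syntax above is that of Sniffles 1.x; Sniffles 2, which a current bioconda install provides, takes the BAM with -i (--input), writes the VCF with -v (--vcf), and expects the BAM to be indexed.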
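If you take the bwa route instead, a sketch along the following lines produces the sorted BAM and runs Sniffles. It assumes bwa and samtools are installed, uses bwa mem's -x pacbio (or -x ont2d for Nanopore reads) read-type presets together with -M, and keeps the Sniffles 1.x -m/-v syntax shown above. The names bwa_to_sniffles and run and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: BWA-MEM alignment -> sorted, indexed BAM -> Sniffles 1.x.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

bwa_to_sniffles() {
  local platform="$1" ref="$2" reads="$3" prefix="$4" threads="${5:-4}"
  local preset
  case "$platform" in
    pacbio) preset=pacbio ;;
    ont)    preset=ont2d ;;
    *)      echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  run bwa index "$ref"   # build the BWA index (needed once per reference)
  # -M marks shorter split hits as secondary; the redirect is why this step
  # does not go through run()
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "bwa mem -M -x $preset -t $threads $ref $reads > ${prefix}.sam"
  else
    bwa mem -M -x "$preset" -t "$threads" "$ref" "$reads" > "${prefix}.sam"
  fi
  run samtools sort -@ "$threads" -o "${prefix}.sort.bam" "${prefix}.sam"
  run samtools index "${prefix}.sort.bam"
  run sniffles -m "${prefix}.sort.bam" -v "${prefix}.vcf"
}
```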