8. SNP Detection and Phylogenetic Analysis (Snippy)

Author: 小飞虎 | Published 2021-08-17 16:30

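1. Introduction

Snippy is a tool for SNP detection. Its results can be combined into a core SNP alignment, which is then used to build a phylogenetic tree.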

Snippy finds SNPs between a haploid reference genome and your NGS sequence reads. It will find both substitutions (snps) and insertions/deletions (indels). It will use as many CPUs as you can give it on a single computer (tested to 64 cores). It is designed with speed in mind, and produces a consistent set of output files in a single folder. It can then take a set of Snippy results using the same reference and generate a core SNP alignment (and ultimately a phylogenomic tree).

2. Installation

Snippy can be installed with conda:

conda install -c bioconda snippy

Alternatively, install the latest version directly from GitHub (in my attempts, conda kept installing an old version that did not include snippy-multi):

cd $HOME
git clone https://github.com/tseemann/snippy.git
$HOME/snippy/bin/snippy --help

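3. Running

Commonly used snippy options are the output directory (--outdir) and the reference genome (--ref). Input can be single-end (--se) or paired-end (--R1, --R2) FASTQ files, a FASTA file of contigs (--ctgs), or a BAM file (--bam). The number of CPU cores is set with --cpus (default 8).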

snippy [options] --outdir <dir> --ref <ref> --R1 <R1.fq.gz> --R2 <R2.fq.gz> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --se <R.fq.gz> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --ctgs <contigs.fa> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --bam <reads.bam> --cpus 10
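
The four invocation forms above differ only in how the input is passed. As a rough sketch, the choice can be wrapped in a small helper that prints the matching command without running it. The function name snippy_cmd, the CPUS variable, and the extension-based guessing are all assumptions made for illustration; none of them are part of Snippy.

```shell
# Hypothetical helper (not part of Snippy): print, but do not run,
# a snippy command for one sample.
# usage: snippy_cmd <outdir> <ref> <input1> [input2]
snippy_cmd() {
  outdir=$1; ref=$2; in1=$3; in2=${4:-}
  if [ -n "$in2" ]; then
    # Two input files: treat them as paired-end reads.
    inputs="--R1 $in1 --R2 $in2"
  else
    # One input file: guess its type from the extension (an assumption).
    case "$in1" in
      *.bam)                    inputs="--bam $in1" ;;
      *.fa|*.fas|*.fasta|*.fna) inputs="--ctgs $in1" ;;
      *)                        inputs="--se $in1" ;;
    esac
  fi
  # CPUS is an illustrative variable; 8 matches snippy's documented default.
  echo "snippy --outdir $outdir --ref $ref $inputs --cpus ${CPUS:-8}"
}
```

For example, snippy_cmd s1 ref.gbk s1_R1.fq.gz s1_R2.fq.gz prints a paired-end command that can be inspected before it is run.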

The full list of options is:

RESOURCES
  --cpus N         Maximum number of CPU cores to use (default '8')
  --ram N          Try and keep RAM under this many GB (default '8')
  --tmpdir F       Fast temporary storage eg. local SSD (default '/tmp')
INPUT
  --reference F    Reference genome. Supports FASTA, GenBank, EMBL (not GFF) (default '')
  --R1 F           Reads, paired-end R1 (left) (default '')
  --R2 F           Reads, paired-end R2 (right) (default '')
  --se F           Single-end reads (default '')
  --ctgs F         Don't have reads use these contigs (default '')
  --peil F         Reads, paired-end R1/R2 interleaved (default '')
  --bam F          Use this BAM file instead of aligning reads (default '')
  --targets F      Only call SNPs from this BED file (default '')
  --subsample n.n  Subsample FASTQ to this proportion (default '1')
OUTPUT
  --outdir F       Output folder (default '')
  --prefix F       Prefix for output files (default 'snps')
  --report         Produce report with visual alignment per variant (default OFF)
  --cleanup        Remove most files not needed for snippy-core (inc. BAMs!) (default OFF)
  --rgid F         Use this @RG ID: in the BAM header (default '')
  --unmapped       Keep unmapped reads in BAM and write FASTQ (default OFF)
PARAMETERS
  --mapqual N      Minimum read mapping quality to consider (default '60')
  --basequal N     Minimum base quality to consider (default '13')
  --mincov N       Minimum site depth to for calling alleles (default '10')
  --minfrac n.n    Minumum proportion for variant evidence (0=AUTO) (default '0')
  --minqual n.n    Minumum QUALITY in VCF column 6 (default '100')
  --maxsoft N      Maximum soft clipping to allow (default '10')
  --bwaopt F       Extra BWA MEM options, eg. -x pacbio (default '')
  --fbopt F        Extra Freebayes options, eg. --theta 1E-6 --read-snp-limit 2 (default '')

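You can also use snippy-multi to generate a shell script that runs a whole batch of samples. It takes a sample list file and a reference sequence as input: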

snippy-multi abc.txt --reference ref.gbk --cpus 10 > run_snp.sh
nohup sh ./run_snp.sh &

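The list file gives one sample per line, as a sample name followed by the file path (tab-separated), in this format:

abc.txt: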
a /Absolute path/a.fq.gz
b /Absolute path/b.fq.gz
c /Absolute path/c.fq.gz
...

The reference sequence can be a FASTA file or a GenBank (gbk) file.
The samples to analyse are FASTQ (fq) or FASTA files.
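
Writing the list file by hand becomes tedious with many samples. A minimal sketch of generating it follows, assuming one single-end *.fq.gz file per sample in a single directory and using the file name (without .fq.gz) as the sample name. The helper name make_sample_list is hypothetical.

```shell
# Hypothetical helper: print "ID<TAB>absolute path" for every *.fq.gz file
# in a directory. Assumes one single-end file per sample.
# usage: make_sample_list <reads_dir> > abc.txt
make_sample_list() {
  dir=$(cd "$1" && pwd) || return 1   # resolve the directory to an absolute path
  for f in "$dir"/*.fq.gz; do
    [ -e "$f" ] || continue           # skip the literal pattern if nothing matched
    id=$(basename "$f" .fq.gz)        # a.fq.gz -> a
    printf '%s\t%s\n' "$id" "$f"
  done
}
```

Paired-end data would need two path columns per sample, which this sketch does not handle.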

Example output, viewed with more run_snp.sh:
snippy --outdir a --ref ref.fas --se a.fq.gz --cpus 10
snippy --outdir b --ref ref.fas --se b.fq.gz --cpus 10
snippy --outdir c --ref ref.fas --se c.fq.gz --cpus 10
...
snippy-core --ref 'a/ref.fa' a b c ...

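The generated run_snp.sh runs the commands one at a time. On a sufficiently powerful server you can edit the script and wrap each snippy command in nohup ... & so that several run simultaneously. Once every snippy run has finished, run the snippy-core command separately.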
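
One way to make that edit without touching each line by hand is sketched below. It assumes the generated script looks like the example above: per-sample lines start with "snippy " and the final line starts with "snippy-core ". The helper name parallelize_snippy_script is hypothetical. Instead of running snippy-core by hand afterwards, the sketch inserts a shell wait before it so the core step starts only when all background jobs have exited.

```shell
# Hypothetical helper: rewrite a snippy-multi script so that the per-sample
# snippy jobs run concurrently and snippy-core runs after all of them.
# usage: parallelize_snippy_script run_snp.sh > run_snp_parallel.sh
parallelize_snippy_script() {
  awk '
    /^snippy-core / { print "wait"; print; next }  # barrier, then the core step
    /^snippy /      { print $0 " &"; next }        # send each sample job to the background
    { print }                                      # pass every other line through unchanged
  ' "$1"
}
```

The rewritten script can then be started once with nohup sh run_snp_parallel.sh &. Note that every job starts at the same time, so the total CPU demand is the number of samples times --cpus, and that wait does not stop the script if one of the snippy jobs fails.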

Once the commands above have finished, run the following commands, one after another, to build the phylogenetic tree:

nohup snippy-clean_full_aln core.full.aln > clean.full.aln &
nohup run_gubbins.py -p gubbins clean.full.aln & # if this fails, try adjusting --filter_percentage, e.g. --filter_percentage 50
nohup snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln &
nohup FastTree -gtr -nt clean.core.aln > clean.core.tree &
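
Each of these steps reads the file written by the step before it, so a step started in the background must finish before the next one is launched. A possible alternative, sketched here, is to put the four commands into one script and send only that script to the background. The file names build_tree.sh and build_tree.log are chosen for illustration.

```shell
# Write the four steps into one script so they run strictly in order.
cat > build_tree.sh <<'EOF'
#!/bin/sh
set -e   # stop at the first failing step
snippy-clean_full_aln core.full.aln > clean.full.aln
run_gubbins.py -p gubbins clean.full.aln
snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln
FastTree -gtr -nt clean.core.aln > clean.core.tree
EOF
# Launch it once, in the background, keeping a log:
#   nohup sh build_tree.sh > build_tree.log 2>&1 &
```

With set -e the script stops as soon as one tool exits with an error, so a failure in Gubbins does not produce a tree from stale or empty files.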

The snippy, snippy-core, snippy-multi and snippy-clean_full_aln commands are in ~/snippy/bin/, and snp-sites is in ~/snippy/binaries/linux/. run_gubbins.py requires a separate Gubbins installation (conda install -c bioconda gubbins), and FastTree must also be installed separately if it cannot be found.
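
To call these commands by name instead of by full path, the two directories can be added to PATH. This is a small sketch that assumes the repository was cloned into $HOME/snippy as in the installation step. SNIPPY_HOME is an illustrative variable name, not something Snippy reads.

```shell
# Assumes the GitHub clone lives in $HOME/snippy.
# SNIPPY_HOME is an illustrative name only.
SNIPPY_HOME="$HOME/snippy"
PATH="$SNIPPY_HOME/bin:$SNIPPY_HOME/binaries/linux:$PATH"
export PATH
```

Adding the same lines to ~/.bashrc makes the change permanent. Afterwards, snippy --check can be used to verify that the dependencies are found.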
