8. SNP Detection and Phylogenetic Analysis (Snippy)

Author: 小飞虎 | Published: 2021-08-17 16:30

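    1. Introduction

    Snippy is a tool for SNP detection. From a set of samples analysed against the same reference it can derive the core SNPs and align them, and that alignment can then be used to build a phylogenetic tree.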

    Snippy finds SNPs between a haploid reference genome and your NGS sequence reads. It will find both substitutions (snps) and insertions/deletions (indels). It will use as many CPUs as you can give it on a single computer (tested to 64 cores). It is designed with speed in mind, and produces a consistent set of output files in a single folder. It can then take a set of Snippy results using the same reference and generate a core SNP alignment (and ultimately a phylogenomic tree).

    2. Installation

    Snippy can be installed with conda:

    conda install -c bioconda snippy
    

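    Alternatively, install the latest version directly from GitHub (in my attempts, conda repeatedly installed an old version that did not include snippy-multi):

    cd $HOME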
    git clone https://github.com/tseemann/snippy.git
    $HOME/snippy/bin/snippy --help
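
    A GitHub clone is not on the PATH by default. The sketch below assumes the clone sits in $HOME/snippy, as in the commands above. It prepends the bin directory to PATH and, if snippy is then found, runs its version and dependency checks. The add_snippy_to_path helper is a hypothetical name introduced here for illustration.

```shell
# Prepend a cloned Snippy's bin directory to PATH (default: $HOME/snippy/bin).
# add_snippy_to_path is a hypothetical helper, not part of Snippy.
add_snippy_to_path() {
    snippy_bin="${1:-$HOME/snippy/bin}"
    case ":$PATH:" in
        *":$snippy_bin:"*) ;;                          # already on PATH, nothing to do
        *) PATH="$snippy_bin:$PATH"; export PATH ;;
    esac
}

add_snippy_to_path

if command -v snippy >/dev/null 2>&1; then
    snippy --version    # confirm which version is being picked up
    snippy --check      # verify that the external dependencies are found
else
    echo "snippy not found on PATH" >&2
fi
```

    To make the change permanent, the export line can be added to the shell start-up file (for example ~/.bashrc).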
    

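    3. Usage

    The most commonly used snippy options are: the output directory (--outdir); the reference genome (--ref); the input, which can be single-end (--se) or paired-end (--R1, --R2) FASTQ files, a FASTA file of contigs (--ctgs), or a BAM file (--bam); and the number of CPU cores (--cpus, default 8).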

    snippy [options] --outdir <dir> --ref <ref> --R1 <R1.fq.gz> --R2 <R2.fq.gz> --cpus 10
    snippy [options] --outdir <dir> --ref <ref> --se <R.fq.gz> --cpus 10
    snippy [options] --outdir <dir> --ref <ref> --ctgs <contigs.fa> --cpus 10
    snippy [options] --outdir <dir> --ref <ref> --bam <reads.bam> --cpus 10
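
    The four command forms above differ only in the input flag. As a rough sketch, a small helper can pick the flag from the number and extension of the input files and print the resulting command for review. The helper name snippy_cmd, the CPUS variable and the extension rules are assumptions made for this example; paths containing spaces are not handled.

```shell
# Print (not run) a snippy command for one sample.
# snippy_cmd is a hypothetical helper; usage: snippy_cmd <sample> <ref> <file1> [file2]
snippy_cmd() {
    sample=$1
    ref=$2
    shift 2
    case $# in
        2) input="--R1 $1 --R2 $2" ;;                    # two files: paired-end reads
        1) case $1 in
               *.bam)             input="--bam $1" ;;    # existing alignment
               *.fa|*.fasta|*.fna) input="--ctgs $1" ;;  # assembled contigs
               *)                 input="--se $1" ;;     # anything else: single-end reads
           esac ;;
        *) echo "expected 1 or 2 input files" >&2; return 1 ;;
    esac
    echo "snippy --outdir $sample --ref $ref $input --cpus ${CPUS:-8}"
}

# Example: print the command for a paired-end sample
snippy_cmd a ref.gbk a_R1.fq.gz a_R2.fq.gz
```

    Printing the command first makes it easy to inspect before running it, for example by piping the output to sh.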

    The full list of options is:

    RESOURCES
      --cpus N         Maximum number of CPU cores to use (default '8')
      --ram N          Try and keep RAM under this many GB (default '8')
      --tmpdir F       Fast temporary storage eg. local SSD (default '/tmp')
    INPUT
      --reference F    Reference genome. Supports FASTA, GenBank, EMBL (not GFF) (default '')
      --R1 F           Reads, paired-end R1 (left) (default '')
      --R2 F           Reads, paired-end R2 (right) (default '')
      --se F           Single-end reads (default '')
      --ctgs F         Don't have reads use these contigs (default '')
      --peil F         Reads, paired-end R1/R2 interleaved (default '')
      --bam F          Use this BAM file instead of aligning reads (default '')
      --targets F      Only call SNPs from this BED file (default '')
      --subsample n.n  Subsample FASTQ to this proportion (default '1')
    OUTPUT
      --outdir F       Output folder (default '')
      --prefix F       Prefix for output files (default 'snps')
      --report         Produce report with visual alignment per variant (default OFF)
      --cleanup        Remove most files not needed for snippy-core (inc. BAMs!) (default OFF)
      --rgid F         Use this @RG ID: in the BAM header (default '')
      --unmapped       Keep unmapped reads in BAM and write FASTQ (default OFF)
    PARAMETERS
      --mapqual N      Minimum read mapping quality to consider (default '60')
      --basequal N     Minimum base quality to consider (default '13')
      --mincov N       Minimum site depth to for calling alleles (default '10')
      --minfrac n.n    Minumum proportion for variant evidence (0=AUTO) (default '0')
      --minqual n.n    Minumum QUALITY in VCF column 6 (default '100')
      --maxsoft N      Maximum soft clipping to allow (default '10')
      --bwaopt F       Extra BWA MEM options, eg. -x pacbio (default '')
      --fbopt F        Extra Freebayes options, eg. --theta 1E-6 --read-snp-limit 2 (default '')
    

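    snippy-multi can also generate a shell script that processes all samples in one batch:

    snippy-multi abc.txt --reference ref.gbk --cpus 10 > run_snp.sh
    nohup sh ./run_snp.sh &

    snippy-multi needs the following inputs:

    1. A tab-separated list of sample IDs and file paths, in the following format:

    abc.txt:
    a /absolute/path/a.fq.gz
    b /absolute/path/b.fq.gz
    c /absolute/path/c.fq.gz
    ...

    2. The reference sequence, either a FASTA file or a GenBank (gbk) file.
    3. The FASTQ or FASTA files to be analysed.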
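
    The sample list can be written by hand, but for many samples it is easier to generate. The following is a minimal sketch that assumes single-end reads named <ID>.fq.gz in one directory. Both the naming pattern and the helper name make_sample_list are assumptions for this example.

```shell
# Print one "ID<TAB>absolute path" line per *.fq.gz file in a directory.
# make_sample_list is a hypothetical helper; usage: make_sample_list /data/reads > abc.txt
make_sample_list() {
    dir=$(cd "$1" && pwd) || return 1      # resolve the directory to an absolute path
    for fq in "$dir"/*.fq.gz; do
        [ -e "$fq" ] || continue           # skip the literal pattern when nothing matches
        id=$(basename "$fq" .fq.gz)        # a.fq.gz -> a
        printf '%s\t%s\n' "$id" "$fq"
    done
}
```

    For paired-end data the list would need a third tab-separated column holding the R2 file, so the loop would have to pair the files up first.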

    Example of the generated script (more run_snp.sh):
    snippy --outdir a --ref ref.fas --se a.fq.gz --cpus 10
    snippy --outdir b --ref ref.fas --se b.fq.gz --cpus 10
    snippy --outdir c --ref ref.fas --se c.fq.gz --cpus 10
    ...
    snippy-core --ref 'a/ref.fa' a b c ...

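    The generated run_snp.sh runs the snippy commands one after another. On a well-provisioned server you can edit the script so that each snippy line is started in the background (nohup ... &) and several samples run at the same time. Once every snippy run has finished, run the snippy-core command separately.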
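
    Instead of editing the script by hand, the same idea can be sketched as a small wrapper that starts the per-sample snippy lines in batches and runs the snippy-core line only after all of them have returned. run_parallel is a hypothetical helper written for this example. It assumes the script has one command per line, as in the listing above, and it does not check exit codes. Each snippy line keeps its own --cpus value, so the batch size times --cpus should not exceed the cores available.

```shell
# Run the per-sample snippy lines of a snippy-multi script in batches, then snippy-core.
# run_parallel is a hypothetical helper; usage: run_parallel run_snp.sh 3
run_parallel() {
    script=$1
    jobs=${2:-2}                           # number of samples to run at the same time
    grep '^snippy ' "$script" | {
        n=0
        while IFS= read -r cmd; do
            sh -c "$cmd" </dev/null &      # start one sample in the background
            n=$((n + 1))
            if [ "$n" -ge "$jobs" ]; then
                wait                       # let the current batch finish
                n=0
            fi
        done
        wait                               # wait for the last, possibly partial, batch
    }
    # All samples are done; now run the snippy-core line(s) from the same script.
    grep '^snippy-core ' "$script" | sh
}
```

    Batching with wait is simple but not optimal, because a batch only ends when its slowest sample finishes. A job scheduler or GNU parallel would keep the cores busier.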

    After the commands above have finished, run the following to build the phylogenetic tree:

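    # Run these one at a time: each step reads the output of the previous one.
    # (A long step can be started with nohup ... &, but wait for it to finish before the next.)
    snippy-clean_full_aln core.full.aln > clean.full.aln
    run_gubbins.py -p gubbins clean.full.aln   # if this fails, try adjusting --filter_percentage, e.g. 50
    snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln
    FastTree -gtr -nt clean.core.aln > clean.core.tree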
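
    Each of these four steps consumes the file written by the step before it, so they have to run in order and a failure should stop the chain. The sketch below wraps that in two functions. The names step and build_tree are hypothetical helpers for this example, and the file names are the ones used above.

```shell
# Run a command, write its stdout to a file, and fail if the command
# fails or the file ends up empty. step is a hypothetical helper.
step() {
    out=$1
    shift
    "$@" > "$out" || { echo "failed: $*" >&2; return 1; }
    [ -s "$out" ] || { echo "empty output: $out" >&2; return 1; }
}

# Chain the tree-building steps; && stops at the first failure.
build_tree() {
    step clean.full.aln snippy-clean_full_aln core.full.aln &&
    run_gubbins.py -p gubbins clean.full.aln &&
    step clean.core.aln snp-sites -c gubbins.filtered_polymorphic_sites.fasta &&
    step clean.core.tree FastTree -gtr -nt clean.core.aln
}

# Usage, from the directory that contains core.full.aln:
#   nohup sh -c '. ./build_tree.sh && build_tree' &
```

    The usage line assumes the functions were saved to a file called build_tree.sh, which is only an example name.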
    

    The snippy, snippy-core, snippy-multi and snippy-clean_full_aln commands are in ~/snippy/bin/, and snp-sites is in ~/snippy/binaries/linux/. run_gubbins.py requires a separate Gubbins installation (conda install -c bioconda gubbins). If FastTree cannot be found, it also has to be installed separately.


        Article link: https://www.haomeiwen.com/subject/vnilbltx.html