GWAS走起~(5 bcftools)

作者: roguish读书人 | 来源:发表于2019-12-02 21:00 被阅读0次

    这几天没写简书,因为送去测序的样品又不合格的,比较烦人,今天刚补了样。

    O,忘了.还要召唤油盐酱醋(call various)

    这个不用学习新软件samtools就可以了,samtools中的mpielup。该命令用于生成bcf文件,再使用bcftools进行SNPIndel的分析。Bcftools不是新软件,bcftoolssamtool中附带的软件,在samtools的安装文件夹中可以找到。其实关于软件及其选项的怎么用的问题,可以直接在shell中打上命令,然后按椅子啊enter键就可以很清楚的看到这些命令及其参数的用法和说明。

    好的,开盘!

    [hai@localhost data]$ samtools mpileup

     

    Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

     

    Input options:

     

           -6           assume the quality is in theIllumina-1.3+ encoding

           -A           count anomalous read pairs

           -B           disable BAQ computation

           -b FILE      list of input BAM filenames, one per line[null]

           -C INT       parameter for adjusting mapQ; 0 todisable [0]

           -d INT       max per-BAM depth to avoid excessivememory usage [250]

           -E           recalculate extended BAQ on the flythus ignoring existing BQs

           -f FILE      faidx indexed reference sequence file[null]

           -G FILE      exclude read groups listed in FILE [null]

           -l FILE      list of positions (chr pos) or regions(BED) [null]

           -M INT       cap mapping quality at INT [60]

           -r STR       region in which pileup is generated[null]

           -R           ignore RG tags

           -q INT       skip alignments with mapQ smaller thanINT [0]

           -Q INT       skip bases with baseQ/BAQ smaller thanINT [13]

           --rf INT     required flags: skip reads with mask bitsunset []

           --ff INT     filter flags: skip reads with mask bitsset []

     

    Output options:

     

           -D           output per-sample DP in BCF (require-g/-u)

           -g           generate BCF output (genotypelikelihoods)

           -O           output base positions on reads(disabled by -g/-u)

           -s           output mapping quality (disabled by -g/-u)

           -S           output per-sample strand biasP-value in BCF (require -g/-u)

           -u           generate uncompress BCF output

     

    SNP/INDEL genotype likelihoods options (effective with `-g' or `-u'):

     

           -e INT       Phred-scaled gap extension seq errorprobability [20]

           -F FLOAT     minimum fraction of gapped reads forcandidates [0.002]

           -h INT       coefficient for homopolymer errors [100]

           -I           do not perform indel calling

           -L INT       max per-sample depth for INDEL calling[250]

           -m INT       minimum gapped reads for indelcandidates [1]

           -o INT       Phred-scaled gap open sequencing errorprobability [40]

           -p           apply -m and -F per-sample toincrease sensitivity

           -P STR       comma separated list of platforms for indels[all]

     

    Notes: Assuming diploid individuals.

    这就是mpileup的参数及其用法,我怕解释不清楚,大家还是自己看着折磨吧!

    再跟大家介绍个非常有用的Linux命令“|”这命令老厉害了,可以直接将上一步产生的内容放入下一个命令的输入中。

    samtools mpileup -guSDf ref1.fa out.rmdup.bam | bcftools view -cvNg - >abc.vcf

    -g 输出到bcf格式。

    -f :来输入有索引文件的fasta参考序列。

    [if !supportLists]·        [endif]-S: 在BCF中输出每个样本的偏置p值

    -D :BCF中每个样本的DP输出(需要-g/-u)

    -u: 生成解压BCF输出(这个可以缩短运行时间)

    同样的做法,了解一下bcftools view

    Usage: bcftools view [options] [reg]

    Input/output options:

           -A        keep all possible alternate alleles atvariant sites

           -b        output BCF instead of VCF

           -D FILE   sequence dictionary for VCF->BCFconversion [null]

           -F        PL generated by r921 or before (whichgenerate old ordering)

           -G        suppress all individual genotypeinformation

           -l FILE   list of sites (chr pos) or regions (BED) tooutput [all sites]

           -L        calculate LD for adjacent sites

           -N        skip sites where REF is not A/C/G/T

           -Q        output the QCALL likelihood format

           -s FILE   list of samples to use [all samples]

           -S        input is VCF

           -u        uncompressed BCF output (force -b)

    Consensus/variant calling options:

           -c        SNP calling (force -e)

           -d FLOAT  skip loci where less than FLOAT fraction ofsamples covered [0]

           -e        likelihood based analyses

           -g        call genotypes at variant sites (force-c)

           -i FLOAT  indel-to-substitution ratio [-1]

           -I        skip indels

           -m FLOAT  alternative model for multiallelic andrare-variant calling, include if P(chi^2)>=FLOAT

           -p FLOAT  variant if P(ref|D)

           -P STR    type of prior: full, cond2, flat [full]

           -t FLOAT scaled substitution mutation rate [0.001]

           -T STR    constrained calling; STR can be: pair,trioauto, trioxd and trioxs (see manual) [null]

           -v        output potential variant sites only(force -c)

    Contrast calling and association test options:

           -1 INT    number of group-1 samples [0]

           -C FLOAT  posterior constrast for LRT

           -U INT    number of permutations for associationtesting (effective with -1) [0]

           -X FLOAT  only perform permutations forP(chi^2)

    -c召唤snp

    -v:输出变异位点;

    [if !supportLists]·        [endif]-N跳过REF不是A/C/G/T的站点

    [if !supportLists]·        [endif]-g召唤出不同位点的基因型(可以看成是基因分型)

     

    最后,咱们得到一个vcf文件。这个就是我们要的菜了,好不好吃不知道,好不好看不知道,反正菜出来了,下面改一下它的吃相,改一下他的看相。

    vcftools --vcf abc.vcf --plink --out SNP

    先将vcf 格式转换为plink需要的格式,记住这个操作哦!操作完毕会有mappedlog三个文件格式产生,其实后期用mapped这两个就可以了。

    下面就是plink亮相的时候了。

    神龙已被召唤,那就来一下真正的GWAS  吧!!哇卡卡!

    相关文章

      网友评论

        本文标题:GWAS走起~(5 bcftools)

        本文链接:https://www.haomeiwen.com/subject/vstqgctx.html