mpileup命令参数和结果详解

作者: Y大宽 | 来源:发表于2019-06-02 09:52 被阅读65次

2021-08-23
mpileup命令参数和结果详解
Shell常见命令
parted磁盘分区命令详解
cutadapt 使用详解
Sqoop一些常用命令及参数
iperf3工具使用
Python命令行:getopt模块详解
Samtools 命令详解
ImageMagick之图片裁剪详解

mpileup是samtools的一个命令，用来生存bcf文件，然后再用bcftools进行SNP和Indel的分析。另外，bcftools是samtools的附带软件。

1用法

samtools mpileup [options] in1.bam [in2.bam [...]]

Input options:
  -6, --illumina1.3+      quality is in the Illumina-1.3+ encoding
  -A, --count-orphans     do not discard anomalous read pairs
  -b, --bam-list FILE     list of input BAM filenames, one per line
  -B, --no-BAQ            disable BAQ (per-Base Alignment Quality)
  -C, --adjust-MQ INT     adjust mapping quality; recommended:50, disable:0 [0]
  -d, --max-depth INT     max per-file depth; avoids excessive memory usage [8000]
  -E, --redo-BAQ          recalculate BAQ on the fly, ignore existing BQs
  -f, --fasta-ref FILE    faidx indexed reference sequence file
  -G, --exclude-RG FILE   exclude read groups listed in FILE
  -l, --positions FILE    skip unlisted positions (chr pos) or regions (BED)
  -q, --min-MQ INT        skip alignments with mapQ smaller than INT [0]
  -Q, --min-BQ INT        skip bases with baseQ/BAQ smaller than INT [13]
  -r, --region REG        region in which pileup is generated
  -R, --ignore-RG         ignore RG tags (one BAM = one sample)
  --rf, --incl-flags STR|INT  required flags: skip reads with mask bits unset []
  --ff, --excl-flags STR|INT  filter flags: skip reads with mask bits set
                                            [UNMAP,SECONDARY,QCFAIL,DUP]
  -x, --ignore-overlaps   disable read-pair overlap detection

Output options:
  -o, --output FILE       write output to FILE [standard output]
  -O, --output-BP         output base positions on reads
  -s, --output-MQ         output mapping quality
      --output-QNAME      output read names
  -a                      output all positions (including zero depth)
  -a -a (or -aa)          output absolutely all positions, including unused ref. sequences

最常用的是

最常用的参数有两个：
-f用samtools faidx对参考序列建index.fai文件，其他软件也可以
-g输出到bcf格，否则生成文本格式文件。用法和最简单的例子如下
u输出不压缩的bcf文件

$ samtools mpileup -f genome.fasta abc.bam > abc.txt
$ samtools mpileup -gSDf genome.fasta abc.bam > abc.bcf
$ samtools mpileup -guSDf genome.fasta abc.bam |bcftools view -cvNg - > abc.vcf

2结果解释

2.1结果共6列

[mpileup] 1 samples in 1 input files
chr1    10105   N       8       AAAAcAAA        kuuu>6uK
chr1    10106   N       10      CCCCcCCCC^$c    uuuukAuuAK
chr1    10107   N       10      CCCCcCCCCc      upukaFfuKF
chr1    10108   N       9       CCCCcCCCC       ukup7K\uK
chr1    10109   N       10      AA$AAaAAAA$a    kpuuBKppFA
chr1    10110   N       7       AA$A$aAAA       kupkFkp
chr1    10111   N       6       CcCCCc  QuApuK
chr1    10112   N       6       CcCCCc  \uKppK
chr1    10113   N       6       C$cCCCc \uKkpK
chr1    10114   N       5       tTTTt   k7fpF

分别是

参考序列名（染色体）

位置

参考序列的碱基

比对上的read数目

比对的情况

比对上的碱基的质量

2.2其中第五列相对复杂，具体解释如下：

chr1    10056   N       7       AAAA*AA kfuufKK
chr1    2003928 N       5       CCC+6GGGCCGC+6GGGCCGC   uupu<
chr1    10106   N       10      CCCCcCCCC^$c    uuuukAuuAK
chr1    737247  N       3       Cc^5c   KKK
chr1    1197931 N       3       T-6NNNNNNT-6NNNNNNt-6nnnnnn     uuK

1 *表示模糊碱基
2 大写表示在正链不匹配
3 小写表示在负链不匹配
4^表示匹配的碱基是一个reads的开始，^后紧跟的ascii码减去33代表比对质量，修饰的是后面的碱基，后面紧跟的碱基代表该read的第一个碱基
5 $代表一个read的结束，该符号修饰前面的碱基
6 正则表达式式+[0-9]+[ACGTNacgtn]+代表在该位点后插入的碱基。举例中chr1的2003928A后面有个+6GGGCCG，很可能是indel
7 正则表达式’-[0-9]+[ACGTNacgtn]+’代表在该位点后缺失的碱基；