Information reported by the samtools stats command
#Usage: samtools stats [OPTIONS] file.bam
# samtools stats [OPTIONS] file.bam chr:from-to
For details, see the samtools-stats manual page.
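The output is split into sections. The first one, whose lines start with SN, holds the summary numbers we want to read. Given the total genome length, it can also be used to estimate the average depth of the aligned bases over the genome:
depth = bases mapped / genome length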
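As a rough illustration of the depth formula, the SN lines can be parsed and divided by the genome length. This is only a sketch: the sample text, its numbers, the helper names `parse_sn` and `mean_depth`, and the genome length are all made up for the example, and it assumes the usual tab-separated `SN<TAB>name:<TAB>value` layout. On real data the text would come from `samtools stats file.bam`.

```python
# Sample SN lines in the tab-separated layout printed by samtools stats.
# All numbers are invented for illustration.
SAMPLE_STATS = """\
# This file was produced by samtools stats
SN\traw total sequences:\t2000
SN\tsequences:\t2000
SN\treads mapped:\t1900
SN\tbases mapped:\t190000\t# ignores clipping
SN\tbases mapped (cigar):\t180000\t# more accurate
SN\terror rate:\t5.000000e-03\t# mismatches / bases mapped (cigar)
FFQ\t1\t0\t0
"""


def parse_sn(text):
    """Return a dict mapping each SN field name to its numeric value."""
    summary = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Keep only SN rows: "SN", "<name>:", "<value>", optional "# comment".
        if len(fields) < 3 or fields[0] != "SN":
            continue
        name = fields[1].rstrip(":")
        summary[name] = float(fields[2])
    return summary


def mean_depth(summary, genome_length, key="bases mapped"):
    """Average depth = mapped bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return summary[key] / genome_length


stats = parse_sn(SAMPLE_STATS)
GENOME_LENGTH = 60000  # hypothetical genome size in bp

print(mean_depth(stats, GENOME_LENGTH))
print(mean_depth(stats, GENOME_LENGTH, key="bases mapped (cigar)"))
```

The second call uses `bases mapped (cigar)` instead, which counts only bases covered by M, I, = and X CIGAR operations. Which numerator is more appropriate depends on whether clipped bases should count toward depth.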
The SN section contains a series of counts, percentages, and averages, in a similar style to samtools flagstat, but more comprehensive.
raw total sequences - total number of reads in a file. Same number reported by samtools view -c.
filtered sequences - number of discarded reads when using -f or -F option.
sequences - number of processed reads.
is sorted - flag indicating whether the file is coordinate sorted (1) or not (0).
1st fragments - number of first fragment reads (flags 0x01 not set; or flags 0x01 and 0x40 set, 0x80 not set).
last fragments - number of last fragment reads (flags 0x01 and 0x80 set, 0x40 not set).
reads mapped - number of reads, paired or single, that are mapped (flag 0x4 or 0x8 not set).
reads mapped and paired - number of mapped paired reads (flag 0x1 is set and flags 0x4 and 0x8 are not set).
reads unmapped - number of unmapped reads (flag 0x4 is set).
reads properly paired - number of mapped paired reads with flag 0x2 set.
paired - number of paired reads, mapped or unmapped, that are neither secondary nor supplementary (flag 0x1 is set and flags 0x100 (256) and 0x800 (2048) are not set).
reads duplicated - number of duplicate reads (flag 0x400 (1024) is set).
reads MQ0 - number of mapped reads with mapping quality 0.
reads QC failed - number of reads that failed the quality checks (flag 0x200 (512) is set).
non-primary alignments - number of secondary reads (flag 0x100 (256) set).
total length - number of processed bases from reads that are neither secondary nor supplementary (flags 0x100 (256) and 0x800 (2048) are not set).
total first fragment length - number of processed bases that belong to first fragments.
total last fragment length - number of processed bases that belong to last fragments.
bases mapped - number of processed bases that belong to reads mapped.
bases mapped (cigar) - number of mapped bases filtered by the CIGAR string corresponding to the read they belong to. Only alignment matches (M), inserts (I), sequence matches (=) and sequence mismatches (X) are counted.
bases trimmed - number of bases trimmed by bwa, that belong to non secondary and non supplementary reads. Enabled by -q option.
bases duplicated - number of bases that belong to reads duplicated.
mismatches - number of mismatched bases, as reported by the NM tag associated with a read, if present.
error rate - ratio between mismatches and bases mapped (cigar).
average length - ratio between total length and sequences.
average first fragment length - ratio between total first fragment length and 1st fragments.
average last fragment length - ratio between total last fragment length and last fragments.
maximum length - length of the longest read (includes hard-clipped bases).
maximum first fragment length - length of the longest first fragment read (includes hard-clipped bases).
maximum last fragment length - length of the longest last fragment read (includes hard-clipped bases).
average quality - ratio between the sum of base qualities and total length.
insert size average - the average absolute template length for paired and mapped reads.
insert size standard deviation - standard deviation for the average template length distribution.
inward oriented pairs - number of paired reads with flag 0x40 (64) set and flag 0x10 (16) not set or with flag 0x80 (128) set and flag 0x10 (16) set.
outward oriented pairs - number of paired reads with flag 0x40 (64) set and flag 0x10 (16) set or with flag 0x80 (128) set and flag 0x10 (16) not set.
pairs with other orientation - number of paired reads that don't fall in any of the above two categories.
pairs on different chromosomes - number of pairs where one read is on one chromosome and the pair read is on a different chromosome.
percentage of properly paired reads - percentage of reads properly paired out of sequences.
bases inside the target - number of bases inside the target region(s) (when a target file is specified with -t option).
percentage of target genome with coverage > VAL - percentage of target bases with a coverage larger than VAL. By default, VAL is 0, but a custom value can be supplied by the user with -g option.
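Several of the counters above are defined purely by SAM flag bits, so it can help to see which counters a given flag value would feed. The following is a minimal sketch that transcribes the flag conditions exactly as worded in the list. The function name `classify` is hypothetical, and samtools itself may apply further filtering (for example of secondary or supplementary reads) before incrementing a counter, so this should not be taken as a reimplementation of the tool.

```python
# SAM flag bits referenced in the field descriptions above.
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
FIRST, LAST = 0x40, 0x80
SECONDARY, QC_FAIL, DUPLICATE, SUPPLEMENTARY = 0x100, 0x200, 0x400, 0x800


def classify(flag):
    """Return the SN counters whose stated flag condition this flag meets."""
    hits = set()
    paired = bool(flag & PAIRED)
    first = bool(flag & FIRST)
    last = bool(flag & LAST)

    # 1st fragments: 0x1 not set, or 0x1 and 0x40 set with 0x80 not set.
    if not paired or (first and not last):
        hits.add("1st fragments")
    # last fragments: 0x1 and 0x80 set, 0x40 not set.
    if paired and last and not first:
        hits.add("last fragments")
    if flag & UNMAPPED:
        hits.add("reads unmapped")
    # mapped and paired: 0x1 set, 0x4 and 0x8 not set.
    if paired and not flag & (UNMAPPED | MATE_UNMAPPED):
        hits.add("reads mapped and paired")
        if flag & PROPER_PAIR:
            hits.add("reads properly paired")
    # paired: 0x1 set, neither secondary nor supplementary.
    if paired and not flag & (SECONDARY | SUPPLEMENTARY):
        hits.add("paired")
    if flag & DUPLICATE:
        hits.add("reads duplicated")
    if flag & QC_FAIL:
        hits.add("reads QC failed")
    if flag & SECONDARY:
        hits.add("non-primary alignments")
    return hits


# 99 and 147 are the flags of a typical properly paired forward/reverse pair.
for flag in (99, 147, 4):
    print(flag, sorted(classify(flag)))
```

Counters that depend on more than the flag, such as reads MQ0 or the insert size fields, are left out on purpose.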