美文网首页基因组学工作生活
求 bam文件单个位点碱基比例

求 bam文件单个位点碱基比例

作者: caokai001 | 来源:发表于2019-07-01 21:02 被阅读0次
    IGV 上看碱基分布

    目的:给你一个bam 文件,如何查看一个位点的碱基情况,直接IGVok,如果需要看多个点,就需要使用软件了。(一般情况vcf DP4会告诉此位置ref base非ref base 碱基个数)

    在群里面大佬提示下,samtools mpileupbam-readcount 可以实现统计碱基分布。

    1.简单的安装

    conda install -c bioconda bam-readcount
    

    2.bam-readcount 帮助信息

    [kcao@h1-lgl genome_index_bowtie2]$ bam-readcount -h
    Usage: bam-readcount [OPTIONS] <bam_file> [region]
    Generate metrics for bam_file at single nucleotide positions.
    Example: bam-readcount -f ref.fa some.bam
    
    Available options:
      -h [ --help ]                         produce this message
      -v [ --version ]                      output the version number
      -q [ --min-mapping-quality ] arg (=0) minimum mapping quality of reads used 
                                            for counting.
      -b [ --min-base-quality ] arg (=0)    minimum base quality at a position to 
                                            use the read for counting.
      -d [ --max-count ] arg (=10000000)    max depth to avoid excessive memory 
                                            usage.
      -l [ --site-list ] arg                file containing a list of regions to 
                                            report readcounts within.
      -f [ --reference-fasta ] arg          reference sequence in the fasta format.
      -D [ --print-individual-mapq ] arg    report the mapping qualities as a comma
                                            separated list.
      -p [ --per-library ]                  report results by library.
      -w [ --max-warnings ] arg             maximum number of warnings of each type
                                            to emit. -1 gives an unlimited number.
      -i [ --insertion-centric ]            generate indel centric readcounts. 
                                            Reads containing insertions will not be
                                            included in per-base counts
    
    注:-w 限制warnings 输出的条数,不然好多warning!
          -p 碱基质量  1
          -q 比对质量 20
    

    3.测试效果

    github文档

    [kcao@h1-lgl dup]$ bam-readcount -w 1 -f /home/kcao/data/Genome/genome_index_bowtie2/Homo_sapiens.GRCh38.dna.chromosome.fa /home/kcao/test/HHHHHHHHHHHH/SRR6213130.uniq.bam  1:89085865-89085865|sed -n '$p' >tmpjjj
    Minimum mapping quality is set to 0
    WARNING: In read SRR6213130.15145147: Couldn't find single-end mapping quality. Check to see if the SM tag is in BAM.
    The previous warning has been emitted 1 times and will be disabled.
    
    输出结果:
    1   89085886    G   43  =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  G:43:60.00:37.51:2.79:33:10:0.43:0.01:15.51:33:0.63:101.00:0.56 T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
    
    tips: 列名依次是,染色体,位置,参考碱基,深度,“=”(没理解等号含义),“A”,“C”,“G”,T“”,“N” 
    

    4 统计 A,G,C,T 碱基个数

    [kcao@h1-lgl dup]$ less tmpjjj |awk 'BEGIN{FS=OFS="\t"}{split($6,A,":");split($7,C,":");split($8,G,":");split($9,T,":");print $1,$2,$3,$4,"A:"A[2],"C:"C[2],"G:"G[2],"T:"T[2]}'
    
    
    1   89085886    G   43  A:0 C:0 G:43    T:0
    

    5.使用 参数-l region_list 查看多个位点碱基分布

    [kcao@h1-lgl dup]$ cat region_list 
    1   89085886    89085886
    1   89085975    89085975
    
    [kcao@h1-lgl dup]$ bam-readcount -w 1 -f /home/kcao/data/Genome/genome_index_bowtie2/Homo_sapiens.GRCh38.dna.chromosome.fa /home/kcao/test/HHHHHHHHHHHH/SRR6213130.uniq.bam  -l region_list|awk 'BEGIN{FS=OFS="\t"}{split($6,A,":");split($7,C,":");split($8,G,":");split($9,T,":");print $1,$2,$3,$4,"A:"A[2],"C:"C[2],"G:"G[2],"T:"T[2]}' >test.txt
    
    [kcao@h1-lgl dup]$ cat test.txt 
    1   89085886    G   43  A:0 C:0 G:43    T:0
    1   89085975    C   64  A:0 C:64    G:0 T:0
    
    • 有兴趣的可以试试samtools mpileup 结果~~
      samtools mpileup结果

    • 有时候,IGV reads 数目和bam-readcount 不对应,可能因为标记了dup 的缘故.

    相关文章

      网友评论

        本文标题:求 bam文件单个位点碱基比例

        本文链接:https://www.haomeiwen.com/subject/pzgncctx.html