美文网首页
2020-07-14 靶向捕获测序数据分析记录2

2020-07-14 靶向捕获测序数据分析记录2

作者: 程凉皮儿 | 来源:发表于2020-07-14 09:08 被阅读0次

    bam文件转换参考基因组版本

    之前的质控结果提示有点问题,查找原因后发现,之前的bam文件是根据hg19参考基因组完成的,所以需要先进行版本转换:
    参考资料http://crossmap.sourceforge.net/#convert-bam-cram-sam-format-files
    CrossMap.py工具可以进行各种文件之间的版本转换

    image.png

    Chain file

    1. Download Ensembl chain files
    1. Download UCSC chain files

    CrossMap依赖pyBigWig

    安装指令:

    conda create -n py3 python=3.6
    conda activate py3
    conda install -c bioconda pyBigWig
    pip3 install CrossMap
    

    使用参数:

    $ python CrossMap.py bam
    
    Usage: CrossMap.py bam chain_file input_file output_file [options]
    Note: If output_file == STDOUT or -, CrossMap will write BAM file to the screen
    
    Options:
      -m INSERT_SIZE, --mean=INSERT_SIZE
                            Average insert size of pair-end sequencing (bp).
                            [default=200.0]
      -s INSERT_SIZE_STDEV, --stdev=INSERT_SIZE_STDEV
                            Stanadard deviation of insert size. [default=30.0]
      -t INSERT_SIZE_FOLD, --times=INSERT_SIZE_FOLD
                            A mapped pair is considered as "proper pair" if both
                            ends mapped to different strand and the distance
                            between them is less then '-t' * stdev from the mean.
                            [default=3.0]
      -a, --append-tags     Add tag to each alignment.
    

    转换结果部分:

    (wes) root@1100150:~/project# cat config | while read id
    > do
    > bam=~/CHD_pooling_seq/${id}.dedup.bam
    > python /root/miniconda3/envs/py3/bin/CrossMap.py bam ~/biosoft/liftover/hg19ToHg38.over.chain.gz ${bam} ~/project/0.bwa/${id}.hg38.bam
    > done
    Insert size = 200.000000
    Insert size stdev = 30.000000
    Number of stdev from the mean = 3.000000
    Add tags to each alignment = False
    @ 2020-07-13 12:19:07: Read the chain file:  /root/biosoft/liftover/hg19ToHg38.over.chain.gz
    [W::hts_idx_load3] The index file is older than the data file: /root/CHD_pooling_seq/C1.dedup.bam.bai
    @ 2020-07-13 12:19:07: Liftover BAM file: /root/CHD_pooling_seq/C1.dedup.bam ==> /root/project/0.bwa/C1.hg38.bam.bam
    

    由上可知,生成的bam会自带.bam后缀,所以前面的命名可以不加。输出文件名直接写成~/project/0.bwa/${id}.hg38即可。
    终止命令,重新转换:

    (wes) root@1100150:~/project# cat config | while read id
    > do
    > bam=~/CHD_pooling_seq/${id}.dedup.bam
    > python /root/miniconda3/envs/py3/bin/CrossMap.py bam ~/biosoft/liftover/hg19ToHg38.over.chain.gz ${bam} ~/project/0.bwa/${id}.hg38
    > done
    Insert size = 200.000000
    Insert size stdev = 30.000000
    Number of stdev from the mean = 3.000000
    Add tags to each alignment = False
    @ 2020-07-14 00:45:25: Read the chain file:  /root/biosoft/liftover/hg19ToHg38.over.chain.gz
    [W::hts_idx_load3] The index file is older than the data file: /root/CHD_pooling_seq/C1.dedup.bam.bai
    @ 2020-07-14 00:45:26: Liftover BAM file: /root/CHD_pooling_seq/C1.dedup.bam ==> /root/project/0.bwa/C1.hg38.bam
    @ 2020-07-14 01:07:19: Done!
    @ 2020-07-14 01:07:19: Sort "/root/project/0.bwa/C1.hg38.bam" and save as "/root/project/0.bwa/C1.hg38.sorted.bam"
    @ 2020-07-14 01:14:00: Index "/root/project/0.bwa/C1.hg38.sorted.bam" ...
    Total alignments:13282037
         QC failed: 0
         Paired-end reads:
        R1 unique, R2 unique (UU): 12842196
        R1 unique, R2 unmapp (UN): 69655
        R1 unique, R2 multiple (UM): 0
        R1 multiple, R2 multiple (MM): 0
        R1 multiple, R2 unique (MU): 25411
        R1 multiple, R2 unmapped (MN): 2648
        R1 unmap, R2 unmap (NN): 275629
        R1 unmap, R2 unique (NU): 66498
        R1 unmap, R2 multiple (NM): 0
    

    从结果可以看到,转换版本之后,还会自己对bam文件进行sort。

    相关文章

      网友评论

          本文标题:2020-07-14 靶向捕获测序数据分析记录2

          本文链接:https://www.haomeiwen.com/subject/cypzcktx.html