美文网首页生物信息学WES测序分析
4 比对到参考基因组输出bam文件

4 比对到参考基因组输出bam文件

作者: Y大宽 | 来源:发表于2019-06-02 09:50 被阅读89次

    进到align目录
    对质量好的测序数据进行比对

    1. 一个个比对,生成BAM文件

    align目录

    sample=SRR7696207
    bwa mem -t 2 -R "@RG\tID:$sample\tSM:$sample\tLB:WGS\tPL:Illumina" ../hg38/bwa_index/gatk_hg38 ../clean/SRR7696207_1_val_1.fq.gz ../clean/SRR7696207_2_val_2.fq.gz |samtools sort -@ 2 -o SRR7696207.bam -
    

    不用-R参数也可以执行,但后面gatk的时候会报错

    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 143150 sequences (20000163 bp)...
    [M::process] read 142658 sequences (20000278 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 61056, 1, 1)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (135, 165, 207)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 351)
    [M::mem_pestat] mean and std.dev: (174.05, 52.67)
    ......
    

    2或者循环批量比对

    #clean目录
    ls *1.fastq.gz > 1
    ls *2.fastq.gz > 2
    paste 1 2 > config
    vim config
    

    增加第一列文件名,记得不能空格,要Tab分隔
    如果样本量很多就用脚本,具体见大样本分析那篇
    align目录下

    INDEX=../hg38/bwa_index/gatk_hg38
    cat ../clean/config|while read id
    do 
    arr=($id)
    sample=${arr[0]}
    fq1=${arr[1]}
    fq2=${arr[2]}
    bwa mem -t 5 -R "@RG\tID:$sample\tSM:$sample\tLB:WGS\tPL:Illumina" $INDEX ../clean/$fq1 ../clean/$fq2 |samtools sort  -@ 2 -o $sample.bam -
    done &
    
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 142876 sequences (20000122 bp)...
    [M::process] read 142628 sequences (20000141 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 61992, 1, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (137, 174, 219)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 383)
    [M::mem_pestat] mean and std.dev: (181.21, 59.08)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 465)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 142876 reads in 24.094 CPU sec, 11.833 real sec
    

    3 查看bam文件

    $ samtools view -H SRR8517853.bam |grep -v "SQ"
    
    @HD     VN:1.6  SO:coordinate
    @RG     ID:SRR8517853/tSM:SRR8517853    LB:WGS  PL:Illumina
    @PG     ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -t 2 -R @RG\tID:SRR8517853/tSM:SRR8517853\tLB:WGS\tPL:Illumina ../hg38/bwa_index/gatk_hg38 ../clean/SRR8517853_1_val_1.fq.gz ../clean/SRR8517853_2_val_2.fq.gz
    

    相关文章

      网友评论

        本文标题:4 比对到参考基因组输出bam文件

        本文链接:https://www.haomeiwen.com/subject/ztzbtctx.html