Detecting Copy Number Variation with GATK's CNV Tools

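Author: 佳期如梦你也是 | Published: 2020-01-16 11:45
    Author's note

    This post records the full process of testing GATK's tools for CNV detection. The outline is below; some parts are still to be filled in.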

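    Overview

    GATK is a well-regarded tool for calling point mutations. While reading its help output I noticed that it also ships tools for detecting CNVs, so I tried them. This use is fairly niche, and I do not recommend it.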

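    Official publication

    The tools are in BETA and I found no publication for them. They do build a database of normal samples, for which the reference is https://www.nature.com/articles/nature15393

    Official manual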

    https://gatk.broadinstitute.org/hc/en-us/articles/360035531092?id=11682

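    How it works

    (To be expanded when time permits)
    In essence it still computes read depth, applies a hypothesis test, keeps the significant intervals, and then merges adjacent steps.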
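
    The idea above (label each bin, then merge adjacent bins that agree) can be illustrated with a toy awk filter. This is only a sketch and not what GATK does internally; ModelSegments uses its own, more elaborate segmentation. The function name `merge_bins`, the four-column input (contig, start, end, log2 ratio), and the 0.3 cutoff are all assumptions made up for the illustration.

```shell
# Toy illustration only: NOT GATK's algorithm.
# Input (tab-separated, sorted by position): contig  start  end  log2_ratio
# Each bin is labelled +, - or 0 with a fixed cutoff (hypothetical default 0.3),
# then adjacent bins on the same contig with the same label are merged.
# Output: contig  start  end  number_of_bins  label
merge_bins() {
  awk -F'\t' -v OFS='\t' -v cutoff="${1:-0.3}" '
    function emit() { if (n) print contig, start, stop, n, label }
    {
      r = $4 + 0
      c = (r >= cutoff) ? "+" : ((r <= -cutoff) ? "-" : "0")
      if (n && $1 == contig && c == label) {
        stop = $3; n++                      # extend the current segment
      } else {
        emit()                              # close the previous segment
        contig = $1; start = $2; stop = $3; label = c; n = 1
      }
    }
    END { emit() }
  '
}

# Example: merge_bins 0.3 < bins.tsv
```

    A real caller would replace the fixed cutoff with a statistical test, which is the step this sketch leaves out.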

    Installation

    Installing GATK is all that is needed; usage is similar to Mutect2.

    Run script

    #1.1 prepare the interval (region) file
    gatk PreprocessIntervals -L panel_regions.bed -R ucsc.hg19.fasta -O output/test.list --bin-length 267 --interval-merging-rule OVERLAPPING_ONLY
    #### annotate intervals with GC content
    gatk AnnotateIntervals -L output/test.list -O output/test.anno.list -R ucsc.hg19.fasta --interval-merging-rule OVERLAPPING_ONLY --sequence-dictionary ~/GATK/hg19Ref/ucsc.hg19.dict
    #1.2 collect read counts for the test sample
    # Note: passing anno.list to -L fails with: Query interval "@HD  VN:1.5" is not valid for this input
    # Deleting the header lines that start with @ is one possible workaround.
    gatk CollectReadCounts -I input/test.HQ.bam -L output/test.list --interval-merging-rule OVERLAPPING_ONLY -O output/test.counts.hdf5
    #2 build the panel of normals (PoN)
    #2.1 first collect read counts from the PoN BAMs
    # use the same intervals as for the test sample in step 1.2
    for i in $(cat samplelist)
    do
    gatk CollectReadCounts -I "input/18080706-1/${i}.HQ.bam" -L output/test.list --interval-merging-rule OVERLAPPING_ONLY -O "PON/${i}.hdf5"
    done
    #2.2 create the PoN file
    gatk CreateReadCountPanelOfNormals -I PON/pon1.hdf5  -I PON/pon2.hdf5 -I PON/pon3.hdf5 -I PON/pon4.hdf5 -I PON/pon5.hdf5 -I PON/pon6.hdf5 -I PON/pon7.hdf5 -I PON/pon8.hdf5 -I PON/pon9.hdf5 -I PON/pon10.hdf5 -I PON/pon11.hdf5 --minimum-interval-median-percentile 5.0 -O PON/11_normal.hdf5
    #3 denoise
    gatk  DenoiseReadCounts -I output/test.counts.hdf5 --count-panel-of-normals PON/11_normal.hdf5 --standardized-copy-ratios  output/test.standardCR.tsv --denoised-copy-ratios output/test.denoisedCR.tsv
    #4 plot the standardized and denoised copy ratios
    gatk PlotDenoisedCopyRatios --standardized-copy-ratios output/test.standardCR.tsv --denoised-copy-ratios output/test.denoisedCR.tsv --sequence-dictionary ~/GATK/hg19Ref/ucsc.hg19.dict --minimum-contig-length 46709983 --output output/ --output-prefix test
    #5 collect allelic counts (reference/alternate read counts per site)
    gatk CollectAllelicCounts -L output/test.list -I input/test.HQ.bam -R ~/GATK/hg19Ref/ucsc.hg19.fasta -O output/test_T_clean.allelicCounts.tsv
    #6 segmentation
    gatk ModelSegments --denoised-copy-ratios output/test.denoisedCR.tsv --allelic-counts output/test_T_clean.allelicCounts.tsv --output output --output-prefix test
    #7 call copy-ratio segments
    gatk CallCopyRatioSegments --input output/test.cr.seg --output output/test.cr.call.seg --calling-copy-ratio-z-score-threshold 2.0 --neutral-segment-copy-ratio-upper-bound 1.1 --neutral-segment-copy-ratio-lower-bound 0.9
    #8.1 plot the final modeled segments
    gatk PlotModeledSegments --denoised-copy-ratios output/test.denoisedCR.tsv --allelic-counts output/test.hets.tsv --segments output/test.modelFinal.seg --sequence-dictionary ~/GATK/hg19Ref/ucsc.hg19.dict --minimum-contig-length 46709983 --output output/ --output-prefix test
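
    The `@HD` error noted in step 1.2 suggests that header lines are being read as intervals. One possible workaround, sketched here under assumptions, is to drop the `@` lines and write plain `contig:start-end` entries, the GATK-style form documented for `.list` files. The helper name `to_gatk_list` and the output file name are hypothetical, and the input is assumed to carry contig, start, and end in its first three tab-separated columns.

```shell
# Hypothetical helper: convert an interval file with @ header lines into
# GATK-style "contig:start-end" lines.
# Assumes tab-separated columns: contig, start, end, ... (1-based, inclusive).
to_gatk_list() {
  awk -F'\t' '
    /^@/           { next }    # drop SAM-style header lines (@HD, @SQ, ...)
    $1 == "CONTIG" { next }    # drop a column-header row, if one is present
    NF >= 3        { print $1 ":" $2 "-" $3 }
  ' "$1"
}

# Example (input path follows the script above; the output name is an assumption):
# to_gatk_list output/test.list > output/test.plain.list
```

    Whether the converted file is accepted by every downstream step has not been checked here, so treat it as a starting point.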
    
    
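
    Step 2.2 lists the eleven count files by hand. A sketch that derives the `-I` arguments from the same `samplelist` used in step 2.1 follows. It assumes one sample name per line and the `PON/<sample>.hdf5` naming from the loop in step 2.1; `pon_inputs` is a hypothetical helper, and the `gatk` call is left commented out.

```shell
# Hypothetical helper: print "-I PON/<sample>.hdf5" for every sample name in
# a list file (one name per line; blank lines are skipped).
pon_inputs() {
  while IFS= read -r sample || [ -n "$sample" ]; do
    [ -n "$sample" ] && printf '%s PON/%s.hdf5 ' -I "$sample"
  done < "$1"
  return 0
}

# Example (relies on word splitting, so sample names must not contain spaces):
# gatk CreateReadCountPanelOfNormals $(pon_inputs samplelist) \
#   --minimum-interval-median-percentile 5.0 -O PON/11_normal.hdf5
```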

    Interpreting the results

    (To be added)
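
    As a starting point for reading the output of step 7, the sketch below pulls out the segments called as gained or lost. It assumes the called-segments file has `@`-prefixed header lines, then a column-header row, and a `CALL` column holding `+`, `-`, or `0`; check this against the files your GATK version writes. `called_segments` is a hypothetical helper name.

```shell
# Hypothetical helper: print only segments whose CALL column is + or -.
# Assumes: @-prefixed header lines, then one tab-separated column-header row
# containing a column named CALL, then one segment per line.
called_segments() {
  awk -F'\t' '
    /^@/ { next }                                   # skip SAM-style header
    !col { for (i = 1; i <= NF; i++) if ($i == "CALL") col = i; next }
    $col == "+" || $col == "-" { print }            # keep gains and losses
  ' "$1"
}

# Example (file name follows step 7 of the script above):
# called_segments output/test.cr.call.seg
```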

    Visualization

    [Figure: image.png]
