美文网首页
GATK 4.1.9.0 CreateSomaticPanelO

GATK 4.1.9.0 CreateSomaticPanelO

作者: PETJO | 来源:发表于2021-03-11 18:13 被阅读0次
使用GATK 4.1.9.0 分析单个个体(具有一个或多个肿瘤样本)体细胞变异Somatic short variant discovery (SNVs + Indels),有或无正常对照样本。

Identify somatic short variants (SNVs and Indels) in one or more tumor samples from a single individual, with or without a matched normal sample.

1. 按照官网推荐GATK4最佳实践流程分析肿瘤样本体细胞变异(How to) Call somatic mutations using GATK4 Mutect2
how to call somatic mutations using GATK4 Mutect2.png
2. GATK 4.1.9.0 分析体细胞变异主要分为两步:第一步,使用GATK4 Mutect2软件分析候选体细胞变异;第二步,对候选体细胞变异进行过滤。
GATK Mutect2 calling SNVs+Indels.png
官网流程代码
## step 1: Call candidate variants
# 配对样本
gatk Mutect2 \
    -R ref.fasta \
    -L intervals.interval_list \
    -I tumor.bam -tumor tumor\
    -I normal.bam -norma normal\
    -germline-resource af-only-gnomad.vcf \
    -pon panel_of_normals.vcf \
    --f1r2-tar-gz f1r2.tar.gz \
    -O unfiltered.vcf

# 仅有肿瘤样本
gatk Mutect2 \
    -R ref.fasta \
    -L intervals.interval_list \
    -I tumor.bam -tumor tumor\
    -germline-resource af-only-gnomad.vcf \
    -pon panel_of_normals.vcf \
    --f1r2-tar-gz f1r2.tar.gz \
    -O unfiltered.vcf

## step 2: Calculate Contamination
gatk GetPileupSummaries \
    -I tumor.bam \
    -V chr17_small_exac_common_3_grch38.vcf.gz \
    -L chr17_small_exac_common_3_grch38.vcf.gz \
    -O tumor.getpileupsummaries.table

gatk GetPileupSummaries \
    -I normal.bam \
    -V chr17_small_exac_common_3_grch38.vcf.gz \
    -L chr17_small_exac_common_3_grch38.vcf.gz \
    -O normal.getpileupsummaries.table
    
gatk CalculateContamination \
    -I tumor.getpileupsummaries.table \
    -matched normal.getpileupsummaries.table
    -O calculatecontamination.table

## step 3: Learn Orientation Bias Artifacts
gatk LearnReadOrientationModel \
    -I f1r2.tar.gz \
    -O read-orientation-model.tar.gz

## step 4: Filter Variants
gatk FilterMutectCalls \
    -V unfiltered.vcf \
    --contamination-table calculatecontamination.table \
    --ob-priors read-orientation-model.tar.gz \
    -O filtered.vcf
3. 在使用GATK4 Mutect2软件分析候选体细胞变异时,最好使用Panel of Normals (PON)文件,以提高变异分析准确度。

GATK开发团队认为,在生成PON文件时,正常对照样本越多越好(至少40个),但是其实一些小样本(4-10个)也是可以的,有总比没有好。

Panel of Normals.png
4. 官网三步生成Panel of Normals(PON)教程
# Step 1: Run Mutect2 in tumor-only mode for each normal sample:
gatk Mutect2 \
    -R reference.fasta \
    -I normal1.bam \
    --max-mnp-distance 0 \
    -O normal1.vcf.gz

# Step 2: Create a GenomicsDB from the normal Mutect2 calls:
gatk GenomicsDBImport \
    -R reference.fasta \
    -L intervals.interval_list \
    --genomicsdb-workspace-path pon_db \
    -V normal1.vcf.gz \
    -V normal2.vcf.gz \
    -V normal3.vcf.gz
    
# Step 3: Combine the normal calls using CreateSomaticPanelOfNormals:
gatk CreateSomaticPanelOfNormals \
    -R reference.fasta \
    --germline-resource af-only-gnomad.vcf.gz \
    -V gendb://pon_db \
    -O pon.vcf.gz
5. 本次使用Agilent_v5 + UTR WES数据(A是肿瘤样本,B是正常对照)构建PON
agilent v5.png
6. 经过数据前处理得到bqsr.bam文件
BQSR1.bam.png B.bqsr.bam.png
7. step1_pon代码
#!/usr/bash

# Working directory
rawdata_dir=$HOME/wes/rawdata
cleandata_dir=$HOME/wes/cleandata
align_dir=$HOME/wes/align
variation_dir=$HOME/wes/variation
annotation_dir=$HOME/wes/annotation

# Software and Reference
INDEX=/opt/reference/gatk_hg19_index/gatk_hg19
GATK=/opt/software/gatk-4.1.9.0/gatk
REF=/opt/reference/gatk_bundle_hg19/ucsc.hg19.fasta
BED=/opt/reference/Exome_Agilent_V5/Exome-Agilent_V5chr_Clinical.bed
INTERVAL_LIST=/opt/reference/Exome_Agilent_V5/Agilent_V5chr_Clinical.interval_list
gnomad_b37_vcf=/opt/reference/gatk_bundle_hg19/Mutect2/af-only-gnomad.raw.sites.b37.vcf

#### Panel of Normals
# step1_pon
# -L ${BED} \
# --native-pair-hmm-threads 16 \
# --independent-mates true \

ls $align_dir/*B.bqsr.bam | while read id
do
    sample=$(basename ${id} .bqsr.bam)
    echo ${id} ${sample}
    
    echo "start step1_pon for ${sample}" `date`
    $GATK --java-options "-Xmx100G -Djava.io.tmpdir=./" Mutect2 \
    -R ${REF} \
    -I ${id} \
    --max-mnp-distance 0 \
    -O $variation_dir/pon/${sample}.vcf.gz \
    1>$variation_dir/pon/step1_pon.log 2>&1
    echo "end step1_pon for ${sample}" `date`
done
B.vcf.gz.jpg
8. step2_pon 代码
tmp=""
for i in $(ls $variation_dir/pon/*.vcf.gz)
do
    argv="$tmp -V $i"
    tmp=$argv
done
    echo ${argv}
    
    echo "start step2_pon" `date`
    $GATK --java-options "-Xmx100G -Djava.io.tmpdir=./" GenomicsDBImport \
    -R ${REF} \
    -L ${BED} \
    --reader-threads 12 \
    --genomicsdb-workspace-path $variation_dir/pon/pon_db \
    ${argv} \
    1>$variation_dir/pon/step2_pon.log 2>&1
    echo "end step2_pon"  `date` 
8.1 step2_pon 结果:会产生大量 的 *.json文件
step2_pon_loader.file.png
8.2 step2_pon 结果:会产生大量 的 chr$文件
step2_pon_chr.file.png
8.3 CUP不稳定
cup2.png cpu3.png cpu1.png

我是10T的空间,跑完step2_pon大概消耗7T空间,再做step3 的时候报错,空间不足,无法执行程序。

9. 分析原因:PON第二步 "-L intervals.interval_list"选项需要后缀.interval_list文件,使用bed文件代替,程序运行错误。因此,通过bed文件制作.interval_list文件
awk '{print $1}' Exome-Agilent_V5chr_Clinical.bed|uniq |awk '{print "-L " $1}'|tr "\n" " " > Agilent_V5chr_Clinical.interval_list
10. 新的流程代码,能够成功运行!
# step2_pon 
    tmp=""
    for i in $(ls $variation_dir/pon/*.vcf.gz)
    do
        argv="$tmp -V $i"
        tmp=$argv
    done
        echo ${argv} 
        
    echo "start step2_pon" `date`
    $GATK --java-options "-Xmx100G -Djava.io.tmpdir=./" GenomicsDBImport \
    -R ${REF} \
    $(cat ${INTERVAL_LIST}) \
    --reader-threads 12 \
    --genomicsdb-workspace-path $variation_dir/pon/pon_db \
    ${argv} \
    1>$variation_dir/pon/step2_pon.log 2>&1
    echo "end step2_pon"  `date`
11. step3_pon 代码,最终生成所需的pon.vcf.gz文件!
# step3_pon
    echo "start step3_pon" `date`
    $GATK --java-options "-Xmx100G -Djava.io.tmpdir=./" CreateSomaticPanelOfNormals \
    -R ${REF} \
    --germline-resource ${gnomad_hg19_vcf_gz} \
    -V gendb://$variation_dir/pon/pon_db \
    --tmp-dir $variation_dir/pon \
    -O $variation_dir/pon/pon.vcf.gz \
    1>$variation_dir/pon/step3_pon.log 2>&1
    echo "end step3_pon" `date`

相关文章

网友评论

      本文标题:GATK 4.1.9.0 CreateSomaticPanelO

      本文链接:https://www.haomeiwen.com/subject/xqjoqltx.html