1. Introduction to Whole Exome Sequencing
Whole exome sequencing (WES) uses sequence capture technology to capture and enrich the DNA of all exonic regions of the genome, which is then subjected to high-depth, high-throughput sequencing to identify genetic variants related to protein function. The exome accounts for only about 1% of the whole genome yet harbors roughly 85% of disease-causing mutations, so WES is more economical and efficient than whole genome sequencing. Exome sequencing is mainly used to identify and study variants associated with disease or population evolution, both in coding regions and in untranslated regions (UTRs) with important regulatory functions. Combined with the exome data available in large public databases, it helps to better interpret the relationship between the detected variants and disease.
Advantages:
1) The exome covers only a small fraction of the genome (~1%), so higher sequencing depth is affordable, more low-frequency and rare variants can be detected, and sequencing cost and storage requirements are reduced
2) Protein-coding sequences are sequenced directly, making it possible to identify variants that affect protein structure
2. WES Analysis Workflow
2.1 Quality Control (QC)
# Install the QC tool fastp
$ conda install -c bioconda fastp
# Verify the installation
$ fastp --help
## usage: fastp [options] ...
## options:
## -i, --in1 read1 input file name (string [=])
## -o, --out1 read1 output file name (string [=])
## -I, --in2 read2 input file name (string [=])
## -O, --out2 read2 output file name (string [=])
## ......
$ fastp -i input.R1.fq -o output.R1.fq -I input.R2.fq -O output.R2.fq
# Batch processing
$ bash run_fastp.sh
ꔷ run_fastp.sh:
#!/bin/bash
# Run fastp on every paired-end sample under ~/FASTQ
for i in ~/FASTQ/*_R1_001.fastq.gz
do
# e.g. sampleA_S1_R1_001.fastq.gz -> fileName=sampleA_S1, sampleName=sampleA
fileName=`basename "$i" | awk -F "_R" '{print $1}'`
sampleName=`basename "$i" | awk -F "_" '{print $1}'`
fastp -i ~/FASTQ/"$fileName"_R1_001.fastq.gz -o "$sampleName"_1.fastp.fq.gz -I ~/FASTQ/"$fileName"_R2_001.fastq.gz -O "$sampleName"_2.fastp.fq.gz -h "$sampleName".html
done
ꔷ Main parameters:
-i, --in1 input file name for read1
-o, --out1 output file name for read1
-I, --in2 input file name for read2
-O, --out2 output file name for read2; fastp detects compressed files from the file extension, so give the output files a *.gz extension to get compressed output
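Besides the input/output options, fastp's filtering behavior can also be tuned explicitly. A minimal sketch is shown below; the threshold values are illustrative assumptions, not recommendations from this workflow:
# quality >= Q20, reads shorter than 50 bp dropped, paired-end adapter detection on, 8 threads, HTML + JSON reports
$ fastp -i input.R1.fq -I input.R2.fq -o output.R1.fq.gz -O output.R2.fq.gz -q 20 -l 50 --detect_adapter_for_pe -w 8 -h report.html -j report.json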
2.2 GATK Workflow
2.2.1 Tool Installation
- BWA
BWA (Burrows-Wheeler Aligner) is a software package for mapping sequences with low divergence against a large reference genome. It consists of three algorithms:
ꔷ BWA-backtrack: designed for short reads, up to about 100 bp
ꔷ BWA-SW: designed for long reads of 70 bp to 1 Mbp and supports split alignments (typically used for long reads such as PacBio or Nanopore)
ꔷ BWA-MEM: the most commonly used mode; it supports longer reads and split alignments, and generally also performs better than the other algorithms on 70-100 bp Illumina data
sudo apt install bwa -y
bwa # verify the installation
##
## Program: bwa (alignment via Burrows-Wheeler transformation)
## Version: 0.7.17-r1188
## Contact: Heng Li <lh3@sanger.ac.uk>
##
## Usage: bwa <command> [options]
##
## Command: index index sequences in the FASTA format
## mem BWA-MEM algorithm
## fastmap identify super-maximal exact matches
## pemerge merge overlapping paired ends (EXPERIMENTAL)
## aln gapped/ungapped alignment
## samse generate alignment (single ended)
## sampe generate alignment (paired ended)
## bwasw BWA-SW for long queries
##
## shm manage indices in shared memory
## fa2pac convert FASTA to PAC format
## pac2bwt generate BWT from PAC
## pac2bwtgen alternative algorithm for generating BWT
## bwtupdate update .bwt to the new format
## bwt2sa generate SA from BWT and Occ
##
## Note: To use BWA, you need to first index the genome with `bwa index'.
## There are three alignment algorithms in BWA: `mem', `bwasw', and
## `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
## first. Please `man ./bwa.1' for the manual.
##
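To make the note above concrete, here is a minimal sketch of how the two most common entry points are invoked; ref.fa, reads_1.fq and reads_2.fq are placeholder names, and the actual alignment commands used in this workflow appear in section 2.2.2:
# BWA-MEM: one-step alignment of paired-end reads (recommended default)
bwa mem -t 8 ref.fa reads_1.fq reads_2.fq > aln.sam
# BWA-backtrack: two-step alignment for short reads (aln + sampe)
bwa aln ref.fa reads_1.fq > reads_1.sai
bwa aln ref.fa reads_2.fq > reads_2.sai
bwa sampe ref.fa reads_1.sai reads_2.sai reads_1.fq reads_2.fq > aln.sam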
- samtools
wget https://github.com/samtools/samtools/releases/download/1.20/samtools-1.20.tar.bz2 # download
tar jxvf samtools-1.20.tar.bz2 # extract
cd samtools-1.20/
./configure --prefix=/software/samtools-1.20
make
make install
./samtools # verify the installation
##
## Program: samtools (Tools for alignments in the SAM format)
## Version: 1.20 (using htslib 1.20)
##
## Usage: samtools <command> [options]
##
## Commands:
## -- Indexing
## dict create a sequence dictionary file
## faidx index/extract FASTA
## fqidx index/extract FASTQ
## index index alignment
##
## -- Editing
## calmd recalculate MD/NM tags and '=' bases
## fixmate fix mate information
## reheader replace BAM header
## targetcut cut fosmid regions (for fosmid pool only)
## addreplacerg adds or replaces RG tags
## markdup mark duplicates
## ampliconclip clip oligos from the end of reads
##
## -- File operations
## collate shuffle and group alignments by name
## cat concatenate BAMs
## consensus produce a consensus Pileup/FASTA/FASTQ
## merge merge sorted alignments
## mpileup multi-way pileup
## sort sort alignment file
## split splits a file by read group
## quickcheck quickly check if SAM/BAM/CRAM file appears intact
## fastq converts a BAM to a FASTQ
## fasta converts a BAM to a FASTA
## import Converts FASTA or FASTQ files to SAM/BAM/CRAM
## reference Generates a reference from aligned data
## reset Reverts aligner changes in reads
##
## -- Statistics
## bedcov read depth per BED region
## coverage alignment depth and percent coverage
## depth compute the depth
## flagstat simple stats
## idxstats BAM index stats
## cram-size list CRAM Content-ID and Data-Series sizes
## phase phase heterozygotes
## stats generate stats (former bamcheck)
## ampliconstats generate amplicon specific stats
##
## -- Viewing
## flags explain BAM flags
## head header viewer
## tview text alignment viewer
## view SAM<->BAM<->CRAM conversion
## depad convert padded BAM to unpadded BAM
## samples list the samples in a set of SAM/BAM/CRAM files
##
## -- Misc
## help [cmd] display this help message or help for [cmd]
## version detailed version information
##
- sambamba
# Download the source archive
wget https://github.com/biod/sambamba/archive/refs/tags/v1.0.1.tar.gz
tar -zxvf v1.0.1.tar.gz
cd sambamba-1.0.1/
make # if this fails, install the ldc2 compiler first (wget https://github.com/ldc-developers/ldc/releases/download/v1.35.0/ldc2-1.35.0-linux-x86_64.tar.xz)
echo 'export PATH=$PATH:/sambamba-1.0.1/bin' >> ~/.bashrc # add the sambamba bin directory (adjust to your install location) to PATH
source ~/.bashrc
sambamba-1.0.1 --help # verify the installation
##
## sambamba 1.0.1
## by Artem Tarasov and Pjotr Prins (C) 2012-2023
## LDC 1.35.0 / DMD v2.105.2 / LLVM16.0.6 / bootstrap LDC - the LLVM D compiler (1.35.0)
##
##
## Usage: sambamba [command] [args...]
##
## Available commands:
##
## view view contents and convert from one format
## to another (SAM/BAM/JSON/UNPACK)
## index build index (BAI)
## merge merge files (BAM)
## sort sort file (BAM)
## slice slice file (BAM using BED)
## markdup mark or remove duplicates (BAM)
## subsample subsample (BAM)
## flagstat output statistics (BAM)
## depth output statistics (BAM)
## validate simple validator (BAM)
##
## No longer recommended:
##
## mpileup parallel execution of samtools (BAM)
##
## To get help on a particular command, call it without args.
##
## Global options
##
## -q quiet mode (do not show banner)
##
## For bug reports and feature requests see
##
## https://github.com/biod/
##
- GATK4
wget https://github.com/broadinstitute/gatk/releases/download/4.5.0.0/gatk-4.5.0.0.zip
unzip gatk-4.5.0.0.zip # extract the archive
cd gatk-4.5.0.0
echo 'export PATH=$PATH:/gatk-4.5.0.0' >> ~/.bashrc # the gatk wrapper script sits in the top-level directory (adjust to your install location)
source ~/.bashrc
gatk -h # verify the installation
##
## Usage template for all tools (uses --spark-runner LOCAL when used with a Spark tool)
## gatk AnyTool toolArgs
##
## Usage template for Spark tools (will NOT work on non-Spark tools)
## gatk SparkTool toolArgs [ -- --spark-runner <LOCAL | SPARK | GCS> sparkArgs ]
##
## Getting help
## gatk --list Print the list of available tools
##
## gatk Tool --help Print help on a particular tool
##
## Configuration File Specification
## --gatk-config-file PATH/TO/GATK/PROPERTIES/FILE
##
## gatk forwards commands to GATK and adds some sugar for submitting spark jobs
##
## --spark-runner <target> controls how spark tools are run
## valid targets are:
## LOCAL: run using the in-memory spark runner
## SPARK: run using spark-submit on an existing cluster
## --spark-master must be specified
## --spark-submit-command may be specified to control the Spark submit command
## arguments to spark-submit may optionally be specified after --
## GCS: run using Google cloud dataproc
## commands after the -- will be passed to dataproc
## --cluster <your-cluster> must be specified after the --
## spark properties and some common spark-submit parameters will be translated
## to dataproc equivalents
##
## --dry-run may be specified to output the generated command line without running it
## --java-options 'OPTION1[ OPTION2=Y ... ]' optional - pass the given string of options to the
## java JVM at runtime.
## Java options MUST be passed inside a single string with space-separated values.
##
## --debug-port <number> sets up a Java VM debug agent to listen to debugger connections on a
## particular port number. This in turn will add the necessary java VM arguments
## so that you don't need to explicitly indicate these using --java-options.
## --debug-suspend sets the Java VM debug agent up so that the run get immediatelly suspended
## waiting for a debugger to connect. By default the port number is 5005 but
## can be customized using --debug-port
##
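GATK4 is a Java application; as far as I know, GATK 4.5.x expects a Java 17 runtime, so it is worth checking the JVM before running any tool (verify the exact requirement against the GATK release notes):
java -version # the gatk wrapper will fail with an unsupported JDK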
- bcftools
sudo apt-get install bcftools # on Ubuntu/Debian systems
bcftools --help # verify the installation
##
## Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs)
## Version: 1.10.2 (using htslib 1.10.2-3ubuntu0.1)
##
## Usage: bcftools [--version|--version-only] [--help] <command> <argument>
##
## Commands:
##
## -- Indexing
## index index VCF/BCF files
##
## -- VCF/BCF manipulation
## annotate annotate and edit VCF/BCF files
## concat concatenate VCF/BCF files from the same set of samples
## convert convert VCF/BCF files to different formats and back
## isec intersections of VCF/BCF files
## merge merge VCF/BCF files files from non-overlapping sample sets
## norm left-align and normalize indels
## plugin user-defined plugins
## query transform VCF/BCF into user-defined formats
## reheader modify VCF/BCF header, change sample names
## sort sort VCF/BCF file
## view VCF/BCF conversion, view, subset and filter VCF/BCF files
##
## -- VCF/BCF analysis
## call SNP/indel calling
## consensus create consensus sequence by applying VCF variants
## cnv HMM CNV calling
## csq call variation consequences
## filter filter VCF/BCF files using fixed thresholds
## gtcheck check sample concordance, detect sample swaps and contamination
## mpileup multi-way pileup producing genotype likelihoods
## roh identify runs of autozygosity (HMM)
## stats produce VCF/BCF stats
##
## Most commands accept VCF, bgzipped VCF, and BCF with the file type detected
## automatically even when streaming from a pipe. Indexed VCF and BCF will work
## in all situations. Un-indexed VCF and BCF and streams will work in most but
## not all situations.
##
2.2.2 Raw Data Processing
- Download the reference genome and build the index
wget https://github.com/broadinstitute/gatk/blob/master/src/test/resources/large/Homo_sapiens_assembly38.fasta.gz
gunzip Homo_sapiens_assembly38.fasta.gz # decompress
bwa index /data/Ref/Homo_sapiens_assembly38.fasta # for reference genomes larger than 2 GB, use the -a bwtsw algorithm
## [bwa_index] Construct BWT for the packed sequence...
## [BWTIncCreate] textLength=383669262, availableWord=38996220
## [BWTIncConstructFromPacked] 10 iterations done. 64326510 characters processed.
## [BWTIncConstructFromPacked] 20 iterations done. 118839214 characters processed.
## [BWTIncConstructFromPacked] 30 iterations done. 167286270 characters processed.
## [BWTIncConstructFromPacked] 40 iterations done. 210342094 characters processed.
## [BWTIncConstructFromPacked] 50 iterations done. 248606174 characters processed.
## ......
This produces five index files with the suffixes .amb, .ann, .bwt, .pac and .sa alongside the FASTA.
- Align reads to the reference genome
# This sample has 4 FASTQ files (two lanes, R1/R2); the pair of FASTQ files from each lane is processed separately into its own SAM/BAM file, and the lane-level files are merged afterwards
bwa mem -t 20 -M -Y -R '@RG\tID:V300059041_L01_658\tPL:ILLUMINA\tLB:lib1\tSM:organoid_V300059041_L02' /data/Ref/Homo_sapiens_assembly38.fasta /data/CC56organoid/clean_data/V300059041_L01_658_1.fq.gz /data/CC56organoid/clean_data/V300059041_L01_658_2.fq.gz > V300059041_L01_658.sam 2> bwa_1.log
bwa mem -t 20 -M -Y -R '@RG\tID:V300059041_L02_658\tPL:ILLUMINA\tLB:lib1\tSM:organoid_V300059041_L02' /data/Ref/Homo_sapiens_assembly38.fasta /data/CC56organoid/clean_data/V300059041_L02_658_1.fq.gz /data/CC56organoid/clean_data/V300059041_L02_658_2.fq.gz > V300059041_L02_658.sam 2> bwa_2.log
# Convert SAM to BAM (BAM is binary and much faster to process)
samtools view -bS V300059041_L01_658.sam > V300059041_L01_658.bam
samtools view -bS V300059041_L02_658.sam > V300059041_L02_658.bam
# Sort by coordinate
samtools sort V300059041_L01_658.bam -o V300059041_L01_658_sorted.bam
samtools sort V300059041_L02_658.bam -o V300059041_L02_658_sorted.bam
# Merge the lane-level BAMs into one sample-level BAM
samtools merge CC56organoid.bam V300059041_L01_658_sorted.bam V300059041_L02_658_sorted.bam
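Before moving on, it is worth indexing the merged BAM and sanity-checking the alignment; the sketch below also shows, as an optional alternative not used in this workflow, how a lane can be aligned and coordinate-sorted in a single pipe to avoid the large intermediate SAM:
# index and get basic mapping statistics for the merged BAM
samtools index CC56organoid.bam
samtools flagstat CC56organoid.bam
# optional per-lane one-liner: align, convert and sort without writing a SAM to disk
bwa mem -t 20 -M -Y -R '@RG\tID:V300059041_L01_658\tPL:ILLUMINA\tLB:lib1\tSM:organoid_V300059041_L02' /data/Ref/Homo_sapiens_assembly38.fasta /data/CC56organoid/clean_data/V300059041_L01_658_1.fq.gz /data/CC56organoid/clean_data/V300059041_L01_658_2.fq.gz 2> bwa_1.log | samtools sort -@ 8 -o V300059041_L01_658_sorted.bam -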
2.2.3 Running GATK4
- Mark/remove duplicate reads
# sambamba automatically builds a BAM index while marking duplicates
sambamba-1.0.1 markdup -t 10 -r -p V300059041_L01_658_sorted.bam V300059041_L01_658_sorted.markdup.bam 2> sambamba_markdup_1_log.txt
sambamba-1.0.1 markdup -t 10 -r -p V300059041_L02_658_sorted.bam V300059041_L02_658_sorted.markdup.bam 2> sambamba_markdup_2_log.txt
sambamba-1.0.1 markdup -t 10 -r -p CC56organoid.bam CC56organoid.markdup.bam 2> sambamba_markdup_log.txt
# Create the sequence dictionary for the reference genome
gatk CreateSequenceDictionary -R Homo_sapiens_assembly38.fasta
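Besides the .dict file, GATK tools also require a FASTA index (.fai) next to the reference; if it does not exist yet it can be created with samtools:
samtools faidx Homo_sapiens_assembly38.fasta # produces Homo_sapiens_assembly38.fasta.fai, required by GATK alongside the .dict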
- Germline variant discovery: SNPs & Indels (HaplotypeCaller)
# Generate the intermediate g.vcf files (one per lane-level BAM)
gatk --java-options "-Xmx4g" HaplotypeCaller \
-R Homo_sapiens_assembly38.fasta \
-I V300059041_L01_658_sorted.markdup.bam \
-O V300059041_L01_658_sorted.markdup.g.vcf \
-ERC GVCF
gatk --java-options "-Xmx4g" HaplotypeCaller \
-R Homo_sapiens_assembly38.fasta \
-I V300059041_L02_658_sorted.markdup.bam \
-O V300059041_L02_658_sorted.markdup.g.vcf \
-ERC GVCF
# Merge the GVCF files
gatk CombineGVCFs -R Homo_sapiens_assembly38.fasta \
--variant V300059041_L01_658_sorted.markdup.g.vcf \
--variant V300059041_L02_658_sorted.markdup.g.vcf \
-O CC56organoid.HaplotypeCaller.g.vcf
# Joint genotyping
gatk GenotypeGVCFs -R Homo_sapiens_assembly38.fasta -V CC56organoid.HaplotypeCaller.g.vcf -O CC56organoid.HaplotypeCaller.final.vcf
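It can be convenient to compress, index and summarize the resulting VCF before annotation; a minimal sketch using bgzip/tabix (shipped with htslib) and bcftools stats:
bgzip -c CC56organoid.HaplotypeCaller.final.vcf > CC56organoid.HaplotypeCaller.final.vcf.gz # bgzip-compress the VCF
tabix -p vcf CC56organoid.HaplotypeCaller.final.vcf.gz # build a .tbi index
bcftools stats CC56organoid.HaplotypeCaller.final.vcf.gz > CC56organoid.HaplotypeCaller.stats.txt # quick summary of the call set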
- Somatic variant discovery: SNPs & Indels
# Somatic mutation calling, producing an unfiltered VCF (Mutect2)
gatk --java-options "-Xmx10G" Mutect2 -R Homo_sapiens_assembly38.fasta -I CC56organoid.markdup.bam -O CC56organoid.markdup.unfiltered.vcf
# Filter the raw calls
gatk --java-options "-Xmx10G" FilterMutectCalls -R Homo_sapiens_assembly38.fasta -V CC56organoid.markdup.unfiltered.vcf -O CC56organoid.markdup.filtered.vcf
# Further filtering (optional)
bcftools filter -i 'DP>10 && QUAL>30' CC56organoid.markdup.filtered.vcf -o final_CC56organoid.vcf
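FilterMutectCalls records its verdict in the FILTER column rather than removing records, and, as far as I know, Mutect2 output often leaves QUAL unset, so filtering on the FILTER field tends to be more reliable than the QUAL-based expression above; a minimal sketch for keeping only PASS calls:
# keep only records that passed FilterMutectCalls
bcftools view -f PASS CC56organoid.markdup.filtered.vcf -o CC56organoid.markdup.PASS.vcf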
- Downloading the GATK resource bundle
GATK provides a resource bundle on its official site that contains much of the data required by this workflow. It is currently distributed via FTP and via a Google Cloud bucket: the bucket can be browsed at https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pli=1, and the FTP site can be accessed and downloaded from with the lftp tool.
# Install lftp (requires root privileges)
$ sudo apt install lftp
$ lftp ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/ # the password is empty, just press Enter

lftp gsapubftp-anonymous@ftp.broadinstitute.org:/bundle> ls hg38
-rw-rw-r-- 1 vdauwera broad 53238342 Jan 5 2016 1000G_omni2.5.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 1544108 Jan 5 2016 1000G_omni2.5.hg38.vcf.gz.tbi
-rw-rw-r-- 1 vdauwera broad 1888262073 Jan 5 2016 1000G_phase1.snps.high_confidence.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 2128536 Jan 5 2016 1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
-rw-rw-r-- 1 vdauwera broad 3053999 Dec 1 2016 Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 421689 Dec 1 2016 Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi
drwxrwxr-x 2 vdauwera broad 644 Dec 1 2016 beta
-rw-rw-r-- 1 vdauwera broad 1560889937 Dec 1 2016 dbsnp_138.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 2321422 Jan 5 2016 dbsnp_138.hg38.vcf.gz.tbi
-rw-rw-r-- 1 vdauwera broad 3215686850 Jan 5 2016 dbsnp_144.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 2434204 Jan 5 2016 dbsnp_144.hg38.vcf.gz.tbi
-rw-rw-r-- 1 vdauwera broad 3411143311 Dec 1 2016 dbsnp_146.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 2466606 Dec 1 2016 dbsnp_146.hg38.vcf.gz.tbi
-rw-r--r-- 1 vdauwera broad 179601790 Feb 10 2017 hapmap_3.3_grch38_pop_stratified_af.vcf.gz
-rw-r--r-- 1 vdauwera broad 1547721 Feb 10 2017 hapmap_3.3_grch38_pop_stratified_af.vcf.gz.tbi
-rw-rw-r-- 1 vdauwera broad 62043448 Jan 5 2016 hapmap_3.3.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 1552123 Jan 5 2016 hapmap_3.3.hg38.vcf.gz.tbi
-rw-rw-r-- 1 vdauwera broad 581712 Jan 6 2016 Homo_sapiens_assembly38.dict
-rw-r--r-- 1 vdauwera broad 487553 Dec 1 2016 Homo_sapiens_assembly38.fasta.64.alt
-rw-rw-r-- 1 vdauwera broad 160928 Dec 1 2016 Homo_sapiens_assembly38.fasta.fai
-rw-rw-r-- 1 vdauwera broad 890202736 Jan 5 2016 Homo_sapiens_assembly38.fasta.gz
-rw-rw-r-- 1 vdauwera broad 20685880 Jan 5 2016 Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
-rw-rw-r-- 1 vdauwera broad 1500013 Jan 5 2016 Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
-rw-rw-r-- 1 vdauwera broad 599399 Dec 1 2016 wgs_calling_regions.hg38.interval_list
# Download the files you need based on the directory listing above
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_138.hg38.vcf.gz
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
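The SNP VQSR step further below also uses hapmap_3.3.hg38.vcf.gz and 1000G_omni2.5.hg38.vcf.gz, and GATK expects a .tbi index next to each known-sites VCF, so it is worth fetching those files from the same listing as well:
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz.tbi
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz.tbi
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_138.hg38.vcf.gz.tbi
wget -b -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi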
- Annotate the VCF (VariantAnnotator)
gatk --java-options "-Xmx8g -Xms8g" VariantAnnotator \
-R /data/shumin/WES/Index/ref/ref_fa/Homo_sapiens_assembly38.fasta \
-I /data/shumin/WES/Index/ref/alignment/CC56organoid.markdup.bam \
-V CC56organoid.HaplotypeCaller.final.vcf \
-O CC56organoid.HaplotypeCaller.final.annotate.vcf \
--dbsnp /data/WES/ref_data/dbsnp_138.hg38.vcf.gz
- Indel / SNP filtering (VariantRecalibrator)
## Indel Mode
gatk --java-options "-Xmx8g -Xms8g" VariantRecalibrator \
-R /data/shumin/WES/Index/ref/ref_fa/Homo_sapiens_assembly38.fasta \
-V CC56tissueA.HaplotypeCaller.final.annotate.vcf \
-resource:mills,known=true,training=true,truth=true,prior=12.0 /data/shumin/WES/ref_data/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /data/shumin/WES/ref_data/dbsnp_138.hg38.vcf.gz \
-an QD -an SOR \
--mode INDEL \
--max-gaussians 4 \
--rscript-file CC56tissueA.plots.R \
--tranches-file CC56tissueA.tranches \
-O CC56tissueA.recal \
1> CC56tissueA.indel.vqsr.log 2>&1
gatk --java-options "-Xmx8g -Xms8g" ApplyVQSR \
-R /data/shumin/WES/Index/ref/ref_fa/Homo_sapiens_assembly38.fasta \
-V CC56tissueA.HaplotypeCaller.final.annotate.vcf \
--truth-sensitivity-filter-level 99.0 \
--tranches-file CC56tissueA.tranches \
--recal-file CC56tissueA.recal \
--mode INDEL \
-O CC56tissueA.indels.VQSR.vcf \
1>CC56tissueA.indels.ApplyVQSR.log 2>&1
echo "** Indels VQSR done **"
## SNP Mode
gatk --java-options "-Xmx8g -Xms8g" VariantRecalibrator \
-R /data/shumin/WES/Index/ref/ref_fa/Homo_sapiens_assembly38.fasta \
-V CC56tissueA.indels.VQSR.vcf \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data/shumin/WES/ref_data/hapmap_3.3.hg38.vcf.gz \
-resource:omini,known=false,training=true,truth=false,prior=12.0 /data/shumin/WES/ref_data/1000G_omni2.5.hg38.vcf.gz \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 /data/shumin/WES/ref_data/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /data/shumin/WES/ref_data/dbsnp_138.hg38.vcf.gz \
-an QD -an MQ \
--mode SNP \
--max-gaussians 4 \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 95.0 -tranche 90.0 \
--rscript-file CC56tissueA.indels.snps.plots.R \
--tranches-file CC56tissueA.indels.snps.tranches \
-O CC56tissueA.indels.snps.recal \
1>CC56tissueA.indels.snps.vqsr.log 2>&1
gatk --java-options "-Xmx8g -Xms8g" ApplyVQSR \
-R /data/shumin/WES/Index/ref/ref_fa/Homo_sapiens_assembly38.fasta \
-V CC56tissueA.indels.VQSR.vcf \
--truth-sensitivity-filter-level 99.0 \
--tranches-file CC56tissueA.indels.snps.tranches \
--recal-file CC56tissueA.indels.snps.recal \
--mode SNP \
-O CC56tissueA.indels.snps.VQSR.vcf \
1>CC56tissueA.indels.snps.ApplyVQSR.log 2>&1
echo "** SNPs VQSR done **"
- Convert to MAF format
perl /data/shumin/software/vcf2maf/vcf2maf.pl --input-vcf CC56tissueA.indels.snps.VQSR.vcf --output-maf CC56tissueA.indels.snps.VQSR.maf --ref-fasta /data/shumin/WES/Index/ref/ref_fa/Homo_sapiens_assembly38.fasta --tumor-id CC56tissueA --vep-path /data/shumin/software/ensembl-vep-release-112.0 --vep-data /data/shumin/software/ensembl-vep-release-112.0/vep_cache --cache-version 112 --ncbi-build GRCh38