2021/03/02
Evaluation of Whole Genome Sequencing Data
Whole genome sequencing (WGS) can provide comprehensive insights into the genetic makeup of lymphomas.
I. Alignment
1. Choosing a reference genome
GRCh37 (hg19) and GRCh38
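Unlocalized sequences: already assigned to a specific chromosome, but their orientation or exact position is still unknown; their names end in _random.
Unplaced sequences: not yet assigned to any chromosome; their names start with chrUn_.
EBV & decoy sequences: not part of the human genome, but captured by high-throughput sequencing; labeled chrEBV, or with names ending in _decoy.
Alternate loci: alternative haplotypes, usually with names ending in _alt; this category also includes the HLA sequences.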
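The naming conventions above can be turned into a small helper, for example to separate primary-assembly contigs from the rest before computing coverage statistics. This is a sketch assuming UCSC-style contig names as used in hg19/GRCh38 analysis sets; `classify_contig` is a hypothetical helper, and the `HLA-` prefix rule is an assumption about analysis sets that add HLA contigs under names such as `HLA-A*01:01:01:01`.

```python
def classify_contig(name: str) -> str:
    """Assign a reference contig to one of the categories described above,
    based only on its name (UCSC-style hg19/GRCh38 naming assumed)."""
    # Decoys come first: GRCh38 decoy names also start with "chrUn_".
    if name == "chrEBV" or name.endswith("_decoy"):
        return "EBV/decoy"
    # Alternate haplotypes; the "HLA-" prefix rule is an assumption about
    # analysis sets that add HLA contigs under names like "HLA-A*01:01:01:01".
    if name.endswith("_alt") or name.startswith("HLA-"):
        return "alternate locus"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    return "primary assembly"


if __name__ == "__main__":
    for contig in ["chr1", "chr1_KI270706v1_random", "chrUn_KI270302v1",
                   "chrUn_JTFH01000001v1_decoy", "chrEBV", "chr6_GL000250v2_alt"]:
        print(f"{contig}\t{classify_contig(contig)}")
```

The order of the checks matters only for names that match two rules, such as decoys that also carry the chrUn_ prefix.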
2. Preprocessing
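Trimmomatic: removes sequencing barcodes/adapters and can trim low-quality bases.
* BWA-backtrack: this method maps reads over their full length, so trimming the low-quality parts of the reads improves alignment quality.
* BWA-MEM: performs local alignment, so no trimming is needed.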
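A sliding-window quality trim, similar in spirit to Trimmomatic's SLIDINGWINDOW step but not its exact implementation, can be sketched as follows. It assumes Phred+33 quality encoding; the function names and the window of 4 bases with a minimum mean quality of 20 are illustrative choices, not recommended settings.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_mean_q: float = 20.0) -> tuple[str, str]:
    """Scan from the 5' end and cut the read at the start of the first
    window whose mean base quality falls below min_mean_q."""
    scores = phred_scores(qual)
    # At least one window is checked, even for reads shorter than `window`.
    for start in range(0, max(len(scores) - window + 1, 1)):
        win = scores[start:start + window]
        if win and sum(win) / len(win) < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

In the example the window starting at position 5 has a mean quality of 11.5, so the read is cut to its first five bases.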
3. Alignment algorithms
BWA (BWA-backtrack, BWA-SW, and BWA-MEM), Bowtie 2, and GEM
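4. BAM file processing
Tools: samtools, Picard, biobambam, and sambamba
Steps: coordinate-sorting the reads, merging the data (if the sample was sequenced on multiple lanes), and marking or removing PCR duplicates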
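Duplicate marking groups reads that start at the same reference position on the same strand, keeps the best one, and flags the rest. The sketch below is a single-end simplification of that idea; Picard MarkDuplicates additionally uses both mates' positions for paired reads, library information, and optical-duplicate detection. The `Read` fields are hypothetical stand-ins for values that would be parsed from a BAM file.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    pos5: int       # unclipped 5' position of the read on the reference
    strand: str     # "+" or "-"
    qual_sum: int   # sum of base qualities, used to pick the read to keep


def mark_duplicates(reads: list[Read]) -> set[str]:
    """Return the names of reads flagged as duplicates (single-end simplification)."""
    groups: dict[tuple[str, int, str], list[Read]] = defaultdict(list)
    for r in reads:
        groups[(r.chrom, r.pos5, r.strand)].append(r)
    duplicates: set[str] = set()
    for group in groups.values():
        # Keep the read with the highest quality sum; the name breaks ties deterministically.
        best = max(group, key=lambda r: (r.qual_sum, r.name))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


if __name__ == "__main__":
    reads = [Read("r1", "chr1", 100, "+", 300), Read("r2", "chr1", 100, "+", 350),
             Read("r3", "chr1", 100, "-", 300), Read("r4", "chr2", 100, "+", 200)]
    print(mark_duplicates(reads))
```

Here r1 and r2 share chromosome, 5' position, and strand, so the lower-quality r1 is flagged; r3 is on the other strand and is kept.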
II. Identification of different types of genetic variants
1. Variant calling
SNVs (single-nucleotide variants): GATK HaplotypeCaller, Platypus, Mutect2, FreeBayes, and Strelka2 (between them applicable to germline and somatic variants)
Indels (insertions and deletions): Mutect2, Strelka2, Platypus
SVs (structural variants), i.e., insertions, deletions, duplications, inversions, and translocations: Manta, novoBreak, SvABA, DELLY, and LUMPY
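Read-pair based SV callers such as DELLY and LUMPY use, among other signals, mate pairs whose orientation or distance deviates from what the sequencing library predicts. The sketch below classifies a single pair, assuming a standard forward/reverse (FR) paired-end library with an insert size of about 400 ± 50 bp and 150 bp reads (illustrative numbers); `classify_pair` is hypothetical, and real callers require many supporting pairs plus split-read evidence before reporting a breakpoint.

```python
def classify_pair(chrom1: str, pos1: int, strand1: str,
                  chrom2: str, pos2: int, strand2: str,
                  read_len: int = 150, mean: float = 400.0,
                  sd: float = 50.0, k: float = 3.0) -> str:
    """Label the SV signal suggested by one read pair.
    Positions are the leftmost mapped coordinates of each mate."""
    if chrom1 != chrom2:
        return "translocation"           # mates on different chromosomes
    # Order the mates so that `left` has the smaller coordinate.
    (lpos, lstrand), (rpos, rstrand) = sorted([(pos1, strand1), (pos2, strand2)])
    if lstrand == rstrand:
        return "inversion"               # ++ or -- orientation
    if lstrand == "-" and rstrand == "+":
        return "duplication"             # RF orientation: tandem duplication signal
    insert = rpos + read_len - lpos      # approximate fragment length
    if insert > mean + k * sd:
        return "deletion"                # mates map too far apart
    if insert < mean - k * sd:
        return "insertion"               # mates map too close together
    return "concordant"


if __name__ == "__main__":
    print(classify_pair("chr1", 1000, "+", "chr1", 6000, "-"))
```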
Copy number aberrations (CNAs) are also a class of structural variants.
CNA detection from read depth has to correct for GC bias and other coverage biases.
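A simple way to remove GC bias from binned read counts is to rescale each bin by the median count of bins with similar GC content, so that every GC stratum ends up with the same median depth. This is a minimal sketch assuming read counts have already been computed in fixed-size genomic bins together with each bin's GC fraction; many CNA callers use smoother fits (for example LOESS) instead of per-stratum medians.

```python
from collections import defaultdict
from statistics import median


def gc_correct(counts: list[float], gc: list[float]) -> list[float]:
    """Median-based GC correction: scale each bin so that the median count
    within its GC stratum matches the overall median count."""
    overall = median(counts)
    keys = [round(g * 100) for g in gc]          # 1%-wide GC strata
    strata: dict[int, list[float]] = defaultdict(list)
    for key, c in zip(keys, counts):
        strata[key].append(c)
    stratum_median = {key: median(vals) for key, vals in strata.items()}
    return [c * overall / stratum_median[key] if stratum_median[key] > 0 else 0.0
            for key, c in zip(keys, counts)]


if __name__ == "__main__":
    # The GC-rich bins are inflated by coverage bias; the 200-count bin is a true gain.
    print(gc_correct([100, 100, 200, 150, 150], [0.4, 0.4, 0.4, 0.6, 0.6]))
```

After correction the two GC strata share the same baseline, while the bin with twice the stratum median keeps its twofold ratio.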
2. Quality control
* FastQC: at the level of FASTQ files
* samtools: during BAM file postprocessing and at the BAM file level
* Variant calling methods themselves also provide valuable QC information
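The per-position base quality summary that FastQC reports can be approximated in a few lines. This sketch assumes uncompressed FASTQ with four-line records and Phred+33 qualities; it is not FastQC's code, and `read_fastq` and `per_position_mean_quality` are hypothetical helpers.

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a FASTQ stream with 4-line records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()                 # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def per_position_mean_quality(handle: TextIO, offset: int = 33) -> list[float]:
    """Mean Phred quality at each read position across all records."""
    totals: list[int] = []
    counts: list[int] = []
    for _, _, qual in read_fastq(handle):
        for i, ch in enumerate(qual):
            if i == len(totals):          # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / n for t, n in zip(totals, counts)]


if __name__ == "__main__":
    example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII##\n")
    print(per_position_mean_quality(example))  # → [40.0, 40.0, 21.0, 21.0]
```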
3. Variant annotation
The variants need to be annotated with functional information:
* gene annotation, to identify whether the variant affects, e.g., the protein-coding sequence of a gene;
* variant database information, to disclose whether a variant is, e.g., a known SNP or a known somatic cancer mutation;
* potentially other information tracks, e.g., on sequence conservation or regulatory elements.
Tools: ANNOVAR, SnpEff, Variant Effect Predictor (VEP), and Rbbt
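The first two annotation layers (gene overlap and database lookup) can be illustrated with plain Python structures. This is a hypothetical sketch: `cds_regions`, `known_variants`, and the gene and ID names in the example are made up, and real annotators such as ANNOVAR, SnpEff, or VEP work on full transcript models and also predict the consequence of the variant (e.g., missense or nonsense).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def annotate(variant: Variant,
             cds_regions: dict[str, list[tuple[str, int, int]]],
             known_variants: dict[Variant, str]) -> dict:
    """cds_regions: gene -> coding intervals (chrom, start, end), 1-based inclusive.
    known_variants: variant -> database ID (e.g., a dbSNP or COSMIC identifier)."""
    genes = {gene for gene, regions in cds_regions.items()
             for chrom, start, end in regions
             if chrom == variant.chrom and start <= variant.pos <= end}
    return {"coding_genes": sorted(genes),
            "known_id": known_variants.get(variant)}


if __name__ == "__main__":
    cds = {"GENE_A": [("chr1", 100, 200), ("chr1", 300, 400)],
           "GENE_B": [("chr2", 50, 80)]}
    known = {Variant("chr1", 150, "C", "T"): "rs_example1"}
    print(annotate(Variant("chr1", 150, "C", "T"), cds, known))
```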
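III. Identification of driver mutations
1. Higher-than-expected mutation rate: MuSiC, MutSigCV
2. Bias toward mutations with high functional impact: OncodriveFM
3. Clustering of mutations in specific regions of the protein: OncodriveCLUST
4. Driver detection in non-coding regions of the genome: LARVA, OncodriveFML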
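The first criterion, a higher-than-expected mutation rate, comes down to comparing the observed mutation count in a gene with the count expected from a background rate. The sketch below uses a plain Poisson test with a single, hypothetical background rate; MuSiC and MutSigCV model the background much more carefully (for example per gene and per sample), so this only illustrates the principle.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i            # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def recurrence_pvalue(observed: int, gene_length_bp: int, n_samples: int,
                      background_rate: float) -> tuple[float, float]:
    """Expected count and one-sided p-value for `observed` mutations in a gene,
    given a per-bp, per-sample background mutation rate (an assumed constant)."""
    expected = background_rate * gene_length_bp * n_samples
    return expected, poisson_upper_tail(observed, expected)


if __name__ == "__main__":
    # Hypothetical numbers: 2 kb coding length, 100 genomes, 1 mutation per Mb.
    print(recurrence_pvalue(8, 2000, 100, 1e-6))
```

Computing the tail as 1 minus the CDF loses precision for extremely small p-values, which is acceptable for an illustration but not for ranking thousands of genes.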
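IV. Identification of mutational signatures
The set of mutations in a cancer genome (both driver and passenger mutations) is the imprint of the activity of several mutational processes. Some mutational processes preferentially cause particular types of mutations.
1. Unsupervised analysis of mutational signatures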
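The mutational catalog contains, for each sample, the frequency of SNVs in each trinucleotide context (6 substitution classes × 16 combinations of flanking bases = 96 categories).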
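The 96-category catalog can be built by expressing each SNV on the pyrimidine (C or T) strand of its reference trinucleotide, reverse-complementing when the reference base is a purine. The sketch assumes the trinucleotide context has already been extracted from the reference genome; the channel labels follow the common `A[C>T]G` notation, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 6 substitution classes x 16 (5' base, 3' base) pairs = 96 channels
CHANNELS = [f"{left}[{sub}]{right}"
            for sub in SUBSTITUTIONS for left, right in product("ACGT", repeat=2)]


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def channel(context: str, alt: str) -> str:
    """context: 3-base reference sequence centered on the SNV, e.g. 'ACG'."""
    if context[1] in "AG":  # purine reference: switch to the pyrimidine strand
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def catalog(snvs: list[tuple[str, str]]) -> list[int]:
    """snvs: (trinucleotide_context, alt_base) pairs for one sample.
    Returns the counts in CHANNELS order."""
    counts = Counter(channel(ctx, alt) for ctx, alt in snvs)
    return [counts[ch] for ch in CHANNELS]


if __name__ == "__main__":
    # A G>A change in CGT is the same event as C>T in ACG on the opposite strand.
    print(channel("CGT", "A"))  # → A[C>T]G
```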
Nonnegative matrix factorization (NMF) decomposes the mutational catalog into a mutational signature matrix and an exposure matrix (which holds the activity of each identified signature in each genome).
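A bare-bones NMF with the Lee-Seung multiplicative updates (Frobenius-norm objective) shows how the catalog matrix (channels × samples) splits into signatures (channels × k) and exposures (k × samples). This is a teaching sketch in pure Python, not the procedure of any particular signature tool; in practice the number of signatures k is chosen by comparing reconstruction error and stability across many runs.

```python
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(v, k, iterations=500, seed=0, eps=1e-12):
    """Factor v (channels x samples) into w (channels x k signatures) and
    h (k x samples exposures) with multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)       # H update
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))       # W update
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    # Normalise each signature to sum to 1 and move its scale into the exposures.
    for j in range(k):
        s = sum(w[i][j] for i in range(m)) or 1.0
        for i in range(m):
            w[i][j] /= s
        for c in range(n):
            h[j][c] *= s
    return w, h


def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


if __name__ == "__main__":
    # Toy catalog: 4 channels x 3 samples built from two signatures.
    v = [[5, 0, 2.5], [5, 0, 2.5], [0, 10, 2.5], [0, 10, 2.5]]
    w, h = nmf(v, k=2)
    print(round(frobenius_error(v, w, h), 3))
```

Because the multiplicative updates only ever multiply by non-negative ratios, both factors stay non-negative throughout, which is what makes the signatures and exposures interpretable.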
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.(https://cancer.sanger.ac.uk/cosmic/)
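New signatures can be discovered this way, but the approach requires a large number of samples.
2. Supervised analysis of mutational signatures
Known signatures (for example, the COSMIC reference signatures) are fitted to each sample's catalog to estimate their exposures. Tools: R packages deconstructSigs or YAPSA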
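Refitting known signatures amounts to a non-negative least-squares problem with the signature matrix held fixed. The sketch below solves it with a multiplicative update applied to the exposures only; it is a plain NNLS illustration and not necessarily the algorithm used by deconstructSigs or YAPSA, and the toy signatures in the example are made up.

```python
def fit_exposures(catalog: list[float], signatures: list[list[float]],
                  iterations: int = 2000, eps: float = 1e-12) -> list[float]:
    """Non-negative exposures e minimising ||catalog - sum_k e_k * signature_k||^2.
    Each signature is a list over the same channels as the catalog."""
    k = len(signatures)
    channels = range(len(catalog))
    e = [sum(catalog) / k] * k  # start from a uniform split of the mutation count
    # S^T v and S^T S stay fixed because the signatures are fixed.
    stv = [sum(s[c] * catalog[c] for c in channels) for s in signatures]
    sts = [[sum(a[c] * b[c] for c in channels) for b in signatures] for a in signatures]
    for _ in range(iterations):
        denom = [sum(sts[i][j] * e[j] for j in range(k)) for i in range(k)]
        e = [e[i] * stv[i] / (denom[i] + eps) for i in range(k)]
    return e


if __name__ == "__main__":
    sigs = [[0.5, 0.3, 0.2, 0.0], [0.0, 0.2, 0.3, 0.5]]
    # Catalog generated as 10 * sig1 + 30 * sig2.
    print([round(x, 3) for x in fit_exposures([5, 9, 11, 15], sigs)])
```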