From genome-wide associations to candidate causal variants by statistical fine-mapping. Daniel J. Schaid et al., Nature Reviews Genetics, 2018.
PMID: 29844615 DOI: 10.1038/s41576-018-0016-z
Part 1
1. Genome-wide Association Studies:全基因组关联研究
2. Complex traits:复杂性状
3. Tag SNPs:标签SNP
4. Linkage disequilibrium:连锁不平衡
5. Causal variants:因果变异(致病突变?)
6. Fine-mapping:精细作图
7. Penalized regression:惩罚回归
8. Summary statistics:汇总统计量
9. Trans-ethnic:跨种族
10. Multiple testing corrections:多重检验校正
11. Statistical power:统计效能
12. Genotype imputation:分型填补
13. Cross validation:交叉验证
14. Prior probability:先验概率
15. Posterior inclusion probability:后验包含概率
16. Expression quantitative trait loci (eQTL):表达数量性状座位
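Common complex human traits (both quantitative traits and diseases) are usually caused by multiple environmental and genetic factors. GWAS have been widely used to identify the genomic regions that contribute to the inheritance of complex traits. To date, the GWAS Catalog of the US National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI) has recorded 47,681 SNPs statistically associated with complex traits, representing 2,185 associated traits with association P values below 10^-5. This success owes much to cost-effective genotyping arrays that carry large numbers of SNPs. However, the SNPs on an array usually do not cause the trait directly. Instead, tag SNPs are chosen for an array because they are highly correlated with neighbouring SNPs (that is, they are in strong linkage disequilibrium, LD), so they can act as proxies for unmeasured SNPs across a larger genomic region. The association between a tag SNP and a trait may therefore be indirect: the tag SNP is correlated with a causal SNP, and it is the causal SNP that is directly associated with the trait. Because the LD structure among SNPs is complex, identifying the underlying causal variants is challenging. This is where fine-mapping comes in. The principles discussed here also apply to the analysis of common genetic variants in whole-genome sequencing studies.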
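The indirect association described above can be illustrated with a small numerical sketch. It is not taken from the paper: the genotypes, the effect size of 0.5 and the noise values are all made up, and the squared Pearson correlation between genotype vectors is used here as a simple stand-in for an LD measure.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy * sxy / (sxx * syy)

# Hypothetical genotypes (0, 1 or 2 copies of the minor allele) for 10 individuals.
causal = [0, 0, 1, 1, 2, 0, 1, 2, 0, 1]
tag = [0, 0, 1, 1, 2, 0, 1, 2, 1, 1]  # differs from the causal SNP in one individual

# Hypothetical trait: depends only on the causal SNP, plus fixed noise values.
noise = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1]
trait = [0.5 * g + e for g, e in zip(causal, noise)]

ld = r_squared(tag, causal)               # LD between tag SNP and causal SNP
r2_causal_trait = r_squared(causal, trait)
r2_tag_trait = r_squared(tag, trait)      # association inherited through LD

print(f"LD r^2 (tag, causal)  = {ld:.3f}")
print(f"r^2 (causal, trait)   = {r2_causal_trait:.3f}")
print(f"r^2 (tag, trait)      = {r2_tag_trait:.3f}")
```

In this toy data set the tag SNP has no effect of its own, yet it still shows a clear association with the trait because it is in high LD with the causal SNP. The signal is weaker than that of the causal SNP itself.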
Fig. 1 A typical workflow for selecting SNPs, from GWAS to fine-mapping [Based on genome-wide association study (GWAS) P values summarized in a Manhattan plot, a list of single-nucleotide polymorphisms (SNPs) that achieve genome-wide statistical significance (that is, P value < 5 × 10^-8) is used to determine regions of interest for fine-mapping. Each region is typically explored according to the structure of linkage disequilibrium (LD) among SNPs using Haploview plots. Statistical associations are viewed with LocusZoom plots that illustrate the patterns of association of each SNP with the lead SNP, as well as the annotation of genes in the region. The regions can then be partitioned into independent subregions to ease computational burden, based on statistical models that evaluate the simultaneous effects of multiple SNPs on a trait. Statistical fine-mapping is conducted in each region, using one of the methods illustrated in Fig. 2. The SNPs selected from fine-mapping are then annotated with genomic features to prioritize follow-up functional studies. eQTL, expression quantitative trait locus. Figure is reproduced with permission from Ref. (Haralambieva, I. H. et al. Genome-wide associations of CD46 and IFI44L genetic variants with neutralizing antibody response to measles vaccine. Hum. Genet. 136, 421–435 (2017)).]
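The first step of this workflow, turning a list of genome-wide significant SNPs into regions of interest, might be sketched as follows. The caption does not say how regions are delimited, so the fixed 250 kb flank, the function name `define_regions` and the example summary statistics are all assumptions made for illustration.

```python
GENOME_WIDE_P = 5e-8  # genome-wide significance threshold quoted in the caption

def define_regions(snps, flank=250_000, threshold=GENOME_WIDE_P):
    """Return merged (chrom, start, end) windows around significant SNPs.

    snps: iterable of (chrom, position, p_value) tuples.
    flank: assumed half-width of each window in base pairs (hypothetical value).
    """
    # Keep only genome-wide significant SNPs, sorted by chromosome and position.
    hits = sorted((chrom, pos) for chrom, pos, p in snps if p < threshold)
    regions = []
    for chrom, pos in hits:
        start, end = max(0, pos - flank), pos + flank
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it.
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions

# Hypothetical summary statistics: (chromosome, base-pair position, P value).
example = [
    ("1", 1_000_000, 3e-9),
    ("1", 1_200_000, 1e-10),
    ("1", 5_000_000, 4e-6),  # below 1e-5 but not genome-wide significant
    ("2", 300_000, 2e-8),
]
regions = define_regions(example)
print(regions)
```

The two significant SNPs on chromosome 1 fall into one merged window, and the SNP on chromosome 2 gets its own. In practice the region boundaries would more likely follow the LD structure (for example, as seen in Haploview or LocusZoom plots) than a fixed distance.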
Fig. 2 Hypothetical examples of fine-mapping strategies [All subfigures are based on LocusZoom-style illustrations of marginal single-nucleotide polymorphism (SNP) associations. The -log10(P) values are presented on the left y axis, and variant positions are on the x axis. The gold diamond for each locus represents the peak SNP. The results for other SNPs are colored by descending degree of linkage disequilibrium (LD) with the peak SNP (ordered red, orange, green and blue dots). The purple bars represent additional variant-level statistics produced by fine-mapping (that is, β-values for penalized regression and posterior inclusion probabilities (PIPs) for Bayesian methods), and the corresponding scale is on the right y axis. The light grey boxes represent the regions selected by fine-mapping.]
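To make the idea of a PIP concrete, here is a minimal sketch of one simple way to obtain PIPs from marginal summary statistics. It is not the specific method shown in the figure. It assumes exactly one causal SNP in the region, equal prior probability for every SNP, and Wakefield's approximate Bayes factor. The prior standard deviation of 0.2 and the example z-scores and standard errors are hypothetical.

```python
import math

def single_causal_pips(z_scores, std_errors, prior_sd=0.2):
    """PIPs under the assumption of exactly one causal SNP with equal priors.

    prior_sd is an assumed prior standard deviation of the true effect size.
    """
    w = prior_sd ** 2
    log_abf = []
    for z, se in zip(z_scores, std_errors):
        v = se ** 2  # sampling variance of the effect estimate
        # Log of Wakefield's approximate Bayes factor in favour of association:
        # ABF = sqrt(v / (v + w)) * exp(z^2 / 2 * w / (v + w))
        log_abf.append(0.5 * math.log(v / (v + w)) + 0.5 * z * z * w / (v + w))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_abf)
    weights = [math.exp(value - top) for value in log_abf]
    total = sum(weights)
    return [wt / total for wt in weights]

# Hypothetical marginal z-scores and standard errors for four SNPs in one region.
z = [6.0, 5.5, 2.0, 0.5]
se = [0.05, 0.05, 0.05, 0.05]
pips = single_causal_pips(z, se)
for i, pip in enumerate(pips, start=1):
    print(f"SNP{i}: PIP = {pip:.4f}")
```

Under these assumptions the PIPs sum to 1 across the region, and two SNPs with similar z-scores share the posterior mass rather than one of them taking all of it. Methods that allow several causal SNPs per region need the LD matrix as well and are considerably more involved.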