Upstream Analysis for Next-Generation Whole-Genome Sequencing

Author: 日月其除 | Source: published 2022-05-14 00:31


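I found two reference websites, and a few things confused me along the way. I ended up following the second one.
This post mainly covers upstream analysis. The sites also provide scripts for downstream analysis, but I won't go into those here.
Site 1: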
https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/#somatic-variant-calling-workflow

The pipeline has three main parts:
1. Genome Alignment
2. Alignment Co-Cleaning
3. Somatic Variant Calling
Part 1:

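1) Use bamtofastq to convert the BAM to FASTQ
2) Use bwa mem to align the reads to the reference genome
3) Use picard to sort the BAM files
4) Use picard to merge the BAMs; my own understanding is that this merges the BAMs belonging to the same sample
5) Use picard to mark duplicates (picard.jar MarkDuplicates)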
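
The five steps above can be sketched as a small Python helper that only assembles the command lines and does not run anything. This is a rough sketch, not the GDC's actual scripts: the file names, the read-group string, the thread count, and the `alignment_commands` name are assumptions made for illustration, and treating each input BAM as one read group is just my reading of the merge step.

```python
from typing import List


def alignment_commands(sample: str, lane_bams: List[str], reference: str,
                       threads: int = 8) -> List[str]:
    """Build shell commands for the five Genome Alignment steps.

    Every output file name is a hypothetical placeholder derived from `sample`.
    """
    cmds: List[str] = []
    sorted_bams: List[str] = []
    for i, bam in enumerate(lane_bams, start=1):
        tag = f"{sample}.rg{i}"
        fq1, fq2 = f"{tag}_1.fq.gz", f"{tag}_2.fq.gz"
        # Assumed read-group line; bwa expands the literal "\t" itself.
        read_group = f"@RG\\tID:{tag}\\tSM:{sample}\\tPL:ILLUMINA"
        # 1) BAM -> paired FASTQ (biobambam2 bamtofastq)
        cmds.append(
            f"bamtofastq collate=1 gz=1 filename={bam} F={fq1} F2={fq2}")
        # 2) Align to the reference and write BAM
        cmds.append(
            f"bwa mem -t {threads} -R '{read_group}' {reference} {fq1} {fq2} "
            f"| samtools view -b -o {tag}.aligned.bam -")
        # 3) Coordinate-sort with Picard
        cmds.append(
            f"java -jar picard.jar SortSam INPUT={tag}.aligned.bam "
            f"OUTPUT={tag}.sorted.bam SORT_ORDER=coordinate")
        sorted_bams.append(f"{tag}.sorted.bam")
    # 4) Merge all sorted BAMs of the same sample
    merge_inputs = " ".join(f"INPUT={b}" for b in sorted_bams)
    cmds.append(
        f"java -jar picard.jar MergeSamFiles {merge_inputs} "
        f"OUTPUT={sample}.merged.bam SORT_ORDER=coordinate")
    # 5) Mark duplicates
    cmds.append(
        f"java -jar picard.jar MarkDuplicates INPUT={sample}.merged.bam "
        f"OUTPUT={sample}.markdup.bam "
        f"METRICS_FILE={sample}.markdup_metrics.txt")
    return cmds


if __name__ == "__main__":
    for cmd in alignment_commands("S1", ["lane1.bam", "lane2.bam"], "ref.fa"):
        print(cmd)
```

Printing the commands first makes it easy to check them against the pipeline page before pasting them into a shell script.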
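Part 2:
This is where I started to get lost.
The three scripts in this part use four tools from GenomeAnalysisTK.jar: RealignerTargetCreator, IndelRealigner, BaseRecalibrator, and PrintReads.
However, I could not find the GenomeAnalysisTK.jar file itself, so I checked whether the same tools are available in gatk.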

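Of these, PrintReads exists in gatk, where it is described as "Print reads in the SAM/BAM/CRAM file".
BaseRecalibrator is also in gatk; its description is "Generates recalibration table for Base Quality Score Recalibration (BQSR)".
IndelRealigner and RealignerTargetCreator, on the other hand, I could not find in gatk.
A Google search suggests that GATK 4 no longer includes these tools; see this question: https://www.biostars.org/p/339650/
The pipeline itself has probably been updated since then as well.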
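
To summarize what I found, here is a small sketch of how the four tools map onto GATK 4. As far as I can tell, GenomeAnalysisTK.jar is the jar shipped with GATK 3 and earlier, the two indel-realignment tools were not carried over to GATK 4, and the recalibration that GATK 3's PrintReads applied is done by ApplyBQSR in GATK 4. The `gatk4_bqsr_commands` helper and all file names are assumptions for illustration; it only builds the command strings and is not the GDC's or the Broad's official script.

```python
from typing import Dict, List, Optional

# GATK 3 tool -> GATK 4 counterpart (None means no counterpart in GATK 4).
GATK3_TO_GATK4: Dict[str, Optional[str]] = {
    "RealignerTargetCreator": None,
    "IndelRealigner": None,
    "BaseRecalibrator": "BaseRecalibrator",
    # Only for the role of applying the recalibration table.
    "PrintReads": "ApplyBQSR",
}


def gatk4_bqsr_commands(bam: str, reference: str, known_sites: str,
                        prefix: str) -> List[str]:
    """Build the two GATK 4 commands for base quality score recalibration.

    `prefix` is a hypothetical output name prefix.
    """
    table = f"{prefix}.recal.table"
    return [
        # Step 1: build the recalibration table
        f"gatk BaseRecalibrator -I {bam} -R {reference} "
        f"--known-sites {known_sites} -O {table}",
        # Step 2: apply the table to produce the recalibrated BAM
        f"gatk ApplyBQSR -I {bam} -R {reference} "
        f"--bqsr-recal-file {table} -O {prefix}.bqsr.bam",
    ]


if __name__ == "__main__":
    for old, new in GATK3_TO_GATK4.items():
        print(f"{old} -> {new or 'not available in GATK 4'}")
    for cmd in gatk4_bqsr_commands("S1.markdup.bam", "ref.fa",
                                   "dbsnp.vcf.gz", "S1"):
        print(cmd)
```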

Later I found another pipeline on the official GATK website.
Reference: https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows