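1 Filtering short-read (NGS) data (current NGS data is usually high quality, so this step is optional)
Filter with fastp using its default parameters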
fastp -w 30 -i ${i}_1.fq.gz -I ${i}_2.fq.gz -o ${i}_ft1.fq.gz -O ${i}_ft2.fq.gz
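The commands in this post refer to the current sample as ${i}, which implies a loop over sample names that is not shown. A minimal sketch of such a loop follows. It assumes the raw reads are named <sample>_1.fq.gz and <sample>_2.fq.gz as in the fastp command above; the helper name list_samples is hypothetical.

```shell
# Hypothetical helper: print the sample IDs found in a directory of paired-end reads.
# Assumes the naming scheme <sample>_1.fq.gz / <sample>_2.fq.gz.
list_samples() {
    dir="${1:-.}"
    for r1 in "$dir"/*_1.fq.gz; do
        # Skip the unexpanded pattern when the glob matches nothing.
        if [ ! -e "$r1" ]; then continue; fi
        i="$(basename "$r1" _1.fq.gz)"
        # Report a sample only when its mate file is present too.
        if [ -e "$dir/${i}_2.fq.gz" ]; then
            echo "$i"
        fi
    done
}

# Usage (not executed here): run every per-sample command inside the loop.
# for i in $(list_samples .); do
#     fastp -w 30 -i ${i}_1.fq.gz -I ${i}_2.fq.gz -o ${i}_ft1.fq.gz -O ${i}_ft2.fq.gz
# done
```

Sample names containing spaces would break the unquoted $(...) expansion, so this sketch assumes plain names.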
2 SNP calling
2.1 Index the reference genome
bwa index ref.fa
2.2 Align the sequencing reads to the reference genome
bwa mem -t 100 -R "@RG\tID:Sample_ID\tLB:Sample_ID\tPL:ILLUMINA\tSM:Sample_ID" ref.fa ${i}_ft1.fq.gz ${i}_ft2.fq.gz > ${i}.sam
samtools view -@ 100 -bS ${i}.sam > ${i}.bam
samtools sort -@ 100 ${i}.bam -o ${i}.sort.bam
samtools rmdup ${i}.sort.bam ${i}.rmdup.bam
# GATK needs a BAM index next to the input BAM
samtools index ${i}.rmdup.bam
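The -R string in the bwa mem command uses a fixed placeholder for the sample name. The SM field becomes the sample name in the gVCF, so when several samples are processed in a loop each one probably needs its own value; otherwise every per-sample gVCF carries the same name. A small sketch of building the string from ${i}, where the helper name make_read_group is hypothetical and using the sample ID for ID, LB and SM alike is an assumption:

```shell
# Hypothetical helper: build a bwa mem read-group string for one sample.
# Assumption: the sample ID is reused for the ID, LB and SM fields.
make_read_group() {
    # \\t in the printf format emits a literal backslash followed by t;
    # bwa mem converts that sequence into a tab itself.
    printf '@RG\\tID:%s\\tLB:%s\\tPL:ILLUMINA\\tSM:%s' "$1" "$1" "$1"
}

# Usage (not executed here):
# bwa mem -t 100 -R "$(make_read_group "$i")" ref.fa ${i}_ft1.fq.gz ${i}_ft2.fq.gz > ${i}.sam
```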
2.3 Call SNPs with GATK
Index the reference genome for GATK
samtools faidx ref.fa
gatk CreateSequenceDictionary -R ref.fa -O ref.dict
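By this point the reference should have three sets of companion files: the bwa index (ref.fa.amb, .ann, .bwt, .pac, .sa), the samtools index (ref.fa.fai) and the sequence dictionary (ref.dict). A quick check before launching the long-running steps might look like the sketch below; the helper name missing_ref_indexes is hypothetical, and the file list assumes the default output names of the three indexing commands.

```shell
# Hypothetical helper: print every expected index file that is missing for a reference.
# Assumes default output names of bwa index, samtools faidx and CreateSequenceDictionary.
missing_ref_indexes() {
    ref="$1"
    # ${ref%.*} strips the last extension, so ref.fa maps to ref.dict.
    for f in "$ref.amb" "$ref.ann" "$ref.bwt" "$ref.pac" "$ref.sa" \
             "$ref.fai" "${ref%.*}.dict"; do
        if [ ! -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Usage (not executed here): an empty result means all index files are present.
# missing_ref_indexes ref.fa
```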
Call variants per sample (gVCF mode)
gatk HaplotypeCaller --emit-ref-confidence GVCF -R ref.fa -I ${i}.rmdup.bam -O ${i}.g.vcf
Compress the gVCF file
bcftools view ${i}.g.vcf -Oz -o ${i}.g.vcf.gz
Index the gVCF
bcftools index -t ${i}.g.vcf.gz
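Combine the per-sample gVCF files into a single gVCF
gatk CombineGVCFs -R ref.fa --variant gvcf.list -O cohort.g.vcf.gz
# gvcf.list is a text file listing the gVCF file names, one per line
Joint-genotype the combined gVCF
gatk GenotypeGVCFs -R ref.fa -V cohort.g.vcf.gz -O merge.vcf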
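The gvcf.list file passed to CombineGVCFs has to be written by hand or generated. One way to generate it is sketched below, assuming the compressed per-sample gVCFs sit in a directory of their own, so that the combined gVCF is not picked up on a re-run. The helper name make_gvcf_list is hypothetical.

```shell
# Hypothetical helper: write the path of every *.g.vcf.gz in a directory
# into a list file, one path per line.
make_gvcf_list() {
    dir="$1"
    out="$2"
    : > "$out"                      # create or truncate the list file
    for f in "$dir"/*.g.vcf.gz; do
        # Skip the unexpanded pattern when the glob matches nothing.
        if [ ! -e "$f" ]; then continue; fi
        printf '%s\n' "$f" >> "$out"
    done
}

# Usage (not executed here); the directory name gvcfs is an assumption:
# make_gvcf_list gvcfs gvcf.list
# gatk CombineGVCFs -R ref.fa --variant gvcf.list -O cohort.g.vcf.gz
```

The .tbi index files are not matched by the pattern, so they stay out of the list.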