Calculating the genetic differentiation index (Genetic differentiation

Author: 小明的数据分析笔记本 | Source: published 2019-12-10 21:57
    Original link:

    Analysis of genome data
    Topic: using a VCF file to compute the genetic differentiation index between populations
    Method: the genetic_diff() function from the vcfR R package

    Data: the simplified version of the public 1000 Genomes dataset mentioned at https://hail.is/docs/0.2/getting_started.html

    Translation of the original tutorial

    A fundamental question in most population studies is whether populations are diverse and whether that diversity is shared among populations. To address within-population diversity, geneticists typically report heterozygosity: the probability that two alleles chosen at random from a population are different. To address differentiation between populations, they typically use Fst or one of its analogues. Measuring population differentiation with Fst was originally proposed by Sewall Wright.
    (To be continued)
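To make the two quantities above concrete, here is a minimal sketch in base R. It is not taken from the original tutorial: the allele frequencies are invented, the helper name expected_het is hypothetical, and the two populations are assumed to be the same size, so no sample-size weighting is applied. The numbers that vcfR's genetic_diff() reports for real data need not match this simplified calculation.

```r
# Toy example (all numbers are made up): one biallelic locus, two populations
# of equal size, with allele frequencies p1 and p2 for the same allele.
expected_het <- function(freqs) {
  # Probability that two randomly drawn alleles differ: 1 - sum(p_i^2)
  1 - sum(freqs^2)
}

p1 <- 0.2
p2 <- 0.8

hs_pop1 <- expected_het(c(p1, 1 - p1))
hs_pop2 <- expected_het(c(p2, 1 - p2))
hs <- mean(c(hs_pop1, hs_pop2))   # mean within-population heterozygosity

p_total <- mean(c(p1, p2))        # pooled allele frequency (equal sizes assumed)
ht <- expected_het(c(p_total, 1 - p_total))

gst <- (ht - hs) / ht             # Nei's Gst, one of the Fst analogues
print(c(Hs = hs, Ht = ht, Gst = gst))
```

With these made-up frequencies each population has a heterozygosity of 0.32, the pooled population has 0.5, and Gst comes out at 0.36: about a third of the total diversity is due to differences between the two populations.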
    Code

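    library(vcfR)
    # Read the VCF; the first column of the gt slot is FORMAT, the rest are samples
    vcfdata <- read.vcfR("GWAS_practice/1kg.vcf")
    length(colnames(vcfdata@gt)[-1])  # number of samples in the VCF
    # Sample annotations, one row per sample
    pop <- read.table("GWAS_practice/data/1kg_annotations.txt", header = TRUE, sep = "\t")
    colnames(pop)
    pop$Population
    pop$SuperPopulation
    dim(pop)
    head(pop)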
    # Reorder the annotations so that row i describes the i-th sample in the VCF
    samples <- colnames(vcfdata@gt)[-1]
    pops <- pop[match(samples, pop$Sample), ]
    # Every VCF sample must have an annotation row
    stopifnot(!anyNA(pops$Sample))
    dim(pops)
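    # genetic_diff() expects the populations as a factor; since R 4.0,
    # read.table() returns character columns by default, so convert explicitly
    myDiff <- genetic_diff(vcfdata,
                           as.factor(pops$SuperPopulation),
                           method = "nei")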
    dim(myDiff)
    colnames(myDiff)
    knitr::kable(head(myDiff[,1:13]))
    knitr::kable(head(myDiff[,14:17]))
    knitr::kable(round(colMeans(myDiff[,c(3:8,14,17)], na.rm = TRUE), digits = 3))
    library(reshape2)
    library(ggplot2)
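    # Reshape to long format: a "variable" column (statistic name) and a "Value" column
    dpf <- melt(myDiff[, c(3:7, 17)],
                value.name = "Value", na.rm = TRUE)
    # One violin per selected column of myDiff
    ggplot(dpf, aes(x = variable, y = Value)) +
      geom_violin(fill = "#2ca25f", adjust = 1.2) +
      labs(x = "", y = "") +
      theme_bw()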
    
    [Figure: violin plots of the selected columns of myDiff]

    It is hard to see any clear difference between the populations from this plot.
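When a plot of the whole distribution is inconclusive, one option is to rank loci by their differentiation statistic and look at the top of the list. The sketch below illustrates this on a hypothetical stand-in table, not on the real myDiff: the column names are assumed to follow the pattern shown by colnames(myDiff), the population labels AFR and EUR are only examples, and every value is invented.

```r
# Hypothetical stand-in for the data frame returned by genetic_diff();
# the column names mimic the real output, the values are invented.
toy_diff <- data.frame(
  CHROM = c("1", "1", "2", "2"),
  POS = c(101, 205, 310, 422),
  Hs_AFR = c(0.40, 0.10, 0.48, 0.00),
  Hs_EUR = c(0.38, 0.42, 0.05, 0.00),
  Gprimest = c(0.02, 0.31, 0.55, NA)
)

# Average differentiation across loci, ignoring loci without a value
mean_gprimest <- mean(toy_diff$Gprimest, na.rm = TRUE)

# Drop loci without a value, then rank from most to least differentiated
keep <- toy_diff[!is.na(toy_diff$Gprimest), ]
ranked <- keep[order(keep$Gprimest, decreasing = TRUE), ]

print(mean_gprimest)
print(ranked[, c("CHROM", "POS", "Gprimest")])
```

On real output, the same two steps could be applied to myDiff directly, for example to list the handful of positions with the highest values before deciding whether the populations differ at all.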
