Hands-on | RRBLUP with the rrBLUP package

Author: Zukunft_Lab | Published: 2021-05-07 10:33

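    Data processing

    Converting VCF to the rrBLUP {-1,0,1} format

    rrBLUP reads genotypes coded as {-1,0,1}, in a matrix with one row per sample and one column per marker, so the raw genotype data has to be converted first.

    Genotypes can be coded in two ways when building the G matrix:

    • 0, 1, 2: 0 is AA (homozygous for the major allele), 1 is the heterozygote, and 2 is aa (homozygous for the minor allele).
    • -1, 0, 1: -1 is AA (homozygous for the major allele), 0 is the heterozygote, and 1 is aa (homozygous for the minor allele).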
    ## vcftools: generate the {0,1,2} matrix
    vcftools --vcf test.genotypes_no_missing_IDs.vcf --012 --out snp_matrix 
    
    • --012
      This option outputs the genotypes as a large matrix. Three files are produced. The first, with suffix ".012", contains the genotypes of each individual on a separate line. Genotypes are represented as 0, 1 and 2, where each number is the count of non-reference alleles. Missing genotypes are represented by -1. The second file, with suffix ".012.indv", lists the individuals included in the main file. The third file, with suffix ".012.pos", lists the site locations included in the main file.
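    ## R
    data <- as.matrix(read.table("snp_matrix.012", header = FALSE))
    data1 <- data[, -1]   # drop the first column (the individual index written by vcftools)
    data2 <- data1 - 1    # recode 0,1,2 as -1,0,1; missing genotypes (-1) become -2
    write.table(data2, file = "SNP_TMP.txt", row.names = FALSE, col.names = FALSE)  # save as a plain numeric text file
    ## shell
    # missing genotypes are now -2; replace them with NA
    sed 's/-2/NA/g' SNP_TMP.txt > snp.txt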
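
The same conversion can also be done entirely in R, which avoids the intermediate file and the `sed` step. The sketch below is one possible approach rather than part of the workflow above: the helper name `read_012` is made up for illustration, and it assumes the three files written by `vcftools --012` share one prefix. Missing genotypes (-1 in the `.012` file) are set to NA before recoding, so they never turn into -2.

```r
# Hypothetical helper: read vcftools --012 output and recode it for rrBLUP.
read_012 <- function(prefix) {
  geno <- as.matrix(read.table(paste0(prefix, ".012"), header = FALSE))
  geno <- geno[, -1, drop = FALSE]           # first column is the individual index
  geno[geno == -1] <- NA                     # vcftools writes missing genotypes as -1
  geno <- geno - 1                           # 0,1,2 -> -1,0,1 (NA stays NA)
  indv <- readLines(paste0(prefix, ".012.indv"))
  pos  <- read.table(paste0(prefix, ".012.pos"), header = FALSE)
  rownames(geno) <- indv                                   # one row per sample
  colnames(geno) <- paste(pos[[1]], pos[[2]], sep = "_")   # chromosome_position
  geno
}

# Markers <- read_012("snp_matrix")
```

Keeping sample and marker names on the matrix makes it easier to check later that genotype rows and phenotype rows refer to the same individuals.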
    

    Reading the input files

    Example files:
    traits.txt: https://pbgworks.org/sites/pbgworks.org/files/traits.txt
    snp.txt: https://pbgworks.org/sites/pbgworks.org/files/snp.txt

    Pheno <- as.matrix(read.table(file = "/data4/ykzhang/chip_207/7GS/rrblup/format/sheep207_mvp.txt", header = TRUE))
    # header = FALSE belongs inside read.table(), not as.matrix()
    Markers <- as.matrix(read.table(file = "/data4/ykzhang/chip_207/7GS/rrblup/format/snp.txt", header = FALSE))
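
Before fitting anything, it can help to confirm that the two tables line up. The following is a minimal sketch, assuming the rows of both files are in the same sample order (the files themselves do not guarantee this); `check_inputs` is an illustrative name, not an rrBLUP function.

```r
# Hypothetical sanity check for the phenotype and marker matrices.
check_inputs <- function(pheno, markers) {
  # one row per sample in both tables
  stopifnot(nrow(pheno) == nrow(markers))
  # genotypes must be coded -1/0/1, with NA for missing calls
  bad <- !is.na(markers) & !(markers %in% c(-1, 0, 1))
  if (any(bad)) stop(sum(bad), " genotype value(s) outside {-1, 0, 1, NA}")
  # share of missing genotypes per marker, returned invisibly
  invisible(colMeans(is.na(markers)))
}

# miss_rate <- check_inputs(Pheno, Markers)
# summary(miss_rate)
```

The returned per-marker missing rate gives a quick view of how many markers a given missing-data threshold would remove.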
    

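    Filtering and imputation

    library(rrBLUP)
    # drop markers with more than 50% missing genotypes, then fill the remaining gaps with the marker mean
    impute <- A.mat(Markers, max.missing = 0.5, impute.method = "mean", return.imputed = TRUE)
    Markers_impute2 <- impute$imputed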
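
Roughly, the call above does two things to the marker matrix: it removes markers whose missing rate exceeds `max.missing`, and it replaces each remaining NA with the mean of that marker. The base-R sketch below only illustrates that idea on a toy matrix. It is not the code rrBLUP runs, and `A.mat` also computes the additive relationship matrix and offers further options such as `min.MAF`. The function name `filter_and_impute` is made up for this example.

```r
# Illustrative stand-in for the filtering and mean-imputation step.
filter_and_impute <- function(markers, max.missing = 0.5) {
  miss_rate <- colMeans(is.na(markers))
  kept <- markers[, miss_rate <= max.missing, drop = FALSE]
  col_means <- colMeans(kept, na.rm = TRUE)
  na_idx <- which(is.na(kept), arr.ind = TRUE)
  if (nrow(na_idx) > 0) {
    kept[na_idx] <- col_means[na_idx[, "col"]]   # fill each NA with its marker mean
  }
  kept
}

toy <- cbind(m1 = c(-1, 0, 1, NA),
             m2 = c(NA, NA, NA, 1),
             m3 = c(1, 1, -1, -1))
filter_and_impute(toy)   # m2 is dropped (75% missing); the NA in m1 becomes 0
```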
    

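    Simple cross-validation

    traits <- 1
    cycles <- 300
    accuracy <- matrix(nrow = cycles, ncol = traits)
    for (r in 1:cycles) {
      # random split of the 207 samples: 180 for training, 27 for validation
      train <- sample(1:207, 180)
      test <- setdiff(1:207, train)
      Pheno_train <- Pheno[train, ]
      m_train <- Markers_impute2[train, ]
      Pheno_valid <- Pheno[test, ]
      m_valid <- Markers_impute2[test, ]
    
      yield <- Pheno_train[, 7]   # the trait is in column 7 of the phenotype table
      # RR-BLUP: markers enter as random effects (Z); the only fixed effect is the mean
      yield_answer <- mixed.solve(yield, Z = m_train, K = NULL, SE = FALSE, return.Hinv = FALSE)
      pred_yield_valid <- m_valid %*% as.matrix(yield_answer$u)        # sum of marker effects
      pred_yield <- pred_yield_valid[, 1] + as.vector(yield_answer$beta)  # add the estimated mean
      yield_valid <- Pheno_valid[, 7]
      # accuracy = correlation between predicted and observed phenotypes
      accuracy[r, 1] <- cor(pred_yield, yield_valid, use = "complete.obs")
    }
    mean(accuracy)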
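
For intuition about what `mixed.solve` returns here: with the marker matrix passed as `Z` and `K = NULL`, the marker effects `u` are a ridge-regression solution whose shrinkage parameter is the ratio of residual variance to marker-effect variance. `mixed.solve` estimates that ratio by REML. The sketch below instead takes `lambda` as a number chosen by hand and runs on simulated data, so it is only an illustration and will not reproduce rrBLUP's estimates. `ridge_blup` is a made-up name.

```r
# Ridge-regression BLUP for a fixed lambda = sigma2_e / sigma2_u (assumed known here).
ridge_blup <- function(y, Z, lambda) {
  n <- length(y)
  m <- ncol(Z)
  X <- matrix(1, n, 1)                          # intercept only
  # Henderson's mixed model equations
  lhs <- rbind(cbind(crossprod(X), crossprod(X, Z)),
               cbind(crossprod(Z, X), crossprod(Z) + lambda * diag(m)))
  rhs <- rbind(crossprod(X, y), crossprod(Z, y))
  sol <- solve(lhs, rhs)
  list(beta = sol[1], u = sol[-1])
}

# Simulated example: 60 samples, 200 markers coded -1/0/1
set.seed(1)
Z <- matrix(sample(c(-1, 0, 1), 60 * 200, replace = TRUE), 60, 200)
y <- 10 + as.vector(Z %*% rnorm(200, sd = 0.3)) + rnorm(60)
fit <- ridge_blup(y[1:50], Z[1:50, ], lambda = 10)          # train on 50 samples
pred <- fit$beta + as.vector(Z[51:60, ] %*% fit$u)          # predict the other 10
cor(pred, y[51:60])
```

The prediction step mirrors the loop above: marker genotypes of the validation samples times the estimated effects, plus the estimated mean.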
    
    Automating the calculation for multiple traits
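
One way to extend the single-trait loop to several traits is to wrap it in a function and iterate over phenotype columns. This is a sketch under assumptions: the function name `cv_traits`, its arguments, and the column choice in the usage line are all illustrative. `solver` defaults to `rrBLUP::mixed.solve` and is exposed only so that another function with the same `y`, `Z` interface can be swapped in.

```r
# Hypothetical wrapper: repeated random-split cross-validation for several traits.
cv_traits <- function(pheno, markers, trait_cols, cycles = 300, n_train = 180,
                      solver = rrBLUP::mixed.solve) {
  n <- nrow(markers)
  accuracy <- matrix(NA_real_, nrow = cycles, ncol = length(trait_cols),
                     dimnames = list(NULL, colnames(pheno)[trait_cols]))
  for (r in seq_len(cycles)) {
    train <- sample(n, n_train)                 # same split for all traits in this cycle
    test <- setdiff(seq_len(n), train)
    for (j in seq_along(trait_cols)) {
      y <- as.numeric(pheno[, trait_cols[j]])
      fit <- solver(y[train], Z = markers[train, , drop = FALSE])
      pred <- as.vector(markers[test, , drop = FALSE] %*% as.vector(fit$u)) +
        as.vector(fit$beta)
      accuracy[r, j] <- cor(pred, y[test], use = "complete.obs")
    }
  }
  colMeans(accuracy)   # mean accuracy per trait
}

# acc <- cv_traits(Pheno, Markers_impute2, trait_cols = 5:7)   # columns chosen for illustration
```

Using the same train/validation split for every trait within a cycle keeps the per-trait accuracies comparable.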

    References:

    Introduction to Genomic Selection in R using the rrBLUP Package
    [GS column] 8: Genomic selection in practice with RRBLUP

