美文网首页遗传参数评估生物信息-生物统计生物信息
blupf90根据G矩阵和H矩阵构建PCA分析以及与Plink以

blupf90根据G矩阵和H矩阵构建PCA分析以及与Plink以

作者: 育种数据分析之放飞自我 | 来源:发表于2019-04-28 17:55 被阅读0次

    模拟一套数据, 5个世代, 最后三代有基因型数据, 每个世代400个个体, SNP为50K

    1. blupf90构建G矩阵的PCA

    blupf90如果想要进行GBLUP分析, 不写系谱信息即可, 示例par文件:

    DATAFILE
    dat_f90.txt
    TRAITS
    10         # This is column 10 (phenotype) from QMSim data file
    FIELDS_PASSED TO OUTPUT
    1          # This will copy the ID number to the renf90.dat data file
    WEIGHT(S)  # WARNING: ALWAYS PUT AN EMPTY LINE AFTER THIS!!!!!
    
    RESIDUAL_VARIANCE
    0.9                # add starting values for residual variance
    EFFECT
    4 cross alpha    # Fit generation as a fixed effect, 'cross alpha' is a class in SAS
    EFFECT
    1 cross alpha    # Fit animal effect
    RANDOM
    animal           # Fit animal effect (A matrix) for the effect directly above it (column 1, animal)
    #FILE
    #ped_f90.txt  #  pedigree file (animal, sire, dam), 0's are missing always!!!
    #FILE_POS
    #1 2 3 0 0        # indicates that column 1 = Animal, column 2 = Sire, column 3 = Dam
    SNP_FILE
    yM.txt
    (CO)VARIANCES    
    0.1               # add starting values for additive animal effect
    OPTION alpha_size 25            # Equal to the max number of characters within a column
    OPTION max_string_readline 800  # maximum number of characters in one line of data file
    OPTION max_field_readline 100   # maximum number of columns in the dataset
    #OPTION saveHinvOrig
    #OPTION saveHinv
    #OPTION sol se
    #OPTION use_yams
    #OPTION missing -999
    OPTION plotpca
    

    运行preGSf90后, 会生成pc1vspc2文件, 里面包括PC1和PC2两列, 增加世代为pop, 然后使用R画图:

    pca = read.table(("pc1vspc2"))
    head(pca)
    names(pca) = c("PC1","PC2")
    pca$pop = rep(c("A","B","C"),each=400)
    library(ggplot2)
    p <- ggplot(pca, aes(x=PC1, y=PC2, colour=pop)) 
    p <- p + geom_point(size=2)
    p <- p + stat_ellipse(level = 0.95, size = 1)
    p <- p + geom_hline(yintercept = 0) 
    p <- p + geom_vline(xintercept = 0) 
    p <- p + theme_bw()
    p
    

    结果:

    image

    2. blupf90构建H矩阵的PCA

    需要定义系谱和基因型, 示例par文件:

    DATAFILE
    dat_f90.txt
    TRAITS
    10         # This is column 10 (phenotype) from QMSim data file
    FIELDS_PASSED TO OUTPUT
    1          # This will copy the ID number to the renf90.dat data file
    WEIGHT(S)  # WARNING: ALWAYS PUT AN EMPTY LINE AFTER THIS!!!!!
    
    RESIDUAL_VARIANCE
    0.9                # add starting values for residual variance
    EFFECT
    4 cross alpha    # Fit generation as a fixed effect, 'cross alpha' is a class in SAS
    EFFECT
    1 cross alpha    # Fit animal effect
    RANDOM
    animal           # Fit animal effect (A matrix) for the effect directly above it (column 1, animal)
    FILE
    ped_f90.txt  #  pedigree file (animal, sire, dam), 0's are missing always!!!
    FILE_POS
    1 2 3 0 0        # indicates that column 1 = Animal, column 2 = Sire, column 3 = Dam
    SNP_FILE
    yM.txt
    (CO)VARIANCES    
    0.1               # add starting values for additive animal effect
    OPTION alpha_size 25            # Equal to the max number of characters within a column
    OPTION max_string_readline 800  # maximum number of characters in one line of data file
    OPTION max_field_readline 100   # maximum number of columns in the dataset
    #OPTION saveHinvOrig
    #OPTION saveHinv
    #OPTION sol se
    #OPTION use_yams
    #OPTION missing -999
    OPTION plotpca
    

    运行preGSf90后, 会生成pc1vspc2文件, 里面包括PC1和PC2两列, 增加世代为pop, 然后使用R画图:

    pca = read.table(("pc1vspc2"))
    head(pca)
    names(pca) = c("PC1","PC2")
    pca$pop = rep(c("A","B","C"),each=400)
    library(ggplot2)
    p <- ggplot(pca, aes(x=PC1, y=PC2, colour=pop)) 
    p <- p + geom_point(size=2)
    p <- p + stat_ellipse(level = 0.95, size = 1)
    p <- p + geom_hline(yintercept = 0) 
    p <- p + geom_vline(xintercept = 0) 
    p <- p + theme_bw()
    p
    
    image

    3. plink根据G矩阵做PCA

    代码:

    plink --file b --pca 3
    

    结果生成:

    plink.eigenval  plink.eigenvec  plink.log  plink.nosex
    

    R语言作图:

    library(ggplot2)
    head(dd)
    p <- ggplot(dd, aes(x=PC1, y=PC2, colour=pop)) 
    p <- p + geom_point(size=2)
    p <- p + stat_ellipse(level = 0.95, size = 1)
    # p <- p + scale_color_manual(values = cols)
    p <- p + geom_hline(yintercept = 0) 
    p <- p + geom_vline(xintercept = 0) 
    p <- p + theme_bw()
    p
    
    image

    4. gcta64根据G矩阵做PCA

    将ped文件转化为bed文件

    plink --file b --make-bed --out c
    

    生成grm文件

    gcta64 --bfile c --autosome --make-grm --out grm
    

    生成pca文件

    gcta64 --grm grm --pca 3
    

    根据PCA信息作图

    image

    结论

    blupf90的G矩阵, H矩阵, plink的PCA结果一致.
    GCTA构建的PCA结果不太一致, 怀疑是参数默认的有问题, 回头查看一下.

    相关文章

      网友评论

        本文标题:blupf90根据G矩阵和H矩阵构建PCA分析以及与Plink以

        本文链接:https://www.haomeiwen.com/subject/cgujnqtx.html