2018-11-06 GWAS in Practice (7): Advanced plink, Association Analysis


Author: 小郑的学习笔记 | Published 2018-11-06 16:11

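    This installment covers association analysis, the step that produces the single-marker regression results (P values). Its output can be combined with part one of this series to close the loop and run through the standard pipeline.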

    Association analysis can be used for many purposes, for example:

    The basic association test is for a disease trait and is based on comparing allele frequencies between cases and controls (asymptotic and empirical p-values are available). Also implemented are the Cochran-Armitage trend test, Fisher’s exact test, different genetic models (dominant, recessive and general), tests for stratified samples (e.g. Cochran-Mantel-Haenszel, Breslow-Day tests), a test for a quantitative trait; a test for differences in missing genotype rate between cases and controls; multilocus tests, using either Hotelling’s T² statistic or a sum-statistic approach (evaluated by permutation) as well as haplotype tests. The basic tests can be performed with permutation, described in the following section to provide empirical p-values, and allow for different designs (e.g. by use of structured, within-cluster permutation).
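    To make the basic case/control test concrete, here is a minimal Python sketch of a 1-df allelic chi-square test on a 2x2 table of allele counts (cases/controls by A1/A2). This is only an illustration of the idea of comparing allele frequencies. It is not PLINK's own code. The function name and the example counts are made up.

```python
import math


def allelic_chisq(case_a1: int, case_a2: int,
                  ctrl_a1: int, ctrl_a2: int) -> tuple[float, float]:
    """1-df Pearson chi-square on a 2x2 table of allele counts.

    Rows are cases/controls, columns are the two alleles (A1/A2).
    No continuity correction is applied.
    """
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    n = sum(row_totals)
    chisq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chisq += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chisq / 2))
    return chisq, p_value


if __name__ == "__main__":
    # Made-up counts: A1 is more common in cases (60/100) than in controls (40/100).
    chisq, p = allelic_chisq(60, 40, 40, 60)
    print(f"chi-square = {chisq:.3f}, p = {p:.3g}")
```

    With these made-up counts, every expected cell count is 50, so the statistic is 8 and p is about 0.0047.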

    Here I mainly cover one of these: linear and logistic models.

    These two features allow for multiple covariates when testing for both quantitative trait and disease trait SNP association, and for interactions with those covariates. The covariates can either be continuous or binary (i.e. for categorical covariates, you must first make a set of binary dummy variables).
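    The sketch below is in my own notation and is not taken from the PLINK manual. It shows the additive per-SNP model that these regressions fit. Here g_i is the number of copies of the tested allele A1 (0, 1 or 2) carried by individual i, and c_{ik} are the K covariates. --logistic puts the same linear predictor on the log-odds scale. The reported t-statistic is the estimated SNP coefficient divided by its standard error.

```latex
\begin{aligned}
\text{linear:}\quad & y_i = \beta_0 + \beta_1 g_i + \sum_{k=1}^{K} \gamma_k c_{ik} + \varepsilon_i \\
\text{logistic:}\quad & \log\frac{\Pr(y_i = 1)}{1 - \Pr(y_i = 1)} = \beta_0 + \beta_1 g_i + \sum_{k=1}^{K} \gamma_k c_{ik} \\
& t = \hat\beta_1 \big/ \widehat{\mathrm{SE}}(\hat\beta_1)
\end{aligned}
```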

    The main point is that covariates can be added as controls. This makes the approach very flexible, but it may run somewhat slower.
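    The manual says categorical covariates must first be turned into binary dummy variables. For example, a hypothetical "batch" covariate with three levels would need two 0/1 columns before it can go into a covariate file (passed with --covar). Below is a minimal Python sketch of that dummy coding. One level is used as the reference and gets no column, so the dummies are not perfectly collinear with the intercept. The function and the batch labels are illustrative assumptions.

```python
from typing import Optional


def dummy_code(values: list[str],
               reference: Optional[str] = None) -> tuple[list[str], list[list[int]]]:
    """Turn one categorical covariate into 0/1 dummy columns.

    The reference level (default: the alphabetically first) gets no column,
    so the dummies are not perfectly collinear with the regression intercept.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    kept = [level for level in levels if level != reference]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


if __name__ == "__main__":
    # Hypothetical batch labels for four samples.
    names, matrix = dummy_code(["b1", "b2", "b3", "b2"])
    print(names)
    for row in matrix:
        print(row)
```

    Each row of the matrix would then be written after that sample's FID and IID in the covariate file.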

    According to the documentation, the most basic usage looks like this:


    [Screenshot: basic usage from the manual]

    But I ran into a problem here: my .bed/.bim/.fam files contain no phenotype data, so I had to define a phenotype file myself.

    I use a quantitative trait as the example.

    In general, you create such a file yourself and point PLINK to it with --pheno.

    --pheno causes phenotype values to be read from the 3rd column of the specified space- or tab-delimited file, instead of the .fam or .ped file. The first and second columns of that file must contain family and within-family IDs, respectively.

    So the file has three columns: the first two are the family and within-family IDs, and the third is the phenotype.

    Here I use the first principal component (PC1) as the phenotype.

    [Screenshot: the phenotype file]
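    --pheno reads the third column, so a PCA .eigenvec file (FID, IID, PC1, PC2, ...) already has PC1 in that position. If you want a trimmed three-column file, the minimal Python sketch below builds one. It assumes whitespace-separated columns and an optional header line starting with FID or #FID. The function name and the demo rows are hypothetical.

```python
def eigenvec_to_pheno(lines: list[str], pc: int = 1) -> list[str]:
    """Build --pheno lines (FID, IID, value) from .eigenvec lines.

    Assumes whitespace-separated columns FID IID PC1 PC2 ...; an optional
    header line whose first field is FID or #FID is skipped. pc is 1-based.
    """
    if pc < 1:
        raise ValueError("pc is 1-based")
    out = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] in ("FID", "#FID"):
            continue
        fid, iid = fields[0], fields[1]
        # Column 3 holds PC1, column 4 holds PC2, and so on.
        out.append(f"{fid}\t{iid}\t{fields[1 + pc]}")
    return out


if __name__ == "__main__":
    # Made-up rows; a real run would read the lines of your own .eigenvec file.
    demo = ["F1 I1 0.12 -0.30", "F2 I2 -0.05 0.10"]
    for row in eigenvec_to_pheno(demo):
        print(row)
```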

    Let's give it a try.

    It failed, and the program reported this error:

    Warning: Skipping --linear since # variables >= # samples.
    
    [Screenshot: error output]

    Remember to add --allow-no-sex.

    --allow-no-sex is now required if you want to retain phenotype values for missing-sex samples. This is a change from PLINK 1.07; we believe it would be more confusing to continue treating regular and --pheno phenotypes differently, and apologize for any temporary inconvenience we've caused.

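    This is simply how the program is designed. Without --allow-no-sex, PLINK sets the phenotype of every sample with missing sex to missing. If sex is unknown for most or all samples, too few samples remain for the regression, which is presumably what triggered the warning above.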
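    A quick way to check whether this applies to your data is to count samples with unknown sex in the .fam file. The .fam columns are FID, IID, father, mother, sex, phenotype, and sex is coded 1 = male, 2 = female, 0 = unknown. The minimal Python sketch below counts every code other than 1 or 2 as unknown, which is a simplifying assumption. The function name and demo lines are hypothetical.

```python
def count_missing_sex(fam_lines: list[str]) -> tuple[int, int]:
    """Return (samples with unknown sex, total samples) from .fam lines.

    Sex is the 5th column; anything other than 1 (male) or 2 (female)
    is counted as unknown here.
    """
    missing = total = 0
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        total += 1
        if fields[4] not in ("1", "2"):
            missing += 1
    return missing, total


if __name__ == "__main__":
    # Made-up .fam lines; a real run would read the lines of your own .fam file.
    demo = ["F1 I1 0 0 1 -9", "F2 I2 0 0 0 -9", "F3 I3 0 0 2 -9"]
    missing, total = count_missing_sex(demo)
    print(f"{missing} of {total} samples have unknown sex")
```

    If the count of samples with unknown sex equals the total, running without --allow-no-sex would leave no usable phenotypes. That would be consistent with the warning above.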

    plink --bfile clean --linear --pheno clean_one.eigenvec --allow-no-sex
    
    [Screenshot: successful run]

    Success!

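    This produces a file ending in .assoc.linear. With the default output prefix it is named plink.assoc.linear, and --out sets a different prefix.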

    [Screenshot: contents of the .assoc.linear file]

    This file can be used directly for plotting. For the plotting itself, see part one of this series.

    Here is what each column means:

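    1. CHR: chromosome
    2. SNP: SNP identifier
    3. BP: physical position in base pairs
    4. A1: tested allele (minor allele by default)
    5. TEST: code for the test, i.e. which model term the row refers to (ADD is the additive effect of the tested allele; covariates get rows of their own)
    6. NMISS: number of non-missing individuals included in the analysis
    7. BETA (--linear) or OR (--logistic): regression coefficient or odds ratio
    8. STAT: coefficient t-statistic, i.e. beta divided by its standard error; the larger its absolute value, the more significant
    9. P: asymptotic p-value for the t-statistic, used to judge significance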
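    Before plotting, you may want to load the file and keep only the SNP rows (TEST = ADD). The columns are padded with spaces, so splitting on whitespace is enough. The minimal Python sketch below makes two assumptions: the header has the column names listed above, and statistics that could not be computed appear as NA. The function name and demo rows are made up.

```python
from typing import Optional


def _num(text: str) -> Optional[float]:
    """Convert a numeric field, mapping NA (not computed) to None."""
    return None if text == "NA" else float(text)


def read_assoc_linear(lines: list[str], test: str = "ADD") -> list[dict]:
    """Parse .assoc.linear lines, keeping rows whose TEST column matches."""
    header: Optional[list[str]] = None
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = fields  # e.g. CHR SNP BP A1 TEST NMISS BETA STAT P
            continue
        rec: dict = dict(zip(header, fields))
        if rec.get("TEST") != test:
            continue  # e.g. covariate rows
        rec["BP"] = int(rec["BP"])
        rec["NMISS"] = int(rec["NMISS"])
        for key in ("BETA", "OR", "STAT", "P"):
            if key in rec:
                rec[key] = _num(rec[key])
        records.append(rec)
    return records


if __name__ == "__main__":
    # Made-up example rows; a real run would read your own .assoc.linear file.
    demo = [
        " CHR SNP BP A1 TEST NMISS BETA STAT P",
        "   1 rs1 12345 A ADD 100 0.5 2.0 0.048",
        "   1 rs1 12345 A COV1 100 0.1 0.5 0.6",
    ]
    for rec in read_assoc_linear(demo):
        print(rec["SNP"], rec["P"])
```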

    That is all for this brief introduction.

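    I also ran into some practical problems. For example, some SNPs in my data came out extremely significant with a P value of exactly 0. Since -log10(0) is infinite, ylim cannot be set for the later plot, which causes problems. My plot also came out strangely narrow, which I have not figured out yet.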
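    One common workaround is to clamp P = 0 to a small positive floor before taking -log10, so every point has a finite height and ylim can be set from the maximum. This is my own choice and does not come from the post or from PLINK. In the minimal Python sketch below, the default floor is the smallest non-zero P in the data. Clamped points should therefore be read as "at least this significant", because the true value is hidden. The function name is hypothetical.

```python
import math
from typing import Optional


def neg_log10_p(pvalues: list[Optional[float]],
                floor: Optional[float] = None) -> list[Optional[float]]:
    """-log10(P) for plotting, with P = 0 clamped to a finite floor.

    By default the floor is the smallest non-zero P in the input, so zero
    P values are drawn at the height of the strongest non-zero hit instead
    of at infinity. Missing values (None) pass through unchanged.
    """
    if floor is None:
        positive = [p for p in pvalues if p is not None and p > 0]
        floor = min(positive) if positive else 1e-300
    if floor <= 0:
        raise ValueError("floor must be positive")
    return [None if p is None else -math.log10(max(p, floor)) for p in pvalues]


if __name__ == "__main__":
    heights = neg_log10_p([0.01, 0.0, 1e-5, None])
    finite = [h for h in heights if h is not None]
    print(heights, "ylim upper bound:", max(finite))
```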


    Article link: https://www.haomeiwen.com/subject/hqcntqtx.html