📘 Mendelian Randomization: Exposure Data

Author: 生信蟹道人 | Published 2023-10-26 12:22

I. Data requirements for the exposure:

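1. For the exposure GWAS data, TwoSampleMR requires a data frame of instrumental variables with one row per SNP. At least the first four columns below are required (the fifth, other_allele, is needed in practice):

  • SNP – rs ID (chr and pos can be converted to an rs ID)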
  • beta – The effect size. If the trait is binary then log(OR) should be used
  • se – The standard error of the effect size
  • effect_allele – The allele of the SNP which has the effect marked in beta
  • other_allele – The non-effect allele (the official documentation does not list this column as required, but in practice the analysis fails to run without it)
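As a minimal sketch of the column layout described above, the snippet below builds a small exposure table and checks that the five columns are present. The rs IDs and all numeric values are made up for illustration, and the variable names `exposure_raw` and `required` are hypothetical. The call to `format_data()` is guarded so that it only runs when TwoSampleMR is installed.

```r
# Hypothetical exposure summary statistics; every value below is made up.
exposure_raw <- data.frame(
  SNP           = c("rs0000001", "rs0000002", "rs0000003"),
  beta          = c(0.032, -0.018, log(1.05)),  # log(OR) if the trait is binary
  se            = c(0.004, 0.003, 0.006),
  effect_allele = c("A", "T", "G"),
  other_allele  = c("G", "C", "A"),
  stringsAsFactors = FALSE
)

# Check that the columns listed above are all present
required <- c("SNP", "beta", "se", "effect_allele", "other_allele")
missing_cols <- setdiff(required, names(exposure_raw))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Convert to the TwoSampleMR exposure format only if the package is available
if (requireNamespace("TwoSampleMR", quietly = TRUE)) {
  exp_dat <- TwoSampleMR::format_data(exposure_raw, type = "exposure")
}
```

If your own file uses different column names, `format_data()` accepts arguments such as `snp_col`, `beta_col` and `se_col` to map them.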

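beta and OR are interconvertible: the effect size is beta = log(OR), i.e. OR = exp(beta).

The P-value, beta (or OR) and se are linked by formulas: given any two, the third can be computed (from P and se alone, only the magnitude of beta is recovered, not its sign).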
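The conversions can be sketched in base R as follows. This assumes the P-value comes from a two-sided Wald test with a normal approximation (z = beta / se), which the original text does not state explicitly; the odds ratio and standard error are made-up numbers.

```r
or   <- 1.20          # hypothetical odds ratio
beta <- log(or)       # beta = log(OR), natural logarithm
se   <- 0.05          # hypothetical standard error

# beta + se -> P (two-sided, normal approximation)
z    <- beta / se
pval <- 2 * pnorm(-abs(z))

# beta + P -> se
se_back <- abs(beta) / qnorm(pval / 2, lower.tail = FALSE)

# P + se -> |beta| (the sign of beta cannot be recovered from P)
abs_beta_back <- se * qnorm(pval / 2, lower.tail = FALSE)
```

For very small P-values, working on the log scale (`log.p = TRUE` in `pnorm()` and `qnorm()`) avoids numerical underflow.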

2. Other columns that help with MR preprocessing or analysis (eaf and the sample size can be used to compute the F-statistic and R² values):

  • eaf – The effect allele frequency
  • Phenotype – The name of the phenotype for which the SNP has an effect

3. You can also provide additional information:

  • chr – Physical position of variant (chromosome)
  • position – Physical position of variant (position)
  • samplesize – Sample size for estimating the effect size (can be used to compute the F-statistic and R² values)
  • ncase – Number of cases (ncase and samplesize can be used to compute power)
  • ncontrol – Number of controls
  • pval – The P-value for the SNP’s association with the exposure (useful when filtering by P-value)
  • units – The units in which the effects are presented
  • gene – The gene or other annotation for the SNP
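The text says that eaf and samplesize can be used to compute R² and the F-statistic but does not give the formulas. The sketch below uses one commonly cited per-SNP approximation, which is an assumption here rather than the author's stated method: R² = 2 · eaf · (1 − eaf) · beta² (with beta in standard-deviation units of the exposure) and F = R² · (N − 2) / (1 − R²). All input values are made up.

```r
# Hypothetical per-SNP inputs
eaf        <- 0.30     # effect allele frequency
beta       <- 0.05     # effect size, assumed to be in SD units of the exposure
samplesize <- 100000   # sample size of the exposure GWAS

# Approximate proportion of exposure variance explained by the SNP
r2 <- 2 * eaf * (1 - eaf) * beta^2

# F-statistic for a single instrument (k = 1)
f_stat <- r2 * (samplesize - 2) / (1 - r2)

print(c(R2 = r2, F = f_stat))
```

An F-statistic above 10 is the conventional rule of thumb for an instrument that is not weak.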

II. Obtaining instruments from existing databases:

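  • 1. Install the R packages used to import the data:
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
if (!requireNamespace("MRInstruments", quietly = TRUE)) remotes::install_github("MRCIEU/MRInstruments")
# TwoSampleMR provides format_data(), the format_*() helpers, available_outcomes() and extract_instruments() used below
if (!requireNamespace("TwoSampleMR", quietly = TRUE)) remotes::install_github("MRCIEU/TwoSampleMR")
library(TwoSampleMR)
library(MRInstruments)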
  • 2.GWAS catalog:
data(gwas_catalog)
head(gwas_catalog)
# For example, to obtain instruments for BMI from the Speliotes et al. 2010 study:
bmi_gwas <- subset(gwas_catalog, grepl("Speliotes", Author) & Phenotype == "Body mass index")
bmi_exp_dat <- format_data(bmi_gwas)
  • 3.Metabolites:
data(metab_qtls)
head(metab_qtls)
# For example, to obtain instruments for alanine:
ala_exp_dat <- format_metab_qtls(subset(metab_qtls, phenotype == "Ala"))
  • 4.Proteins:
data(proteomic_qtls)
head(proteomic_qtls)
# For example, to obtain instruments for the ApoH protein:
apoh_exp_dat <-
  format_proteomic_qtls(subset(proteomic_qtls, analyte == "ApoH"))
  • 5.Gene expression levels:
data(gtex_eqtl)
head(gtex_eqtl)
# For example, to obtain instruments for IRAK1BP1 gene expression levels in subcutaneous adipose tissue:
irak1bp1_exp_dat <-
  format_gtex_eqtl(subset(
    gtex_eqtl,
    gene_name == "IRAK1BP1" & tissue == "Adipose Subcutaneous"
  ))
  • 6.DNA methylation levels:
data(aries_mqtl)
head(aries_mqtl)
# For example, to obtain instruments for DNA methylation levels at the cg25212131 CpG site at birth:
cg25212131_exp_dat <-
  format_aries_mqtl(subset(aries_mqtl, cpg == "cg25212131" &
                             age == "Birth"))
  • 7.IEU GWAS database:
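ao <- available_outcomes()               # returns a table of all studies available in the database
head(ao)                                 # view the first 6 rows
head(subset(ao, select = c(trait, id)))  # each study has a unique ID
# Obtain BMI-associated SNPs from the Locke et al. 2015 GIANT study as instruments:
bmi2014_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')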

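Here the instruments are retrieved from IEU with the extract_instruments function, so it helps to understand its parameters:
p1 = P-value threshold for keeping a SNP
clump = Whether or not to return independent SNPs only (default is TRUE)
p2 = Secondary clumping threshold
r2 = The maximum LD R-square allowed between returned SNPs
kb = The distance in which to search for LD R-square values
In short: p1 selects the instruments that are significantly associated with the exposure (default: p1 = 5e-08); clump removes instruments that are in linkage disequilibrium (LD) with one another (default: TRUE), since SNPs in LD are correlated and do not provide independent evidence; and p2, r2 and kb define the criteria used to remove LD (the defaults are usually fine, or you can follow the settings used in the reference literature).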
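To make the roles of p1, r2 and kb concrete, here is a toy base-R sketch of significance filtering followed by greedy clumping. It is an illustration of the idea only, not how extract_instruments is implemented: the function `toy_clump`, the SNP names, positions, P-values and the LD matrix are all hypothetical, whereas real clumping estimates LD from a reference panel.

```r
# Toy data: four SNPs on one chromosome (all values made up)
snps <- data.frame(
  SNP  = c("rsA", "rsB", "rsC", "rsD"),
  pos  = c(1000, 6000, 500000, 900000),   # base-pair positions
  pval = c(1e-12, 1e-10, 3e-09, 1e-04),
  stringsAsFactors = FALSE
)

# Hypothetical pairwise LD r2 matrix: only rsA and rsB are in LD
ld_r2 <- matrix(0, 4, 4, dimnames = list(snps$SNP, snps$SNP))
ld_r2["rsA", "rsB"] <- ld_r2["rsB", "rsA"] <- 0.8

toy_clump <- function(dat, ld, p1 = 5e-08, r2 = 0.001, kb = 10000) {
  dat <- dat[dat$pval < p1, ]      # step 1: keep significant SNPs
  dat <- dat[order(dat$pval), ]    # step 2: visit the most significant first
  keep <- character(0)
  for (i in seq_len(nrow(dat))) {
    snp <- dat$SNP[i]
    kept_pos <- dat$pos[match(keep, dat$SNP)]
    near <- keep[abs(kept_pos - dat$pos[i]) <= kb * 1000]
    # keep the SNP unless it is in LD (r2 above threshold) with a kept SNP in the window
    if (length(near) == 0 || all(ld[snp, near] <= r2)) keep <- c(keep, snp)
  }
  dat[dat$SNP %in% keep, ]
}

indep <- toy_clump(snps, ld_r2)
print(indep$SNP)
```

In this toy data rsD fails the p1 filter and rsB is dropped because it is in LD with the more significant rsA, leaving rsA and rsC as independent instruments.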

    Article link: https://www.haomeiwen.com/subject/ltiyidtx.html