I. Data requirements for the exposure:

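1. For the exposure GWAS data, TwoSampleMR needs a data frame of instrumental variables with one row per SNP. The official documentation lists 4 required columns; in practice a fifth, other_allele, is also needed: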

SNP – rs ID (chr and pos can be converted to rs IDs)
beta – The effect size. If the trait is binary then log(OR) should be used
se – The standard error of the effect size
effect_allele – The allele of the SNP which has the effect marked in beta
other_allele – The non-effect allele (not listed as required in the official documentation, but the analysis fails without it in practice)
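As a minimal sketch, the data frame below holds three made-up SNPs (the rs IDs and numbers are hypothetical, not real associations) in the column layout listed above. It checks that the required columns are present and, only if TwoSampleMR happens to be installed, passes the data to format_data() with type = "exposure", relying on that function's default column names.

```r
# Hypothetical example rows in the column layout TwoSampleMR expects.
exposure_raw <- data.frame(
  SNP           = c("rs0000001", "rs0000002", "rs0000003"),  # made-up rs IDs
  beta          = c(0.12, -0.08, 0.05),
  se            = c(0.020, 0.015, 0.010),
  effect_allele = c("A", "G", "T"),
  other_allele  = c("G", "A", "C"),
  stringsAsFactors = FALSE
)

# Fail early if any of the minimal columns is missing.
required_cols <- c("SNP", "beta", "se", "effect_allele", "other_allele")
missing_cols <- setdiff(required_cols, names(exposure_raw))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Only runs when TwoSampleMR is installed; format_data() reads the default column names above.
if (requireNamespace("TwoSampleMR", quietly = TRUE)) {
  exposure_dat <- TwoSampleMR::format_data(exposure_raw, type = "exposure")
}
```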

beta and OR can be converted into each other: beta = ln(OR) (natural log), and conversely OR = exp(beta).
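A small base-R sketch of the conversion. The helper names are hypothetical; the only facts used are beta = ln(OR) and, for the last helper, that a symmetric 95% CI on the log scale spans 2 × 1.96 × se, which is handy when a study reports only an OR with its confidence interval but TwoSampleMR needs se.

```r
# beta is the natural log of the odds ratio, so exp() undoes it.
or_to_beta <- function(or) log(or)
beta_to_or <- function(beta) exp(beta)

# se of log(OR) recovered from a reported CI: the CI spans 2 * z * se on the log scale.
se_from_or_ci <- function(lower, upper, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% CI
  (log(upper) - log(lower)) / (2 * z)
}

# Usage with made-up numbers: OR = 1.25 (95% CI 1.10 to 1.42)
beta <- or_to_beta(1.25)
se   <- se_from_or_ci(1.10, 1.42)
```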

P, beta (or OR) and se are linked by a simple formula, so given any two of them the third can be calculated.
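A minimal sketch, assuming the usual two-sided Wald test with a normal approximation (z = beta / se, P = 2 × Φ(−|z|)); the function names are hypothetical. Because P carries no sign, beta can only be recovered up to its sign, which has to come from the direction of the effect allele.

```r
# Two-sided p-value from beta and se (Wald z = beta / se, normal approximation).
p_from_beta_se <- function(beta, se) 2 * pnorm(-abs(beta / se))

# se from beta and p: |z| = qnorm(p / 2, lower.tail = FALSE), then se = |beta| / |z|.
se_from_beta_p <- function(beta, p) abs(beta) / qnorm(p / 2, lower.tail = FALSE)

# |beta| from se and p. The sign is lost in p, so take it from the effect allele direction.
abs_beta_from_se_p <- function(se, p) se * qnorm(p / 2, lower.tail = FALSE)

# Usage with made-up numbers: beta = 0.10, se = 0.02 gives z = 5.
p         <- p_from_beta_se(0.10, 0.02)
se_back   <- se_from_beta_p(0.10, p)
beta_back <- abs_beta_from_se_p(0.02, p)
```

For extremely small P values (around 1e-300 and below) double precision underflows, so very strong signals may need the log.p arguments of pnorm() and qnorm() instead.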

2. Other columns that help with MR preprocessing or analysis (eaf and the sample size can be used to compute the F-statistic and R2):

eaf – The effect allele frequency
Phenotype – The name of the phenotype for which the SNP has an effect
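Assuming the exposure is measured in standard-deviation units, a widely used approximation is R2 = 2 × eaf × (1 − eaf) × beta², and the per-SNP F-statistic is F = R2 × (N − 2) / (1 − R2), with N taken from the samplesize column; when N is unavailable, F ≈ (beta / se)² is a common shortcut. The sketch applies both to made-up numbers; other R2 formulas (for example ones that also use se and N) exist and can give somewhat different values, and the F > 10 cut-off is only a rule of thumb.

```r
# Approximate variance in the exposure explained by one SNP (exposure assumed on an SD scale).
snp_r2 <- function(beta, eaf) 2 * eaf * (1 - eaf) * beta^2

# Per-SNP F-statistic from R2 and sample size n (one instrument at a time).
snp_f <- function(r2, n) r2 * (n - 2) / (1 - r2)

# Shortcut when n is unavailable: F is approximately (beta / se)^2.
snp_f_approx <- function(beta, se) (beta / se)^2

# Usage with made-up numbers for two SNPs.
beta <- c(0.12, 0.05)
eaf  <- c(0.30, 0.45)
n    <- 50000
r2   <- snp_r2(beta, eaf)
f    <- snp_f(r2, n)
keep <- f > 10   # rule of thumb: drop weak instruments with F < 10
```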

3. You can also provide additional information:

chr – Physical position of variant (chromosome)
position – Physical position of variant (position)
samplesize – Sample size for estimating the effect size (can be used to compute the F-statistic and R2)
ncase – Number of cases (ncase and samplesize can be used to compute statistical power)
ncontrol – Number of controls
pval – The P-value for the SNP’s association with the exposure (useful for P-value filtering)
units – The units in which the effects are presented
gene – The gene or other annotation for the SNP
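For a binary outcome, a rough power estimate can be sketched from ncase, samplesize, the R2 of the instruments and an assumed causal OR. The hypothetical function below uses a normal approximation, power ≈ Φ(|ln OR| × sqrt(N × R2 × K × (1 − K)) − z_(1−α/2)) with K = ncase / samplesize. It assumes the exposure is in SD units, ignores the opposite tail, and is not any package's API; for a real study, a dedicated calculator such as mRnd is the safer choice.

```r
# Approximate power of an MR analysis with a binary outcome (normal approximation).
mr_power_binary <- function(or, r2, ncase, samplesize, alpha = 0.05) {
  k <- ncase / samplesize                                    # proportion of cases
  ncp <- abs(log(or)) * sqrt(samplesize * r2 * k * (1 - k))  # expected |z| under the assumed OR
  pnorm(ncp - qnorm(1 - alpha / 2))                          # P(z exceeds the critical value), one tail only
}

# Usage with made-up numbers: OR = 1.2 per SD, instruments explain 2% of the exposure.
power <- mr_power_binary(or = 1.2, r2 = 0.02, ncase = 10000, samplesize = 50000)
```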

II. Obtaining instruments from existing databases:

1. Install the R packages used to import the data:

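if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
if (!requireNamespace("TwoSampleMR", quietly = TRUE)) remotes::install_github("MRCIEU/TwoSampleMR")
if (!requireNamespace("MRInstruments", quietly = TRUE)) remotes::install_github("MRCIEU/MRInstruments")
library(TwoSampleMR)    # format_data(), format_*() helpers, available_outcomes(), extract_instruments()
library(MRInstruments)  # gwas_catalog, metab_qtls, proteomic_qtls, gtex_eqtl, aries_mqtl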

2. GWAS catalog:

data(gwas_catalog)
head(gwas_catalog)
# For example, obtain instruments for BMI from Speliotes et al. (2010):
bmi_gwas <- subset(gwas_catalog, grepl("Speliotes", Author) & Phenotype == "Body mass index")
bmi_exp_dat <- format_data(bmi_gwas)

3. Metabolites:

data(metab_qtls)
head(metab_qtls)
# For example, obtain instruments for alanine:
ala_exp_dat <- format_metab_qtls(subset(metab_qtls, phenotype == "Ala"))

4. Proteins:

data(proteomic_qtls)
head(proteomic_qtls)
# For example, obtain instruments for the ApoH protein:
apoh_exp_dat <-
  format_proteomic_qtls(subset(proteomic_qtls, analyte == "ApoH"))

5. Gene expression levels:

data(gtex_eqtl)
head(gtex_eqtl)
# For example, obtain instruments for IRAK1BP1 gene expression in subcutaneous adipose tissue:
irak1bp1_exp_dat <-
  format_gtex_eqtl(subset(
    gtex_eqtl,
    gene_name == "IRAK1BP1" & tissue == "Adipose Subcutaneous"
  ))

6. DNA methylation levels:

data(aries_mqtl)
head(aries_mqtl)
# For example, obtain instruments for DNA methylation at CpG cg25212131 at birth:
cg25212131_exp_dat <-
  format_aries_mqtl(subset(aries_mqtl, cpg == "cg25212131" &
                             age == "Birth"))

7. IEU GWAS database:

ao <- available_outcomes()               # table of all studies in the database; each has a unique ID
head(ao)                                 # view the first 6 rows
head(subset(ao, select = c(trait, id)))  # show only the trait name and study ID
# Obtain BMI-associated SNPs from the GIANT study of Locke et al. (2015) as instruments:
bmi2014_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')

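Here extract_instruments() retrieves instruments from the IEU database. Its main parameters are:
● p1 = P-value threshold for keeping a SNP (default 5e-08)
● clump = Whether or not to return independent SNPs only (default is TRUE)
● p2 = Secondary P-value threshold used during clumping (default 5e-08)
● r2 = The maximum LD R-square allowed between returned SNPs (default 0.001)
● kb = The distance in kb within which to search for LD R-square values (default 10000)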
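In short: p1 keeps only instruments significantly associated with the exposure (default p1 = 5e-08); clump removes instruments in linkage disequilibrium (LD) with one another (default TRUE), because SNPs in LD carry largely the same signal and would otherwise be counted more than once; and p2, r2 and kb define the LD-removal criteria (the defaults are usually fine, or they can be set to match published studies).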
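To make the p1 + clump logic concrete, here is a simplified, hypothetical sketch in base R. Real clumping (as done for extract_instruments()) measures LD r2 against a reference panel; this version replaces "r2 above the threshold" with "same chromosome and within kb kilobases", so it is distance-only clumping and is meant only to illustrate the greedy, most-significant-first procedure. The p1 and kb defaults mirror the ones listed above.

```r
# Distance-only clumping: an illustration, not a substitute for LD-based clumping.
clump_by_distance <- function(dat, p1 = 5e-08, kb = 10000) {
  dat <- dat[dat$pval < p1, , drop = FALSE]     # step 1: keep SNPs passing the p1 threshold
  dat <- dat[order(dat$pval), , drop = FALSE]   # step 2: process the most significant SNP first
  kept <- dat[0, , drop = FALSE]
  for (i in seq_len(nrow(dat))) {
    same_chr <- kept$chr == dat$chr[i]
    nearby   <- abs(kept$position - dat$position[i]) <= kb * 1000
    # step 3: keep a SNP only if no already-kept SNP lies within the window
    if (!any(same_chr & nearby)) kept <- rbind(kept, dat[i, , drop = FALSE])
  }
  kept
}

# Usage with made-up SNPs: rs2 is 0.5 Mb from rs1, rs3 is 29 Mb away, rs4 fails p1.
snps <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  chr = c(1, 1, 1, 2),
  position = c(1000000, 1500000, 30000000, 1000000),
  pval = c(1e-10, 1e-09, 1e-08, 1e-03),
  stringsAsFactors = FALSE
)
res <- clump_by_distance(snps)
```

With a 10 Mb window, rs2 is dropped as a neighbour of the stronger rs1, while rs3 survives because it lies far enough away.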