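Illumina is one of the three major microarray manufacturers. The lumi package targets Illumina's bead-based expression and methylation arrays: it handles quality control and normalization and yields an expression matrix. Its wrapper function lumiExpresso() bundles four steps, B, T, N and Q (background correction, transformation, normalization, quality control).
A LumiBatch object is created by reading a file exported from the Illumina BeadStudio toolkit with lumiR() or lumiR.batch().
The walkthrough below uses GSE40553 as an example.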
Illumina BeadStudio (GenomeStudio) files
1.1 Downloading and reading the data
gse = "GSE40553"
#setwd(gse)
library(lumi)
library(GEOquery)
#gunzip("GSE40553_Original_IlluinaFile_FromBeadStudio_UKpatients.txt.gz")
x.lumi <- lumiR("GSE40553_Original_IlluinaFile_FromBeadStudio_UKpatients.txt")
## Perform Quality Control assessment of the LumiBatch object ...
pd <- pData(phenoData(x.lumi))
1.2 Quality control and normalization
data.eset <- lumiExpresso(x.lumi) # background correction, VST, quantile normalization, then QC
## Background Correction: bgAdjust
## Variance Stabilizing Transform method: vst
## Normalization method: quantile
##
##
## Background correction ...
## Perform bgAdjust background correction ...
## The data has already been background adjusted!
## done.
##
## Variance stabilizing ...
## Perform vst transformation ...
## 2021-04-13 18:20:28 , processing array 1
## 2021-04-13 18:20:28 , processing array 2
## 2021-04-13 18:20:28 , processing array 3
## 2021-04-13 18:20:28 , processing array 4
## 2021-04-13 18:20:28 , processing array 5
## 2021-04-13 18:20:28 , processing array 6
## 2021-04-13 18:20:28 , processing array 7
## 2021-04-13 18:20:28 , processing array 8
## 2021-04-13 18:20:28 , processing array 9
## 2021-04-13 18:20:29 , processing array 10
## 2021-04-13 18:20:29 , processing array 11
## 2021-04-13 18:20:29 , processing array 12
## 2021-04-13 18:20:29 , processing array 13
## 2021-04-13 18:20:29 , processing array 14
## 2021-04-13 18:20:29 , processing array 15
## 2021-04-13 18:20:29 , processing array 16
## 2021-04-13 18:20:29 , processing array 17
## 2021-04-13 18:20:29 , processing array 18
## 2021-04-13 18:20:29 , processing array 19
## 2021-04-13 18:20:29 , processing array 20
## 2021-04-13 18:20:29 , processing array 21
## 2021-04-13 18:20:29 , processing array 22
## 2021-04-13 18:20:30 , processing array 23
## 2021-04-13 18:20:30 , processing array 24
## 2021-04-13 18:20:30 , processing array 25
## 2021-04-13 18:20:30 , processing array 26
## 2021-04-13 18:20:30 , processing array 27
## 2021-04-13 18:20:30 , processing array 28
## 2021-04-13 18:20:30 , processing array 29
## 2021-04-13 18:20:30 , processing array 30
## 2021-04-13 18:20:30 , processing array 31
## 2021-04-13 18:20:30 , processing array 32
## 2021-04-13 18:20:31 , processing array 33
## 2021-04-13 18:20:31 , processing array 34
## 2021-04-13 18:20:31 , processing array 35
## 2021-04-13 18:20:31 , processing array 36
## 2021-04-13 18:20:31 , processing array 37
## 2021-04-13 18:20:31 , processing array 38
## 2021-04-13 18:20:31 , processing array 39
## done.
##
## Normalizing ...
## Perform quantile normalization ...
## done.
##
## Quality control after preprocessing ...
## Perform Quality Control assessment of the LumiBatch object ...
## done.
data.exprs <- exprs(data.eset)
colors <- rainbow(ncol(data.exprs)*1.2)
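boxplot(data.exprs,col=colors,main="expression value1",las=2)
Note: with its defaults, lumiExpresso() performs background correction, a variance-stabilizing transformation (VST, as the log above shows, rather than a plain log2) and quantile normalization. In this run the background step was skipped because the BeadStudio export had already been background adjusted. The result is therefore ready for downstream analysis.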
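To make the normalization step more concrete, here is a minimal base-R sketch of quantile normalization on a toy matrix. It only illustrates the idea and is not the code lumi runs internally; the function name quantile_normalize and the toy values are made up for this example, and ties are simply broken by order of appearance.

```r
# Quantile normalization sketch (illustrative only, not lumi's implementation).
# Every column is forced onto the same distribution: the mean of the
# column-wise sorted values.
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)          # sort each column independently
  target <- rowMeans(sorted)           # reference distribution
  out <- apply(x, 2, function(col) {
    target[rank(col, ties.method = "first")]  # map each value to its rank's target
  })
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical toy data: 4 probes x 3 arrays
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

normalized <- quantile_normalize(toy)
print(normalized)
```

After this step every array has the same set of values, just assigned to different probes, which is why the boxes in a boxplot line up after quantile normalization.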
non-normalized data
rm(list = ls())
library(lumi)
data.Nnorm <- data.table::fread("GSE40553_non-normalized_UKLong.txt",data.table = F)
head(data.Nnorm)
## REF_ID 2083-2 Detection Pval 2055-5 Detection Pval 2083-4
## 1 ILMN_1802380 378.4 0.00000 365.6 0.00000 401.2
## 2 ILMN_1893287 -20.7 0.98571 0.5 0.48571 12.5
## 3 ILMN_3238331 -10.6 0.79091 -2.0 0.58052 -5.6
## 4 ILMN_1736104 -7.2 0.71818 12.6 0.06883 -3.4
## 5 ILMN_1792389 23.5 0.03247 32.4 0.00000 70.7
## 6 ILMN_1854015 28.3 0.01299 17.7 0.02468 30.5
## Detection Pval 2030-5 Detection Pval 2065-1 Detection Pval 2074-2
## 1 0.00000 224.9 0.00000 448.0 0.00000 588.1
## 2 0.12468 13.4 0.03117 6.0 0.27403 7.2
## 3 0.68701 -10.9 0.95325 -15.4 0.91818 -17.0
## 4 0.60649 -1.4 0.56494 1.4 0.43377 14.2
## 5 0.00000 20.9 0.00000 12.1 0.14026 60.1
## 6 0.00519 17.4 0.00390 17.7 0.07532 22.7
## Detection Pval 2074-5 Detection Pval 2009-4 Detection Pval 2065-2
## 1 0.00000 402.4 0.00000 397.7 0.00000 246.6
## 2 0.27403 7.6 0.25714 -3.2 0.62987 2.3
## 3 0.92078 -13.8 0.85455 0.0 0.47792 -8.8
## 4 0.12208 26.4 0.01818 6.6 0.21818 0.2
## 5 0.00000 45.3 0.00130 26.3 0.00519 23.8
## 6 0.04545 33.1 0.00649 20.4 0.01818 13.0
## Detection Pval 2030-4 Detection Pval 2004-2 Detection Pval 2055-1
## 1 0.00000 300.0 0.00000 504.2 0.00000 470.3
## 2 0.38442 -16.0 0.93117 6.6 0.28701 12.3
## 3 0.83766 -1.3 0.52987 -5.5 0.63636 -6.9
## 4 0.47273 -10.0 0.76883 2.5 0.41429 -9.3
## 5 0.00260 82.9 0.00000 26.6 0.02727 68.9
## 6 0.06364 10.0 0.19091 24.8 0.02987 30.4
## Detection Pval 2055-2 Detection Pval 2083-3 Detection Pval 2074-4
## 1 0.00000 375.2 0.00000 520.5 0.00000 260.6
## 2 0.18442 12.1 0.14935 4.1 0.36234 0.7
## 3 0.67662 -4.9 0.64545 -3.6 0.59740 7.5
## 4 0.72857 -4.2 0.62597 -3.7 0.59740 -7.9
## 5 0.00000 31.4 0.00519 33.4 0.00260 76.0
## 6 0.01429 32.5 0.00130 5.9 0.31558 18.0
## Detection Pval 2074-1 Detection Pval 2074-3 Detection Pval 2009-1
## 1 0.00000 291.0 0.00000 296.4 0.00000 439.8
## 2 0.45844 5.5 0.24156 2.3 0.37662 -0.7
## 3 0.21688 -0.7 0.51818 -1.8 0.56883 -9.0
## 4 0.77273 -0.3 0.50390 0.3 0.46234 -7.9
## 5 0.00000 14.7 0.04026 13.5 0.05844 18.1
## 6 0.04286 23.9 0.00519 22.0 0.00649 12.7
## Detection Pval 2083-5 Detection Pval 2009-5 Detection Pval 2098-1
## 1 0.00000 337.9 0.00000 483.9 0.00000 194.9
## 2 0.53117 8.7 0.22078 -8.0 0.73377 5.4
## 3 0.82857 -17.1 0.95844 -6.5 0.70000 -16.7
## 4 0.78312 5.3 0.33117 1.1 0.44675 6.6
## 5 0.01818 40.2 0.00000 25.0 0.02208 46.4
## 6 0.06623 17.4 0.05844 17.8 0.06623 26.1
## Detection Pval 2098-3 Detection Pval 2083-1 Detection Pval 2004-5
## 1 0.00000 293.9 0.00000 417.6 0.00000 315.8
## 2 0.32597 7.7 0.34805 15.5 0.09351 -8.3
## 3 0.90519 -10.0 0.69091 2.3 0.41688 -20.2
## 4 0.29740 -10.1 0.69091 7.9 0.23247 0.1
## 5 0.00130 21.2 0.07662 28.2 0.01039 32.1
## 6 0.02208 22.7 0.06364 24.1 0.02468 27.4
## Detection Pval 2065-3 Detection Pval 2065-5 Detection Pval 2004-1
## 1 0.00000 166.2 0.00000 265.9 0.00000 432.9
## 2 0.70779 2.7 0.35714 -0.6 0.50519 2.7
## 3 0.94286 -3.3 0.62597 3.1 0.39481 -9.1
## 4 0.49481 3.4 0.32857 17.3 0.06364 4.6
## 5 0.00779 33.1 0.00000 56.7 0.00000 13.5
## 6 0.02597 26.7 0.00260 10.7 0.15714 5.1
## Detection Pval 2065-4 Detection Pval 2030-2 Detection Pval 2030-1
## 1 0.00000 239.8 0.00000 244.5 0.00000 298.1
## 2 0.34935 -3.5 0.57922 15.9 0.06753 6.4
## 3 0.85714 -5.2 0.62597 -9.2 0.80000 -5.0
## 4 0.29351 -2.5 0.55844 8.6 0.18571 5.4
## 5 0.06494 26.4 0.02078 31.3 0.00649 51.5
## 6 0.27792 16.1 0.11558 17.4 0.05455 21.1
## Detection Pval 2098-5 Detection Pval 2009-3 Detection Pval 2009-2
## 1 0.00000 128.8 0.00000 410.7 0.00000 515.7
## 2 0.29351 12.9 0.12208 -12.0 0.86234 6.2
## 3 0.62727 -11.9 0.83377 -11.6 0.85325 -1.7
## 4 0.32468 -9.1 0.75325 8.6 0.21299 -12.4
## 5 0.00000 21.1 0.03247 6.3 0.26883 22.1
## 6 0.04416 7.8 0.23377 -3.2 0.57532 18.6
## Detection Pval 2098-4 Detection Pval
## 1 0.00000 164.3 0.00000
## 2 0.28571 1.5 0.38961
## 3 0.54805 -0.4 0.47403
## 4 0.84156 0.9 0.41558
## 5 0.03766 22.9 0.00909
## 6 0.06883 8.8 0.16883
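The printout shows the file layout: one probe ID column followed by alternating intensity and "Detection Pval" columns, one pair per sample. As a rough base-R sketch of that layout (not what read.ilmn() does internally), the snippet below rebuilds the first two probes and two samples from the output above and splits them into an intensity matrix and a matching p-value matrix. The object names toy, E and P are made up for this illustration.

```r
# Toy copy of the first two probes / two samples shown above
toy <- data.frame(
  c("ILMN_1802380", "ILMN_1893287"),
  c(378.4, -20.7),
  c(0.00000, 0.98571),
  c(365.6, 0.5),
  c(0.00000, 0.48571),
  stringsAsFactors = FALSE
)
# Assign names afterwards so the repeated "Detection Pval" header is kept
names(toy) <- c("REF_ID", "2083-2", "Detection Pval", "2055-5", "Detection Pval")

is_pval   <- names(toy) == "Detection Pval"
expr_cols <- which(!is_pval & names(toy) != "REF_ID")
pval_cols <- which(is_pval)

# Intensity matrix: probes in rows, samples in columns
E <- as.matrix(toy[, expr_cols])
rownames(E) <- toy$REF_ID

# Detection p-values, relabelled with the sample each column belongs to
P <- as.matrix(toy[, pval_cols])
dimnames(P) <- dimnames(E)

print(E)
print(P)
```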
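2.1 Reading the data
library(limma)
# expr must be set explicitly, otherwise read.ilmn() fails: the intensity columns here are
# named after sample IDs starting with "20" instead of the default "AVG_Signal" keyword
data.non <- read.ilmn("GSE40553_non-normalized_UKLong.txt",probeid = "REF_ID",other.columns = "Detection Pval",sep = "\t",expr = "20")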
## Reading file GSE40553_non-normalized_UKLong.txt ... ...
2.2 Preprocessing
data.exp <- neqc(data.non,detection.p = "Detection Pval") # normexp background correction, quantile normalization, log2 transform
dim(data.exp)
## [1] 47323 34
data.exp1 <- data.exp$E
colors <- rainbow(ncol(data.exp1)*1.2)
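boxplot(data.exp1,col=colors,las=3,main="neqc-after")
As the boxplot shows, neqc() performs background correction, quantile normalization and a log2 transform in a single call, which is convenient. What remains is the usual differential expression analysis and probe annotation. See also "Q: Microarray data normalization workflow?".
2.3 Probe filtering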
data.non$other$`Detection Pval`[1:4,1:4]
## 83-2 55-5 83-4 30-5
## ILMN_1802380 0.00000 0.00000 0.00000 0.00000
## ILMN_1893287 0.98571 0.48571 0.12468 0.03117
## ILMN_3238331 0.79091 0.58052 0.68701 0.95325
## ILMN_1736104 0.71818 0.06883 0.60649 0.56494
# keep probes that are detected (p < 0.05) in at least 3 samples
index <- rowSums(data.exp$other$`Detection Pval`<0.05)>=3
table(index)
## index
## FALSE TRUE
## 23425 23898
data.exp2 <- data.exp[index,]
dim(data.exp2)
## [1] 23898 34
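The filter rule above can be checked on a small example. The sketch below applies the same expression, rowSums(p < 0.05) >= 3, to a hypothetical 4 x 4 matrix of detection p-values; the probe and sample names and the values are invented for illustration.

```r
# Hypothetical detection p-values: 4 probes x 4 samples
pval <- matrix(c(0.00, 0.00, 0.01, 0.00,   # detected in 4 samples
                 0.98, 0.48, 0.12, 0.03,   # detected in 1 sample
                 0.04, 0.02, 0.60, 0.01,   # detected in 3 samples
                 0.71, 0.06, 0.60, 0.56),  # detected in 0 samples
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("probe", 1:4), paste0("s", 1:4)))

# pval < 0.05 gives a logical matrix; rowSums counts the TRUEs per probe
n_detected <- rowSums(pval < 0.05)
keep <- n_detected >= 3

print(n_detected)
print(keep)
```

The threshold of 3 samples is a judgment call. A common choice is the size of the smallest experimental group, so that a probe expressed in only one group is not discarded.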