R | Illumina BeadChip series: the lumi package

Author: 高大石头 | Published: 2021-04-19 05:58

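    Illumina is one of the three major microarray manufacturers. The lumi package targets Illumina's bead-based expression and methylation arrays: it handles quality control and normalization and yields an expression matrix. Its lumiExpresso() wrapper bundles four steps, N, T, B and Q (normalization, transformation, background correction, and quality control); as the log in section 1.2 shows, they run in the order B, T, N, Q.

    A LumiBatch object is created by reading files already processed by the Illumina BeadStudio toolkit with lumiR() or lumiR.batch().

    GSE40553 is used as the example below.

    Illumina BeadStudio (GenomeStudio) files

    1.1 Downloading and reading the data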

    gse = "GSE40553"
    #setwd(gse)
    library(lumi)
    library(GEOquery)
    #gunzip("GSE40553_Original_IlluinaFile_FromBeadStudio_UKpatients.txt.gz")
    x.lumi <- lumiR("GSE40553_Original_IlluinaFile_FromBeadStudio_UKpatients.txt")
    
    ## Perform Quality Control assessment of the LumiBatch object ...
    
    pd <- pData(phenoData(x.lumi))
    

    1.2 Quality control and normalization

    data.eset <- lumiExpresso(x.lumi) # background correction, VST, quantile normalization, QC
    
    ## Background Correction: bgAdjust 
    ## Variance Stabilizing Transform method: vst 
    ## Normalization method: quantile 
    ## 
    ## 
    ## Background correction ...
    ## Perform bgAdjust background correction ...
    ## The data has already been background adjusted!
    ## done.
    ## 
    ## Variance stabilizing ...
    ## Perform vst transformation ...
    ## 2021-04-13 18:20:28 , processing array  1 
    ## 2021-04-13 18:20:28 , processing array  2 
    ## 2021-04-13 18:20:28 , processing array  3 
    ## 2021-04-13 18:20:28 , processing array  4 
    ## 2021-04-13 18:20:28 , processing array  5 
    ## 2021-04-13 18:20:28 , processing array  6 
    ## 2021-04-13 18:20:28 , processing array  7 
    ## 2021-04-13 18:20:28 , processing array  8 
    ## 2021-04-13 18:20:28 , processing array  9 
    ## 2021-04-13 18:20:29 , processing array  10 
    ## 2021-04-13 18:20:29 , processing array  11 
    ## 2021-04-13 18:20:29 , processing array  12 
    ## 2021-04-13 18:20:29 , processing array  13 
    ## 2021-04-13 18:20:29 , processing array  14 
    ## 2021-04-13 18:20:29 , processing array  15 
    ## 2021-04-13 18:20:29 , processing array  16 
    ## 2021-04-13 18:20:29 , processing array  17 
    ## 2021-04-13 18:20:29 , processing array  18 
    ## 2021-04-13 18:20:29 , processing array  19 
    ## 2021-04-13 18:20:29 , processing array  20 
    ## 2021-04-13 18:20:29 , processing array  21 
    ## 2021-04-13 18:20:29 , processing array  22 
    ## 2021-04-13 18:20:30 , processing array  23 
    ## 2021-04-13 18:20:30 , processing array  24 
    ## 2021-04-13 18:20:30 , processing array  25 
    ## 2021-04-13 18:20:30 , processing array  26 
    ## 2021-04-13 18:20:30 , processing array  27 
    ## 2021-04-13 18:20:30 , processing array  28 
    ## 2021-04-13 18:20:30 , processing array  29 
    ## 2021-04-13 18:20:30 , processing array  30 
    ## 2021-04-13 18:20:30 , processing array  31 
    ## 2021-04-13 18:20:30 , processing array  32 
    ## 2021-04-13 18:20:31 , processing array  33 
    ## 2021-04-13 18:20:31 , processing array  34 
    ## 2021-04-13 18:20:31 , processing array  35 
    ## 2021-04-13 18:20:31 , processing array  36 
    ## 2021-04-13 18:20:31 , processing array  37 
    ## 2021-04-13 18:20:31 , processing array  38 
    ## 2021-04-13 18:20:31 , processing array  39 
    ## done.
    ## 
    ## Normalizing ...
    ## Perform quantile normalization ...
    ## done.
    ## 
    ## Quality control after preprocessing ...
    ## Perform Quality Control assessment of the LumiBatch object ...
    ## done.
    
    data.exprs <- exprs(data.eset)
    colors <- rainbow(ncol(data.exprs)*1.2)
    boxplot(data.exprs,col=colors,main="expression value1",las=2)
    

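    Note: lumiExpresso() performs background correction, a variance-stabilizing transformation (VST by default, as the log above shows, rather than a plain log2), and quantile normalization, so the result is ready for downstream analysis.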
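    To make the quantile step more concrete, here is a minimal base R sketch of quantile normalization on a toy matrix. It is an illustration only, not lumi's implementation: the function name quantile_normalize and the toy values are made up for this example, and ties are broken by position, which real implementations may handle differently.

```r
# Toy quantile normalization in base R (illustration only, not lumi's code).
quantile_normalize <- function(x) {
  # Rank the values within each column (ties broken by position).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Sort each column, then average across columns rank by rank
  # to get the shared reference distribution.
  sorted <- apply(x, 2, sort)
  ref <- rowMeans(sorted)
  # Replace every value by the reference value of its rank.
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical intensities: 4 probes (rows) x 3 samples (columns).
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("s", 1:3)))

toy_qn <- quantile_normalize(toy)
print(toy_qn)
```

    After this step every sample column holds the same set of values, only in a different probe order, which is why the boxplots line up after normalization.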

    non-normalized data

    rm(list = ls())
    library(lumi)
    data.Nnorm <- data.table::fread("GSE40553_non-normalized_UKLong.txt",data.table = F) 
    head(data.Nnorm)
    
    ##         REF_ID 2083-2 Detection Pval 2055-5 Detection Pval 2083-4
    ## 1 ILMN_1802380  378.4        0.00000  365.6        0.00000  401.2
    ## 2 ILMN_1893287  -20.7        0.98571    0.5        0.48571   12.5
    ## 3 ILMN_3238331  -10.6        0.79091   -2.0        0.58052   -5.6
    ## 4 ILMN_1736104   -7.2        0.71818   12.6        0.06883   -3.4
    ## 5 ILMN_1792389   23.5        0.03247   32.4        0.00000   70.7
    ## 6 ILMN_1854015   28.3        0.01299   17.7        0.02468   30.5
    ##   Detection Pval 2030-5 Detection Pval 2065-1 Detection Pval 2074-2
    ## 1        0.00000  224.9        0.00000  448.0        0.00000  588.1
    ## 2        0.12468   13.4        0.03117    6.0        0.27403    7.2
    ## 3        0.68701  -10.9        0.95325  -15.4        0.91818  -17.0
    ## 4        0.60649   -1.4        0.56494    1.4        0.43377   14.2
    ## 5        0.00000   20.9        0.00000   12.1        0.14026   60.1
    ## 6        0.00519   17.4        0.00390   17.7        0.07532   22.7
    ##   Detection Pval 2074-5 Detection Pval 2009-4 Detection Pval 2065-2
    ## 1        0.00000  402.4        0.00000  397.7        0.00000  246.6
    ## 2        0.27403    7.6        0.25714   -3.2        0.62987    2.3
    ## 3        0.92078  -13.8        0.85455    0.0        0.47792   -8.8
    ## 4        0.12208   26.4        0.01818    6.6        0.21818    0.2
    ## 5        0.00000   45.3        0.00130   26.3        0.00519   23.8
    ## 6        0.04545   33.1        0.00649   20.4        0.01818   13.0
    ##   Detection Pval 2030-4 Detection Pval 2004-2 Detection Pval 2055-1
    ## 1        0.00000  300.0        0.00000  504.2        0.00000  470.3
    ## 2        0.38442  -16.0        0.93117    6.6        0.28701   12.3
    ## 3        0.83766   -1.3        0.52987   -5.5        0.63636   -6.9
    ## 4        0.47273  -10.0        0.76883    2.5        0.41429   -9.3
    ## 5        0.00260   82.9        0.00000   26.6        0.02727   68.9
    ## 6        0.06364   10.0        0.19091   24.8        0.02987   30.4
    ##   Detection Pval 2055-2 Detection Pval 2083-3 Detection Pval 2074-4
    ## 1        0.00000  375.2        0.00000  520.5        0.00000  260.6
    ## 2        0.18442   12.1        0.14935    4.1        0.36234    0.7
    ## 3        0.67662   -4.9        0.64545   -3.6        0.59740    7.5
    ## 4        0.72857   -4.2        0.62597   -3.7        0.59740   -7.9
    ## 5        0.00000   31.4        0.00519   33.4        0.00260   76.0
    ## 6        0.01429   32.5        0.00130    5.9        0.31558   18.0
    ##   Detection Pval 2074-1 Detection Pval 2074-3 Detection Pval 2009-1
    ## 1        0.00000  291.0        0.00000  296.4        0.00000  439.8
    ## 2        0.45844    5.5        0.24156    2.3        0.37662   -0.7
    ## 3        0.21688   -0.7        0.51818   -1.8        0.56883   -9.0
    ## 4        0.77273   -0.3        0.50390    0.3        0.46234   -7.9
    ## 5        0.00000   14.7        0.04026   13.5        0.05844   18.1
    ## 6        0.04286   23.9        0.00519   22.0        0.00649   12.7
    ##   Detection Pval 2083-5 Detection Pval 2009-5 Detection Pval 2098-1
    ## 1        0.00000  337.9        0.00000  483.9        0.00000  194.9
    ## 2        0.53117    8.7        0.22078   -8.0        0.73377    5.4
    ## 3        0.82857  -17.1        0.95844   -6.5        0.70000  -16.7
    ## 4        0.78312    5.3        0.33117    1.1        0.44675    6.6
    ## 5        0.01818   40.2        0.00000   25.0        0.02208   46.4
    ## 6        0.06623   17.4        0.05844   17.8        0.06623   26.1
    ##   Detection Pval 2098-3 Detection Pval 2083-1 Detection Pval 2004-5
    ## 1        0.00000  293.9        0.00000  417.6        0.00000  315.8
    ## 2        0.32597    7.7        0.34805   15.5        0.09351   -8.3
    ## 3        0.90519  -10.0        0.69091    2.3        0.41688  -20.2
    ## 4        0.29740  -10.1        0.69091    7.9        0.23247    0.1
    ## 5        0.00130   21.2        0.07662   28.2        0.01039   32.1
    ## 6        0.02208   22.7        0.06364   24.1        0.02468   27.4
    ##   Detection Pval 2065-3 Detection Pval 2065-5 Detection Pval 2004-1
    ## 1        0.00000  166.2        0.00000  265.9        0.00000  432.9
    ## 2        0.70779    2.7        0.35714   -0.6        0.50519    2.7
    ## 3        0.94286   -3.3        0.62597    3.1        0.39481   -9.1
    ## 4        0.49481    3.4        0.32857   17.3        0.06364    4.6
    ## 5        0.00779   33.1        0.00000   56.7        0.00000   13.5
    ## 6        0.02597   26.7        0.00260   10.7        0.15714    5.1
    ##   Detection Pval 2065-4 Detection Pval 2030-2 Detection Pval 2030-1
    ## 1        0.00000  239.8        0.00000  244.5        0.00000  298.1
    ## 2        0.34935   -3.5        0.57922   15.9        0.06753    6.4
    ## 3        0.85714   -5.2        0.62597   -9.2        0.80000   -5.0
    ## 4        0.29351   -2.5        0.55844    8.6        0.18571    5.4
    ## 5        0.06494   26.4        0.02078   31.3        0.00649   51.5
    ## 6        0.27792   16.1        0.11558   17.4        0.05455   21.1
    ##   Detection Pval 2098-5 Detection Pval 2009-3 Detection Pval 2009-2
    ## 1        0.00000  128.8        0.00000  410.7        0.00000  515.7
    ## 2        0.29351   12.9        0.12208  -12.0        0.86234    6.2
    ## 3        0.62727  -11.9        0.83377  -11.6        0.85325   -1.7
    ## 4        0.32468   -9.1        0.75325    8.6        0.21299  -12.4
    ## 5        0.00000   21.1        0.03247    6.3        0.26883   22.1
    ## 6        0.04416    7.8        0.23377   -3.2        0.57532   18.6
    ##   Detection Pval 2098-4 Detection Pval
    ## 1        0.00000  164.3        0.00000
    ## 2        0.28571    1.5        0.38961
    ## 3        0.54805   -0.4        0.47403
    ## 4        0.84156    0.9        0.41558
    ## 5        0.03766   22.9        0.00909
    ## 6        0.06883    8.8        0.16883
    

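    2.1 Reading the data

    library(limma)
    data.non <- read.ilmn("GSE40553_non-normalized_UKLong.txt",probeid = "REF_ID",other.columns = "Detection Pval",sep = "\t",expr = "20") # expr must be set explicitly or the call fails; "20" is the prefix shared by the sample columns (2083-2, 2055-5, ...)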
    
    ## Reading file GSE40553_non-normalized_UKLong.txt ... ...
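    Why expr has to be set can be checked on the column names alone. The sketch below copies a few headers from the head() output above and compares limma's default pattern ("AVG_Signal") with the "20" prefix used here. It only mimics the pattern matching with grep(); it does not call read.ilmn(), and the variable names are invented for this illustration.

```r
# A few column names copied from the head() output of the non-normalized file.
cols <- c("REF_ID", "2083-2", "Detection Pval", "2055-5", "Detection Pval")

# read.ilmn() looks for expression columns by pattern; its default is "AVG_Signal".
default_hits <- grep("AVG_Signal", cols, value = TRUE)

# The sample columns in this file are bare IDs that all start with "20".
prefix_hits <- grep("20", cols, value = TRUE)

print(default_hits)
print(prefix_hits)
```

    The default pattern matches nothing in this file, while "20" picks up the sample columns. The shortened sample names in section 2.3 (83-2, 55-5, ...) are consistent with the matched pattern being dropped from the column names.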
    

    2.2 Preprocessing

    data.exp <- neqc(data.non,detection.p = "Detection Pval") # normexp background correction, quantile normalization, log2 transform
    dim(data.exp)
    
    ## [1] 47323    34
    
    data.exp1 <- data.exp$E
    colors <- rainbow(ncol(data.exp1)*1.2)
    boxplot(data.exp1,col=colors,las=3,main="neqc-after")
    

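    As the boxplot shows, neqc() handles background correction, normalization, and log2 transformation in a single call, which is convenient. What remains is the usual differential expression analysis and probe annotation. See also the post "Q: What is the normalization workflow for microarray data?"

    2.3 Probe filtering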

    data.non$other$`Detection Pval`[1:4,1:4]
    
    ##                 83-2    55-5    83-4    30-5
    ## ILMN_1802380 0.00000 0.00000 0.00000 0.00000
    ## ILMN_1893287 0.98571 0.48571 0.12468 0.03117
    ## ILMN_3238331 0.79091 0.58052 0.68701 0.95325
    ## ILMN_1736104 0.71818 0.06883 0.60649 0.56494
    
    # keep probes detected (detection p < 0.05) in at least 3 samples
    index <- rowSums(data.exp$other$`Detection Pval`<0.05)>=3
    table(index)
    
    ## index
    ## FALSE  TRUE 
    ## 23425 23898
    
    data.exp2 <- data.exp[index,]
    dim(data.exp2)
    
    ## [1] 23898    34
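    The filtering rule is easier to follow on a small example. The sketch below applies the same rowSums() test to a made-up 3 x 4 matrix of detection p-values; the probe names, sample names, and values are hypothetical.

```r
# Hypothetical detection p-values: 3 probes (rows) x 4 samples (columns).
pval <- matrix(c(0.00, 0.00, 0.01, 0.00,
                 0.90, 0.04, 0.50, 0.70,
                 0.01, 0.03, 0.20, 0.04),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("probeA", "probeB", "probeC"),
                               paste0("s", 1:4)))

# pval < 0.05 is a logical matrix; rowSums() counts the TRUEs per probe,
# i.e. the number of samples in which the probe is detected.
n_detected <- rowSums(pval < 0.05)

# Keep probes detected in at least 3 samples.
keep <- n_detected >= 3
print(keep)
```

    probeB is detected in only one sample and is dropped. The thresholds (0.05 and 3 samples) are the ones used in this post; other studies may choose them differently, for example based on the size of the smallest group.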
    

