一个不咋成功的3d

作者: Juan_NF | 来源:发表于2019-07-28 21:21 被阅读71次

    从芯片文件读取,RMA处理后,主要绘制两个图,pca3d和热图。

    文章来源:

    《Integrated Analysis of Copy Number Variation and Genome-Wide Expression Profiling in Colorectal Cancer Tissues》

    图片:
    image.png
    CEL数据读取
    • 本应该是很简单的事情,但我耽误了一天的时间用来装包。。。
    • 第一次进行芯片数据的处理,纠结了在oligo和affy中究竟选择哪个包;最终基于文章中的probe_id和读取后结果中的id的%in%结果,选择了oligo
    • 本希望直接读取chp文件,无奈没找到相应的解决方案;
    • 步骤为:
      下载数据(源于ArrayExpress,使用相应的R包下载即可)
      获取expression data和phenotype data
      RMA对expression data处理
    #########################数据下载#######################
    library(ArrayExpress)
    raw_data_dir <- tempdir()
    if (!dir.exists(raw_data_dir)) {
      dir.create(raw_data_dir)
    }
    anno_AE <- getAE("E-MEXP-3980", path = raw_data_dir, type = "raw")
    sdrf_location <- file.path(raw_data_dir, "E-MEXP-3980.sdrf.txt")
    SDRF <- read.delim(sdrf_location)
    rownames(SDRF) <- SDRF$Array.Data.File
    SDRF <- AnnotatedDataFrame(SDRF)
    raw_data <- oligo::read.celfiles(filenames = file.path(raw_data_dir, 
                                                           SDRF$Array.Data.File),
                                     verbose = FALSE, phenoData = SDRF)
    stopifnot(validObject(raw_data))
    ######################exprssion和phenodata#########################
    pd<- Biobase::pData(raw_data)
    exp_raw<- oligo::rma(raw_data, target = "core")
    exp_rma<- Biobase::exprs(exp_raw)
    save(raw_data,file='raw.Rdata')
    save(pd,file='pd.Rdata')
    save(exp_rma,file='rma.Rdata')
    
    PCA|heatmap
    • 批次效应-文章讲到去除了批次效应,BUT我并没有找到相应的批次信息;最后聚类的结果不理想(我就先认为是批次效应的原因吧);
    • 捡个便宜,差异基因结果直接在文章的附件中有,我就直接拿来用了,顺便依次确定了,应该是用oligo
    • 步骤为:
      boxplot(必需的check数据步骤)
      PCA-3dpca作图
      差异基因(直接参考了文章数据)-heatmap.2作图
    load('rma.Rdata')
    load('pd.Rdata')
    colnames(exp_rma)<- gsub('_.+','',rownames(matr))
    boxplot(exp_rma,outliers=F,las=2)
    library(FactoMineR)
    PCA_raw <- PCA(t(exp_rma),graph = F)
    ####热图
    ##################choose gene#################
    library(openxlsx)
    a<- read.xlsx('./cel/pone.0092553.s002.xlsx',sheet=1,startRow=3)
    a<- a[,c(1,15,17)]
    colnames(a) <- c('probe_id','p_value','fold_change')
    exp_rma<- exp_rma[rownames(exp_rma)%in%a$probe_id,]
    matr<- scale(t(exp_rma))
    matr[matr< -2] <- -2
    matr[matr>2] <- 2
    library(gplots)
    ####
    rownames(matr)<- gsub('_.+','',rownames(matr))
    pd$group<- gsub('_.+','',pd$Array.Data.File)
    group<- as.character(pd$Characteristics.disease.[match(rownames(matr),pd$group)])
    gr <-ifelse(group=='Normal','red','blue')
    ####
    library('scatterplot3d')
    png(file='3d.png')
    scatterplot3d(PCA_raw$ind$coord[,c(3,1,2)],
                  main=paste0('PCA mapping','(',round(sum(PCA_raw$eig[,2][1:3]),2),')'),
                  pch = 20, 
                  color=gr,
                  cex.symbols = 1.5,
                  xlab = 'PC3',
                  ylab = 'PC1',
                  zlab = 'PC2')
    par(lend = 1)           # square line ends for the color legend
    legend(-2.5,6.5,
           title = 'sample',
           legend=c('Normal', 'Tumor'),
           col=c('red','blue'),
           pch=15)
    dev.off()
    ####
    png(file="myplot.png")
    heatmap.2(matr,
              lmat=rbind( c(0,0,4), c(3,1,2),c(0,5,5) ), lhei=c(1,4,1.2),
              lwid=c(1.5,0.2,4.0),
              key = TRUE,
              key.xlab = '',
              key.ylab = '',
              keysize = 1.5,
              RowSideColors = gr,
              Colv=T,
              dendrogram = 'both',
              key.title = '',
              trace = 'none',
              scale = 'none', 
              srtCol = 30, offsetCol = -0.5,
              col=redgreen,
              labRow = '',
              labCol = '') 
    par(lend = 1)           # square line ends for the color legend
    legend(0.3,0.1,
           legend=c('Normal', 'Tumor'),
           col=c('red','blue'),
           horiz=T,
           pch=15)
    dev.off()
    
    box
    heatmap.2
    PCA3d
    [参考资料]
    1.An end to end workflow for differential gene expression using Affymetrix microarrays
    2.Amazing interactive 3D scatter plots - R software and data visualization

    相关文章

      网友评论

        本文标题:一个不咋成功的3d

        本文链接:https://www.haomeiwen.com/subject/pbcerctx.html