MOVICS Tutorial Series (1): GET Module

Author: 生信宝库 | Published: 2022-01-19 21:25

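    Preface

    In an earlier post, "Subtyping with integrated multi-omics data: MOVICS", Immugent introduced the basic features of MOVICS. Starting with this post, a series of hands-on walkthroughs will demonstrate the R package in practice.

    To make the examples easy to reproduce, every post in this series uses the example data bundled with the MOVICS package, so the code can be pasted straight into RStudio and run as is.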


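    Main functions

    GET Module is the first module of MOVICS. Its main job is to combine multi-omics data and assign samples to subtypes. The functions this module relies on are listed below; the descriptions are quoted in the package's original English wording.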

    1. getElites(): get elites which are those features that pass the filtering procedure and are used for analyses

    2. getClustNum(): get optimal cluster number by calculating clustering prediction index (CPI) and Gap-statistics

    3. get%algorithm_name%(): get results from one specific multi-omics integrative clustering algorithm with detailed parameters

    4. getMOIC(): get a list of results from multiple multi-omics integrative clustering algorithm with parameters by default

    5. getConsensusMOIC(): get a consensus matrix that indicates the clustering robustness across different clustering algorithms and generate a consensus heatmap

    6. getSilhouette(): get quantification of sample similarity using silhouette score approach

    7. getStdiz(): get a standardized data for generating comprehensive multi-omics heatmap

    8. getMoHeatmap(): get a comprehensive multi-omics heatmap based on clustering results

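    Most of these functions produce their own figures, and each exposes a number of parameters for tailoring the analysis. The last one, getMoHeatmap(), draws a heatmap dedicated to the molecular features of the subtypes, with a color scheme polished enough to go straight into a paper.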
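
    getElites() appears in the list but is not used later in this post. To give a feel for what "features that pass the filtering procedure" can mean, here is a minimal base-R sketch of one common filter: drop features with missing values, then keep the most variable ones ranked by median absolute deviation (MAD). The helper `select_top_mad()` and the toy matrix are assumptions made for illustration; this is not the MOVICS implementation.

```r
# Toy expression matrix: 6 features (rows) x 5 samples (columns).
# All values are made up for illustration.
expr <- rbind(
  geneA = c(1.0, 1.1, 0.9, 1.0, 1.0),   # nearly constant
  geneB = c(0.2, 3.5, 1.8, 5.1, 0.4),   # highly variable
  geneC = c(2.0, 2.0, 2.1, 1.9, 2.0),   # nearly constant
  geneD = c(4.0, 0.5, 6.2, 1.1, 3.3),   # highly variable
  geneE = c(1.5, 1.7, 1.2, 1.9, 1.4),   # moderately variable
  geneF = c(3.0, NA,  2.5, 3.1, 2.9)    # contains a missing value
)

# Hypothetical helper (not part of MOVICS): remove features with NA,
# then keep the top `n.keep` features ranked by MAD.
select_top_mad <- function(mat, n.keep) {
  mat  <- mat[stats::complete.cases(mat), , drop = FALSE]
  mads <- apply(mat, 1, stats::mad)
  keep <- names(sort(mads, decreasing = TRUE))[seq_len(min(n.keep, nrow(mat)))]
  mat[keep, , drop = FALSE]
}

elite <- select_top_mad(expr, n.keep = 2)
rownames(elite)   # names of the two retained features
```

    Ranking by MAD rather than by standard deviation makes the filter less sensitive to a single outlying sample.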


    Main workflow

    The code for this module follows:

    # install packages
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    if (!require("devtools")) 
        install.packages("devtools")
    devtools::install_github("xlucpu/MOVICS")
    
    # load data
    library("MOVICS")
    # load example data of breast cancer
    load(system.file("extdata", "brca.tcga.RData", package = "MOVICS", mustWork = TRUE))
    load(system.file("extdata", "brca.yau.RData",  package = "MOVICS", mustWork = TRUE))
    
    
    # print name of example data
    names(brca.tcga)
    #> [1] "mRNA.expr"   "lncRNA.expr" "meth.beta"   "mut.status"  "count"      
    #> [6] "fpkm"        "maf"         "segment"     "clin.info"
    names(brca.yau)
    #> [1] "mRNA.expr" "clin.info"
    
    # extract multi-omics data
    mo.data   <- brca.tcga[1:4]
    
    # extract raw count data for downstream analyses
    count     <- brca.tcga$count
    
    # extract fpkm data for downstream analyses
    fpkm      <- brca.tcga$fpkm
    
    # extract maf for downstream analysis
    maf       <- brca.tcga$maf
    
    # extract segmented copy number for downstream analyses
    segment   <- brca.tcga$segment
    
    # extract survival information
    surv.info <- brca.tcga$clin.info
    
    # identify optimal clustering number (may take a while)
    optk.brca <- getClustNum(data        = mo.data,
                             is.binary   = c(F,F,F,T), # note: the 4th data is somatic mutation which is a binary matrix
                             try.N.clust = 2:8, # try cluster number from 2 to 8
                             fig.name    = "CLUSTER NUMBER OF TCGA-BRCA")
    
    [Figure: output of getClustNum(), "CLUSTER NUMBER OF TCGA-BRCA"]
    # perform iClusterBayes (may take a while)
    iClusterBayes.res <- getiClusterBayes(data        = mo.data,
                                          N.clust     = 5,
                                          type        = c("gaussian","gaussian","gaussian","binomial"),
                                          n.burnin    = 1800,
                                          n.draw      = 1200,
                                          prior.gamma = c(0.5, 0.5, 0.5, 0.5),
                                          sdev        = 0.05,
                                          thin        = 3)
    

    To stay consistent with PAM50, and judging from the figure, five subtypes is a reasonable choice.

    # alternatively, run the same single algorithm through getMOIC()
    iClusterBayes.res <- getMOIC(data        = mo.data,
                                 N.clust     = 5,
                                 methodslist = "iClusterBayes", # specify only ONE algorithm here
                                 type        = c("gaussian","gaussian","gaussian","binomial"), # data type corresponding to the list
                                 n.burnin    = 1800,
                                 n.draw      = 1200,
                                 prior.gamma = c(0.5, 0.5, 0.5, 0.5),
                                 sdev        = 0.05,
                                 thin        = 3)
    
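    # moic.res.list must exist before calling getConsensusMOIC().
    # Assumption: it is built as in the MOVICS vignette, by running the
    # remaining algorithms with getMOIC() (may take a while) ...
    moic.res.list <- getMOIC(data        = mo.data,
                             methodslist = list("SNF", "PINSPlus", "NEMO", "COCA", "LRAcluster",
                                                "ConsensusClustering", "IntNMF", "CIMLR", "MoCluster"),
                             N.clust     = 5,
                             type        = c("gaussian","gaussian","gaussian","binomial"))
    # ... and appending the iClusterBayes result obtained above
    moic.res.list <- append(moic.res.list, list("iClusterBayes" = iClusterBayes.res))
    
    # consensus clustering across all algorithms
    cmoic.brca <- getConsensusMOIC(moic.res.list = moic.res.list,
                                   fig.name      = "CONSENSUS HEATMAP",
                                   distance      = "euclidean",
                                   linkage       = "average")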
    
    [Figure: output of getConsensusMOIC(), "CONSENSUS HEATMAP"]
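
    The idea behind a consensus matrix can be sketched in a few lines of base R: for each algorithm, record which pairs of samples land in the same cluster, then average those co-membership matrices across algorithms. The labels below are toy values, and clustering on `1 - consensus` is a simplifying assumption made for this sketch; getConsensusMOIC() has its own distance and linkage options, as the call above shows.

```r
# Cluster labels for 6 samples from three hypothetical algorithms
# (toy values; label numbering is arbitrary within each algorithm).
labels <- list(
  algo1 = c(1, 1, 1, 2, 2, 2),
  algo2 = c(2, 2, 2, 1, 1, 1),   # same partition as algo1, labels swapped
  algo3 = c(1, 1, 2, 2, 3, 3)    # a different partition
)
samples <- paste0("S", 1:6)

# Co-membership matrix: 1 if two samples share a cluster, else 0.
comembership <- function(cl) {
  m <- outer(cl, cl, "==") * 1
  dimnames(m) <- list(samples, samples)
  m
}

# Consensus = average co-membership across algorithms (values in [0, 1]).
consensus <- Reduce(`+`, lapply(labels, comembership)) / length(labels)
round(consensus, 2)

# Hierarchical clustering on (1 - consensus) with average linkage.
hc    <- hclust(as.dist(1 - consensus), method = "average")
final <- cutree(hc, k = 2)
final
```

    Because co-membership ignores the label values themselves, algo1 and algo2 count as full agreement even though their cluster numbers are swapped.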
    getSilhouette(sil      = cmoic.brca$sil, # a sil object returned by getConsensusMOIC()
                  fig.path = getwd(),
                  fig.name = "SILHOUETTE",
                  height   = 5.5,
                  width    = 5)
    
    [Figure: output of getSilhouette(), "SILHOUETTE"]
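
    For readers unfamiliar with the silhouette score, the following base-R sketch computes it by hand on toy 2-D points. It illustrates the general definition only; the data, the helper `sil_width()`, and the use of plain Euclidean distance are assumptions for this example, whereas getSilhouette() works from the `sil` object returned by getConsensusMOIC().

```r
# Toy 2-D data: two well separated groups of three points (made-up values).
x  <- rbind(c(0, 0), c(0, 1), c(1, 0),
            c(10, 10), c(10, 11), c(11, 10))
cl <- c(1, 1, 1, 2, 2, 2)
d  <- as.matrix(dist(x))   # pairwise Euclidean distances

# Silhouette width of sample i: (b - a) / max(a, b), where
#   a = mean distance to the other members of its own cluster
#   b = smallest mean distance to the members of any other cluster
# (this sketch assumes every cluster has at least two members)
sil_width <- function(i, d, cl) {
  own    <- cl == cl[i]
  own[i] <- FALSE
  a <- mean(d[i, own])
  b <- min(tapply(d[i, cl != cl[i]], cl[cl != cl[i]], mean))
  (b - a) / max(a, b)
}

sil <- vapply(seq_along(cl), sil_width, numeric(1), d = d, cl = cl)
round(sil, 3)
mean(sil)   # average silhouette width; values near 1 indicate tight, well separated clusters
```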

    We can also draw a heatmap.

    # convert beta value to M value for stronger signal
    indata <- mo.data
    indata$meth.beta <- log2(indata$meth.beta / (1 - indata$meth.beta))
    
    # data normalization for heatmap
    plotdata <- getStdiz(data       = indata,
                         halfwidth  = c(2,2,2,NA), # no truncation for mutation
                         centerFlag = c(T,T,T,F), # no center for mutation
                         scaleFlag  = c(T,T,T,F)) # no scale for mutation
                         
    feat   <- iClusterBayes.res$feat.res
    feat1  <- feat[which(feat$dataset == "mRNA.expr"),][1:10,"feature"] 
    feat2  <- feat[which(feat$dataset == "lncRNA.expr"),][1:10,"feature"]
    feat3  <- feat[which(feat$dataset == "meth.beta"),][1:10,"feature"]
    feat4  <- feat[which(feat$dataset == "mut.status"),][1:10,"feature"]
    annRow <- list(feat1, feat2, feat3, feat4)
    
    # set color for each omics data
    # if no color list specified all subheatmaps will be unified to green and red color pattern
    mRNA.col   <- c("#00FF00", "#008000", "#000000", "#800000", "#FF0000")
    lncRNA.col <- c("#6699CC", "white"  , "#FF3C38")
    meth.col   <- c("#0074FE", "#96EBF9", "#FEE900", "#F00003")
    mut.col    <- c("grey90" , "black")
    col.list   <- list(mRNA.col, lncRNA.col, meth.col, mut.col)
    
    # comprehensive heatmap (may take a while)
    getMoHeatmap(data          = plotdata,
                 row.title     = c("mRNA","lncRNA","Methylation","Mutation"),
                 is.binary     = c(F,F,F,T), # the 4th data is mutation which is binary
                 legend.name   = c("mRNA.FPKM","lncRNA.FPKM","M value","Mutated"),
                 clust.res     = iClusterBayes.res$clust.res, # cluster results
                 clust.dend    = NULL, # no dendrogram
                 show.rownames = c(F,F,F,F), # specify for each omics data
                 show.colnames = FALSE, # show no sample names
                 annRow        = annRow, # mark selected features
                 color         = col.list,
                 annCol        = NULL, # no annotation for samples
                 annColors     = NULL, # no annotation color
                 width         = 10, # width of each subheatmap
                 height        = 5, # height of each subheatmap
                 fig.name      = "COMPREHENSIVE HEATMAP OF ICLUSTERBAYES")
    
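    [Figure: output of getMoHeatmap(), "COMPREHENSIVE HEATMAP OF ICLUSTERBAYES"]

    With this layout and color scheme, the figure can be used in a paper as is.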
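
    Two preprocessing steps in the heatmap code are easy to overlook: converting methylation beta values to M values, and standardizing each feature with truncation (the `halfwidth` argument). The base-R sketch below mimics both on toy numbers. `standardize_rows()` is a hypothetical helper written for this example, not the getStdiz() implementation, and `halfwidth = 1` is chosen only so that the clipping is visible on such a small matrix.

```r
# Toy methylation beta values (made up): 3 probes x 4 samples.
beta <- rbind(
  cg1 = c(0.10, 0.50, 0.90, 0.50),
  cg2 = c(0.20, 0.25, 0.80, 0.75),
  cg3 = c(0.45, 0.50, 0.55, 0.50)
)

# Beta -> M value, same formula as in the tutorial code.
m.value <- log2(beta / (1 - beta))

# Hypothetical helper (not the MOVICS implementation): center and scale
# each feature (row), then clip z-scores to [-halfwidth, halfwidth].
standardize_rows <- function(mat, halfwidth = 2) {
  z <- t(scale(t(mat), center = TRUE, scale = TRUE))
  z[z >  halfwidth] <-  halfwidth
  z[z < -halfwidth] <- -halfwidth
  z
}

z <- standardize_rows(m.value, halfwidth = 1)
round(z, 2)
```

    Truncation keeps a few extreme values from stretching the color scale, which is why the binary mutation matrix is left untruncated (`NA`) in the getStdiz() call.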


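    Summary

    The first MOVICS module integrates multi-omics data: it combines several statistical algorithms to uncover the relationships among the omics layers, then uses the features of each layer to assign samples to subtypes.

    In this module you must give MOVICS data from at least two omics layers, and the layers must be independent of one another; which data to supply depends on the scientific question at hand. Once the subtypes are defined, the next step is to characterize the molecular differences between them, which Immugent will cover in the next post.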
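
    Before running the module on your own data, a quick sanity check on the input list can save time. The sketch below checks for at least two omics layers and, as an assumption not stated in this post, that every layer has the same samples in the same column order. `check_omics()` is a hypothetical helper, not a MOVICS function, and the matrices are toy data.

```r
# Toy multi-omics list (made-up values): features in rows, samples in columns.
samples <- paste0("S", 1:4)
omics <- list(
  mRNA = matrix(rnorm(12), nrow = 3,
                dimnames = list(paste0("gene", 1:3), samples)),
  mut  = matrix(c(0, 1, 0, 0, 1, 1, 0, 1), nrow = 2,
                dimnames = list(c("TP53", "PIK3CA"), samples))
)

# Hypothetical helper (not part of MOVICS): require at least two omics
# layers, all sharing identical sample names in identical order.
check_omics <- function(x) {
  if (length(x) < 2) stop("need at least two omics datasets")
  ref  <- colnames(x[[1]])
  same <- vapply(x, function(m) identical(colnames(m), ref), logical(1))
  if (!all(same)) {
    stop("sample names differ in: ", paste(names(x)[!same], collapse = ", "))
  }
  invisible(TRUE)
}

check_omics(omics)        # passes silently
# check_omics(omics[1])   # would stop: only one omics layer
```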
