美文网首页
差分网络分析DiffCoEx

差分网络分析DiffCoEx

作者: 小潤澤 | 来源:发表于2022-01-06 20:04 被阅读0次

    简介

    WGCNA网络分析以后,我们可以对一些sample构建出共表达的网络,那么这里有一个问题,那就是不同处理下的sample(比方说treat和control),它们的共表达网络是一样的吗?如果不一样,那些产生差异的gene pair是否与这些处理有关系,是否是一种比较重要的核心基因呢?

    本篇推文老药新用,这是一篇2010年发表的文章:《DiffCoEx: a simple and sensitive method to find differentially coexpressed gene modules》
    这篇文章基于传统的WGCNA构建的网络,来寻找两种处理的差异共表达网络

    数学原理

    构建差分网络大致分为5个步骤:
    step 1:
    建立邻接矩阵C,邻接矩阵是由两个gene pair的相关性来表征的,即两个基因之间的权重

    step 2:
    计算gene pair在两个组别权重的变化程度,并构建邻接变化矩阵 D;处理组为 [1],对照组为 [2]

    其中:

    1. C[1]ij 代表处理组的邻接矩阵;C[2]ij 代表对照组的邻接矩阵
    2. β 为WGCNA的计算邻接矩阵的指数 β
    3. sign() 为符号函数

    如果 dij 越高,代表gene i 与 gene j 的权重(共表达状态)在两个组别中的改变是显著的

    step 3:
    由邻接变化矩阵 D 计算相异矩阵 T


    其中,k 代表与 gene i 和 gene j 不同的 gene k
    那么相异矩阵 T 考虑了第三方gene k所带来的影响,那么tij 越低,代表gene i 与 gene j 的权重(共表达状态)在两个组别中的改变是显著的

    理解:假设C[1]ij很高,C[2]ij很低,则 dij 很高,代表gene i 与 gene j 的权重(共表达状态)越低,此时,tij 值就越低

    step 4:
    进行树聚类

    step 5:
    显著性统计

    代码:

    作者以一套RNA探针表达谱的数据为例子:

    #required libraries
    #WGCNA package can be found at
    #http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA
    library(WGCNA)          ###used for topological overlap calculation and clustering steps
    library(RColorBrewer)   ###used to create nicer colour palettes
    library(preprocessCore) ###used by the quantile normalization function
    library(flashClust)
    
    #Note: the data can be downloaded from the Gene Expression Omnibus
    # http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2901
    data<-as.matrix(read.csv(file="GDS2901.soft",skip=166,row.names=1,sep="\t",header=T))
    data<-data[-15924,]
    rawData<-matrix(as.numeric(data[,-1]),nrow=15923)
    dimnames(rawData)<-dimnames(data[,-1])
    #we create an annotation matrix containing the matches between probesets and gene names
    anno<-as.matrix(data[-2475,1]) 
    normData<-normalize.quantiles(log2(rawData))
    dimnames(normData)<-dimnames(rawData)
    
    #we remove the probeset at index 2475 because
    #after quantile normalization it has zero variance
    #(the probeset has the highest signal of all samples)
    normData<-normData[-2475,]  
    # 定义组别 1 的邻接矩阵
    datC1<-t(normData[,c(1:12,25:36,37:48)]) ### these samples correspond to the Eker mutants.
    # Note that since the Eker mutants have two sets of 12 control samples (13:24 and 37:48)
    # we discard one to have a symmetric perturbation (carcinogenic vs control) between the two conditions (Eker mutants vs wild-types)
    # 定义组别 2 的邻接矩阵
    datC2<-t(normData[,49:84]) ###those samples correspond to the wild-types
    
    # 定义 β 值
    beta1=6 #user defined parameter for soft thresholding
    # 可参考step 2数学原理
    AdjMatC1<-sign(cor(datC1,method="spearman"))*(cor(datC1,method="spearman"))^2
    AdjMatC2<-sign(cor(datC2,method="spearman"))*(cor(datC2,method="spearman"))^2
    diag(AdjMatC1)<-0
    diag(AdjMatC2)<-0
    collectGarbage()
    # 计算相异矩阵 T
    dissTOMC1C2=TOMdist((abs(AdjMatC1-AdjMatC2)/2)^(beta1/2))
    collectGarbage()
    
    # 对相异矩阵 T 进行聚类
    #Hierarchical clustering is performed using the Topological Overlap of the adjacency difference as input distance matrix
    geneTreeC1C2 = flashClust(as.dist(dissTOMC1C2), method = "average");
    
    # Plot the resulting clustering tree (dendrogram)
    png(file="hierarchicalTree.png",height=1000,width=1000)
    plot(geneTreeC1C2, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",labels = FALSE, hang = 0.04);
    dev.off()
    
    #We now extract modules from the hierarchical tree. This is done using cutreeDynamic. Please refer to WGCNA package documentation for details
    dynamicModsHybridC1C2 = cutreeDynamic(dendro = geneTreeC1C2, distM = dissTOMC1C2,method="hybrid",cutHeight=.996,deepSplit = T, pamRespectsDendro = FALSE,minClusterSize = 20);
    
    #Every module is assigned a color. Note that GREY is reserved for genes which do not belong to any differentially coexpressed module
    dynamicColorsHybridC1C2 = labels2colors(dynamicModsHybridC1C2)
    
    #the next step merges clusters which are close (see WGCNA package documentation)
    mergedColorC1C2<-mergeCloseModules(rbind(datC1,datC2),dynamicColorsHybridC1C2,cutHeight=.2)$color
    colorh1C1C2<-mergedColorC1C2
    
    #reassign better colors
    colorh1C1C2[which(colorh1C1C2 =="midnightblue")]<-"red"
    colorh1C1C2[which(colorh1C1C2 =="lightgreen")]<-"yellow"
    
    colorh1C1C2[which(colorh1C1C2 =="cyan")]<-"orange"
    colorh1C1C2[which(colorh1C1C2 =="lightcyan")]<-"green"
    # Plot the dendrogram and colors underneath
    png(file="module_assignment.png",width=1000,height=1000)
    plotDendroAndColors(geneTreeC1C2, colorh1C1C2, "Hybrid Tree Cut",dendroLabels = FALSE, hang = 0.03,addGuide = TRUE, guideHang = 0.05,main = "Gene dendrogram and module colors cells")
    dev.off()
    
    #We write each module to an individual file containing affymetrix probeset IDs
    modulesC1C2Merged<-extractModules(colorh1C1C2,datC1,anno,dir="modules",file_prefix=paste("Output","Specific_module",sep=''),write=T)
    write.table(colorh1C1C2,file="module_assignment.txt",row.names=F,col.names=F,quote=F)
    
    #We plot to a file the comparative heatmap showing correlation changes in the modules
    #The code for the function plotC1C2Heatmap and others can be found below under the Supporting Functions section
    
    plotC1C2Heatmap(colorh1C1C2,AdjMatC1,AdjMatC2, datC1, datC2)
    png(file="exprChange.png",height=500,width=500)
    plotExprChange(datC1,datC2,colorh1C1C2)
    dev.off()
    

    进行显著性分析

    #This function computes the dispersion value that
    #quantifies the change in correlation between two conditions
    #for pair of genes drawn from module c1 and module c2
    # in case c1 = c2, the function quantifies the differential coexpression in c1.
    #cf Choi and Kendziorski 2009
    dispersionModule2Module<-function(c1,c2,datC1,datC2,colorh1C1C2)
    {
        if (c1==c2)
        {
           difCor<-(cor(datC1[,which(colorh1C1C2 == c1)],method="spearman")-
           cor(datC2[,which(colorh1C1C2 == c1)],method="spearman"))^2
           n<-length(which(colorh1C1C2  ==c1))
          (1/((n^2 -n)/2)*(sum(difCor)/2))^(.5)
        }
        else if (c1!=c2)
        {
          difCor<-(cor(datC1[,which(colorh1C1C2 == c1)],datC1[,which(colorh1C1C2==c2)],method="spearman")-
                  cor(datC2[,which(colorh1C1C2 == c1)],datC2[,which(colorh1C1C2==c2)],method="spearman"))^2
          n1<-length(which(colorh1C1C2  ==c1))
          n2<-length(which(colorh1C1C2  ==c2))
          (1/((n1*n2))*(sum(difCor)))^(.5)
        }
    }
    
    # we generate a set of 1000 permuted indexes
    permutations<-NULL
    for (i in 1:1000)
    {
       permutations<-rbind(permutations,sample(1:(nrow(datC1)+nrow(datC2)),nrow(datC1)))
    }
    
    # we scale the data in both conditions to mean 0 and variance 1.
    d<-rbind(scale(datC1),scale(datC2))
    
    # This function calculates the dispersion value of a module to module coexpression change on permuted data
    permutationProcedureModule2Module<-function(permutation,d,c1,c2,colorh1C1C2)
    {
      d1<-d[permutation,]
      d2<-d[-permutation,]
      dispersionModule2Module(c1,c2,d1,d2,colorh1C1C2)
    }
    
    #We compute all pairwise module to module dispersion values, and generate a null distribution from permuted scaled data
    dispersionMatrix<-matrix(nrow=length(unique(colorh1C1C2))-1,ncol=length(unique(colorh1C1C2))-1)
    nullDistrib<-list()
    i<-j<-0
    for (c1 in setdiff(unique(colorh1C1C2),"grey"))
    {
      i<-i+1
      j<-0
      nullDistrib[[c1]]<-list()
      for (c2 in setdiff(unique(colorh1C1C2),"grey"))
      {
        j<-j+1
        dispersionMatrix[i,j]<-dispersionModule2Module(c1,c2,datC1,datC2,colorh1C1C2)
        nullDistrib[[c1]][[c2]]<-apply(permutations,1,permutationProcedureModule2Module,d,c2,c1,colorh1C1C2)
      }
    }
    
    
    #We create a summary matrix indicating for each module to module 
    #differential coexpression the number of permuted data yielding 
    #an equal or higher dispersion.
    permutationSummary<-matrix(nrow=8,ncol=8)
    colnames(permutationSummary)<-setdiff(unique(colorh1C1C2),"grey")
    rownames(permutationSummary)<-setdiff(unique(colorh1C1C2),"grey")
    for (i in 1:8) { 
      for (j in 1:8) {
         permutationSummary[i,j]<-length(which(nullDistrib[[i]][[j]] >= dispersionMatrix[i,j]))
                      }
    }
    
    #We plot the result (cf supplementary figure 1)
    plotMatrix(permutationSummary)
    

    相关文章

      网友评论

          本文标题:差分网络分析DiffCoEx

          本文链接:https://www.haomeiwen.com/subject/dxercrtx.html