参考教程:https://www.biostars.org/p/285296/
整体思路可分为两大步:
(1)根据测序数据,挖掘出与某一特定生物学因素高度相关的一组基因,以及相应的表达信息;
(2)然后计算这些基因间的表达相关性,利用igraph包进行可视化。
Step 1 :Get the gene info
-
示例数据:estrogen包提供的测序数据。根据是否添加雌激素,以及作用时间两个因素,共分为四组,每组两次重复。
- 感兴趣基因:基因的表达水平与是否添加雌激素,以及作用时间两个因素显著相关的Top基因。
1.1 获取表达矩阵
library(estrogen)
library(affy)
datadir <- system.file("extdata", package="estrogen")
dir(datadir)
setwd(datadir)
# Read in phenotype data and the raw CEL files
pd <- read.AnnotatedDataFrame("estrogen.txt", header=TRUE, sep="", row.names=1)
a <- ReadAffy(filenames=rownames(pData(pd)), phenoData=pd, verbose=TRUE)
# Normalise the data
x <- expresso(
a,
bgcorrect.method="rma",
normalize.method="constant",
pmcorrect.method="pmonly",
summary.method="avgdiff")
# Remove control probes
controlProbeIdx <- grep("^AFFX", rownames(x))
x <- x[-controlProbeIdx,]
x
head(exprs(x))
pData(x)
1.2 分析与添加雌激素,以及作用时间高度相关的基因
# Identify genes of significant effect
lm.coef <- function(y) lm(y ~ estrogen * time.h)$coefficients
eff <- esApply(x, 1, lm.coef)
effectUp <- names(sort(eff[2,], decreasing=TRUE)[1:25])
effectDown <- names(sort(eff[2,], decreasing=FALSE)[1:25])
main.effects <- c(effectUp, effectDown)
str(main.effects)
#chr [1:50] "36785_at" "39755_at" "151_s_at" "32174_at" "39781_at" "1985_s_at" "1840_g_at" ...
# Filter the expression set object to include only genes of significant effect
estrogenMainEffects <- exprs(x)[main.effects,]
dim(estrogenMainEffects)
#[1] 50 8
Step 2 :visualize these gene regulatory network
- 关于igraph包,在之前有简单介绍,可参看笔记利用igraph包可视化基于KNN的单细胞聚类关系 - 简书 (jianshu.com)
2.1 计算两两基因间的关系/联系
#基于相关性
gene_cor=as.matrix(as.dist(cor(t(estrogenMainEffects), method="pearson")))
dim(gene_cor)
#50 50
#基于欧几里得距离
#gene_dist=as.matrix(dist(estrogenMainEffects, method="euclidean"))
2.2 创建igraph对象
library(igraph)
# Create a graph adjacency based on correlation distances between genes in pairwise fashion.
g <- graph.adjacency(
gene_cor,
mode="undirected",
weighted=TRUE,
diag=FALSE
)
# Simplfy the adjacency object
g <- simplify(g, remove.multiple=TRUE, remove.loops=TRUE)
2.3 修饰igraph对象
- 根据基因间相关性的正负性,设置边的颜色
# Colour negative correlation edges as blue
E(g)[which(E(g)$weight<0)]$color <- "darkblue"
# Colour positive correlation edges as red
E(g)[which(E(g)$weight>0)]$color <- "darkred"
- 剔除相关性小于0.8的边
# Convert edge weights to absolute values
E(g)$weight <- abs(E(g)$weight)
# Remove edges below absolute Pearson correlation 0.8
g <- delete_edges(g, E(g)[which(E(g)$weight<0.8)])
# Remove any vertices remaining that have no edges
g <- delete.vertices(g, degree(g)==0)
# Assign names to the graph vertices (optional)
V(g)$name <- V(g)$name
- 设置节点的形状、颜色
# Change shape of graph vertices
V(g)$shape <- "sphere"
# Change colour of graph vertices
V(g)$color <- "skyblue"
# Change colour of vertex frames
V(g)$vertex.frame.color <- "white"
- 分别设置节点、边的大小。前者与基因表达量相关、后者则取决于相关性
# Scale the size of the vertices to be proportional to the level of expression of each gene represented by each vertex
# Multiply scaled vales by a factor of 10
scale01 <- function(x){(x-min(x))/(max(x)-min(x))}
vSizes <- (scale01(apply(estrogenMainEffects, 1, mean)) + 1.0) * 5
# Amplify or decrease the width of the edges
edgeweights <- E(g)$weight * 2.0
- 计算得到最小生成树(MST,Minimum spanning tree);
- MST相关概念可参考:https://blog.csdn.net/luoshixian099/article/details/51908175。最直接的特征就是:边edge数量比vertex节点数量少1
# Convert the graph adjacency object into a minimum spanning tree based on Prim's algorithm
mst <- mst(g, algorithm="prim")
2.3 可视化
- 直接可视化
set.seed(1)
plot(
mst,
layout=layout.graphopt,
edge.curved=TRUE,
vertex.size=vSizes,
vertex.label.dist=-0.5,
vertex.label.color="black",
asp=FALSE,
vertex.label.cex=0.6,
edge.width=edgeweights,
edge.arrow.mode=0,
main="My first graph")
-
其它可选的layout排版方式有:
layout.fruchterman.reingold
layout.kamada.kawai
layout.lgl
layout.circle
layout.graphopt
-
设置module标注
mst.communities <- edge.betweenness.community(mst, weights=NULL, directed=FALSE)
mst.clustering <- make_clusters(mst, membership=mst.communities$membership)
V(mst)$color <- mst.communities$membership + 1
set.seed(1)
plot(
mst,
layout=layout.fruchterman.reingold,
edge.curved=TRUE,
vertex.size=vSizes,
vertex.label.dist=-0.5,
vertex.label.color="black",
asp=FALSE,
vertex.label.cex=0.6,
edge.width=edgeweights,
edge.arrow.mode=0,
main="My second graph")
set.seed(1)
plot(
mst.clustering, mst,
layout=layout.fruchterman.reingold,
edge.curved=TRUE,
vertex.size=vSizes,
vertex.label.dist=-0.5,
vertex.label.color="black",
asp=FALSE,
vertex.label.cex=0.6,
edge.width=edgeweights,
edge.arrow.mode=0,
main="My second graph")
如上进行module的划分或者说是聚类时使用igraph包的
edge.betweenness.community
。igraph包还提供了其它多种划分算法可供使用,详见https://igraph.org/c/doc/igraph-Community.html
此外对于 case-control study, a useful approach is to generate separate networks for cases and controls and then compare the genes in these based on these metrics.即比较实验组与对照组的调控网络的差别,相关代码在教程最后有简单提到,就不赘述了。
网友评论