美文网首页画图免疫知识R 语言
igraph包可视化基因调控网络

igraph包可视化基因调控网络

作者: 小贝学生信 | 来源:发表于2021-05-16 20:27 被阅读0次

参考教程:https://www.biostars.org/p/285296/


整体思路可分为两大步:
(1)根据测序数据,挖掘出与某一特定生物学因素高度相关的一组基因,以及相应的表达信息;
(2)然后计算这些基因间的表达相关性,利用igraph包进行可视化。

Step 1 :Get the gene info

  • 示例数据:estrogen包提供的测序数据。根据是否添加雌激素,以及作用时间两个因素,共分为四组,每组两次重复。


  • 感兴趣基因:基因的表达水平与是否添加雌激素,以及作用时间两个因素显著相关的Top基因。

1.1 获取表达矩阵

library(estrogen)
library(affy)
datadir <- system.file("extdata", package="estrogen")
dir(datadir)
setwd(datadir)

# Read in phenotype data and the raw CEL files
pd <- read.AnnotatedDataFrame("estrogen.txt", header=TRUE, sep="", row.names=1)
a <- ReadAffy(filenames=rownames(pData(pd)), phenoData=pd, verbose=TRUE)
# Normalise the data
x <- expresso(
  a,
  bgcorrect.method="rma",
  normalize.method="constant",
  pmcorrect.method="pmonly",
  summary.method="avgdiff")
# Remove control probes
controlProbeIdx <- grep("^AFFX", rownames(x))
x <- x[-controlProbeIdx,]
x
head(exprs(x))
pData(x)

1.2 分析与添加雌激素,以及作用时间高度相关的基因

# Identify genes of significant effect
lm.coef <- function(y) lm(y ~ estrogen * time.h)$coefficients
eff <- esApply(x, 1, lm.coef)
effectUp <- names(sort(eff[2,], decreasing=TRUE)[1:25])
effectDown <- names(sort(eff[2,], decreasing=FALSE)[1:25])
main.effects <- c(effectUp, effectDown)
str(main.effects)
#chr [1:50] "36785_at" "39755_at" "151_s_at" "32174_at" "39781_at" "1985_s_at" "1840_g_at" ...

# Filter the expression set object to include only genes of significant effect
estrogenMainEffects <- exprs(x)[main.effects,]
dim(estrogenMainEffects)
#[1] 50  8

Step 2 :visualize these gene regulatory network

2.1 计算两两基因间的关系/联系

#基于相关性
gene_cor=as.matrix(as.dist(cor(t(estrogenMainEffects), method="pearson")))
dim(gene_cor)
#50 50

#基于欧几里得距离
#gene_dist=as.matrix(dist(estrogenMainEffects, method="euclidean"))

2.2 创建igraph对象

library(igraph)
# Create a graph adjacency based on correlation distances between genes in  pairwise fashion.
g <- graph.adjacency(
  gene_cor,
  mode="undirected",
  weighted=TRUE,
  diag=FALSE
)

# Simplfy the adjacency object
g <- simplify(g, remove.multiple=TRUE, remove.loops=TRUE)

2.3 修饰igraph对象

  • 根据基因间相关性的正负性,设置边的颜色
# Colour negative correlation edges as blue
E(g)[which(E(g)$weight<0)]$color <- "darkblue"

# Colour positive correlation edges as red
E(g)[which(E(g)$weight>0)]$color <- "darkred"
  • 剔除相关性小于0.8的边
# Convert edge weights to absolute values
E(g)$weight <- abs(E(g)$weight)

# Remove edges below absolute Pearson correlation 0.8
g <- delete_edges(g, E(g)[which(E(g)$weight<0.8)])

# Remove any vertices remaining that have no edges
g <- delete.vertices(g, degree(g)==0)

# Assign names to the graph vertices (optional)
V(g)$name <- V(g)$name
  • 设置节点的形状、颜色
# Change shape of graph vertices
V(g)$shape <- "sphere"

# Change colour of graph vertices
V(g)$color <- "skyblue"

# Change colour of vertex frames
V(g)$vertex.frame.color <- "white"
  • 分别设置节点、边的大小。前者与基因表达量相关、后者则取决于相关性
# Scale the size of the vertices to be proportional to the level of expression of each gene represented by each vertex
# Multiply scaled vales by a factor of 10
scale01 <- function(x){(x-min(x))/(max(x)-min(x))}
vSizes <- (scale01(apply(estrogenMainEffects, 1, mean)) + 1.0) * 5

# Amplify or decrease the width of the edges
edgeweights <- E(g)$weight * 2.0
# Convert the graph adjacency object into a minimum spanning tree based on Prim's algorithm
mst <- mst(g, algorithm="prim")

2.3 可视化

  • 直接可视化
set.seed(1)
plot(
  mst,
  layout=layout.graphopt,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My first graph")
  • 其它可选的layout排版方式有:
    layout.fruchterman.reingold
    layout.kamada.kawai
    layout.lgl
    layout.circle
    layout.graphopt


  • 设置module标注

mst.communities <- edge.betweenness.community(mst, weights=NULL, directed=FALSE)
mst.clustering <- make_clusters(mst, membership=mst.communities$membership)
V(mst)$color <- mst.communities$membership + 1

set.seed(1)
plot(
  mst,
  layout=layout.fruchterman.reingold,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My second graph")
set.seed(1)
plot(
  mst.clustering, mst,
  layout=layout.fruchterman.reingold,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My second graph")

如上进行module的划分或者说是聚类时使用igraph包的edge.betweenness.community。igraph包还提供了其它多种划分算法可供使用,详见https://igraph.org/c/doc/igraph-Community.html
此外对于 case-control study, a useful approach is to generate separate networks for cases and controls and then compare the genes in these based on these metrics.即比较实验组与对照组的调控网络的差别,相关代码在教程最后有简单提到,就不赘述了。

相关文章

网友评论

    本文标题:igraph包可视化基因调控网络

    本文链接:https://www.haomeiwen.com/subject/kiyrjltx.html