GenomicInteractions包是Utilities for handling genomic interaction data,可用于分析chiapet与Hi-C数据。本次主要学习分析Hi-C数据的相关用法
https://www.bioconductor.org/packages/release/bioc/html/GenomicInteractions.html
https://www.bioconductor.org/packages/release/bioc/vignettes/GenomicInteractions/inst/doc/hic_vignette.html
一、安装包,加载实例数据
1、安装包
BiocManager::install("GenomicInteractions")
library(GenomicInteractions)
2、示例文件
- HiC data from wild type mouse double positive thymocytes.
- The experiment was carried out using the HindIII restriction enzyme.
- The paired-end reads were aligned to the mm9 mouse genome assembly.
- HOMER software was used to filter reads and detect significant interactions at a resolution of 100kb.
- below only data from chromosomes 14 and 15.
#示例文件地址
hic_file <- system.file("extdata", "Seitan2013_WT_100kb_interactions.txt",
package="GenomicInteractions")
hic_file
#[1] "C:/Users/Dell/Documents/R/win-library/3.6/GenomicInteractions/extdata/Seitan2013_WT_100kb_interactions.txt"
#查看该txt文件
a <- read.table(file = hic_file, header = T)
dim(a)
a[1:6,1:12]
![](https://img.haomeiwen.com/i20354525/749805c3c4e03821.png)
#创建GenomicInteractions对象
hic_data <- makeGenomicInteractionsFromFile(hic_file,
type="homer",
experiment_name = "HiC 100kb",
description = "HiC 100kb resolution")
seqlengths(hic_data) <- c(chr15 = 103494974, chr14 = 125194864)
head(hic_data[,1:2])
如下图,GenomicInteractions对象类似Genomicranges对象也是有两部分组成:左边为interactions的两个序列坐标信息,右边为metadata,暂时只看两列信息。
![](https://img.haomeiwen.com/i20354525/0464ba3621e86b63.png)
二、regions/bins相关统计操作
如上所述,interactions at a resolution of 100kb. 因此基本regions的宽度为100000
- check that the anchors are of the expected size (100kb).
summary(width(regions(hic_data)))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 89536 100000 100000 99991 100000 100000
Some anchors are shorter than 100kb due to the bin being at the end of a chromosome.
-
regions
:all regions included in the object
regions(hic_data)
-
anchors
: the first and second anchors of the interactions
anchors(hic_data)
anchors(hic_data,type="first")
anchorOne(hic_data)
anchors(hic_data,type="second")
anchorTwo(hic_data)
- 统计互为interaction的两个bins的距离
summary(calculateDistances(hic_data,method="midpoint"))
# Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
# 100000 4600000 18700000 24875054 39200000 116600000 34
三、interactions相关统计操作/mcols
- 以
|
分隔,右边的就是meta信息,如上图,第一列为interaction counts,第二列为interaction name,还有P值等,可以用$
符取值,不过没有tab补全功能
mcols(hic_data)
mcols(hic_data)$counts
mcols(hic_data)$InteractionID
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
hic_data$p.value <- exp(hic_data$LogP)
plot(density(hic_data$p.value))
hic_data$FDR.Benjamini..based.on.3.68e.08.total.tests.
hic_data$fdr <- hic_data$FDR.Benjamini..based.on.3.68e.08.total.tests.
plot(density(hic_data$fdr))
![](https://img.haomeiwen.com/i20354525/ce97cbdacf4c536c.png)
- counts相关统计
#mcols(hic_data)$counts
head(interactionCounts(hic_data))
#447000 reads totally
sum(interactionCounts(hic_data))
#23276 interactions in total
length(interactionCounts(hic_data))
#平均interactions覆盖reads数
mean(interactionCounts(hic_data))
![](https://img.haomeiwen.com/i20354525/066f2d7f6e490851.png)
p1 <- plotCisTrans(hic_data) +
labs(title=" ")
sum(is.trans(hic_data_subset))
p2 <- plotCisTrans(hic_data_subset) +
labs(title=" ")
library(patchwork)
p1 + labs(title = "hic_data")|
p2 + labs(title = "hic_data_subset")
cis在同一染色体,trans在染色体。可以看到在示例数据中大部分都是cis-interaction
![](https://img.haomeiwen.com/i20354525/e3bc6ebcdf9567c1.png)
p3 <- plotCounts(hic_data, cut=30)
#cut可以避免极端值影响绘图
p4 <- plotCounts(hic_data_subset, cut=30)
p3 + labs(title = "hic_data")|
p4 + labs(title = "hic_data_subset")
![](https://img.haomeiwen.com/i20354525/8062fcf8a84011f2.png)
四、Interactions Annotation
1、注释文件
- One of the most powerful features of GenomicInteractions is that it allows you to annotate interactions by whether the anchors overlap genomic features of interest, such as promoters or enhancers.
即注释哪些bins里有启动子序列或增强子序列 - 启动子promoters序列可通过GenomicFeatures提供;
- 增强子enhance序列 http://chromosome.sdsc.edu/mouse/download.html提供有老鼠的
上述二者具体下载方式见开头链接,本次使用包里的示例文件
data("mm9_refseq_promoters")
data("thymus_enhancers")
mm9_refseq_promoters
thymus_enhancers
annotation.features <- list(promoter = mm9_refseq_promoters, enhancer = thymus_enh)
annotation.features
![](https://img.haomeiwen.com/i20354525/62463b2a1a2ce2b9.png)
2、annotateInteractions注释
annotateInteractions(hic_data_subset, annotation.features)
head(regions(hic_data_subset))
head(regions(hic_data_subset)$promoter.id)
head(regions(hic_data_subset)$enhancer.id)
如上即为所有的bins注释了promoter与enhancer的重叠概况。
Each anchor may overlap multiple features of each type, so the columns containing feature names or IDs are stored as lists.
![](https://img.haomeiwen.com/i20354525/0ec72d7df600e4d6.png)
- node.class列即为分布情况,
regions(hic_data_subset)$node.class
table(regions(hic_data_subset)$node.class)
# distal enhancer promoter
# 989 275 890
如上分为三大类,当某一个bin同时注释到enhancer与promoter时,结果也只能注释一类,按照创造annotation.features对象时的顺序给出一个。
Any anchors which are not annotated with any of the given features will be assigned the class “distal”
3、Interaction types
如上共有三种node,两两组合共有6种可能
plotInteractionAnnotations(hic_data_subset, legend = TRUE)
![](https://img.haomeiwen.com/i20354525/b8d620e0db2b4c16.png)
- 统计感兴趣的Interaction types
isInteractionType
takes two character arguments which are annotated node classes and returns interactions between them.
is.pp
,is.pd
etc. are bindings for common annotations: p-promoter;d-distal
length(hic_data_subset[isInteractionType(hic_data_subset, "promoter", "distal")])
# [1] 492
length(hic_data_subset[is.pd(hic_data_subset)])
# [1] 492
hic_data_ep <- hic_data_subset[isInteractionType(hic_data_subset, "promoter", "enhancer")]
max(interactionCounts(hic_data_ep))
most_counts <- hic_data_ep[which.max(interactionCounts(hic_data_ep))]
most_counts
min_pval <- hic_data_ep[which.min(hic_data_ep$p.value)]
min_pval
教程最后还介绍了在注释promoter,enhancer的基础上,利用Gviz
包介绍了interactions可视化的方法,感觉还比较复杂,下次再专门学习下。
网友评论