美文网首页
[R]bioconductor之GenomicInteracti

[R]bioconductor之GenomicInteracti

作者: 小贝学生信 | 来源:发表于2020-10-19 14:35 被阅读0次

GenomicInteractions包是Utilities for handling genomic interaction data,可用于分析chiapet与Hi-C数据。本次主要学习分析Hi-C数据的相关用法
https://www.bioconductor.org/packages/release/bioc/html/GenomicInteractions.html
https://www.bioconductor.org/packages/release/bioc/vignettes/GenomicInteractions/inst/doc/hic_vignette.html

一、安装包,加载实例数据

1、安装包

BiocManager::install("GenomicInteractions")
library(GenomicInteractions)

2、示例文件

  • HiC data from wild type mouse double positive thymocytes.
  • The experiment was carried out using the HindIII restriction enzyme.
  • The paired-end reads were aligned to the mm9 mouse genome assembly.
  • HOMER software was used to filter reads and detect significant interactions at a resolution of 100kb.
  • below only data from chromosomes 14 and 15.
#示例文件地址
hic_file <- system.file("extdata", "Seitan2013_WT_100kb_interactions.txt",
                        package="GenomicInteractions")
hic_file
#[1] "C:/Users/Dell/Documents/R/win-library/3.6/GenomicInteractions/extdata/Seitan2013_WT_100kb_interactions.txt"

#查看该txt文件
a <- read.table(file = hic_file, header = T)
dim(a)
a[1:6,1:12]
a[1:6,1:12]
#创建GenomicInteractions对象
hic_data <- makeGenomicInteractionsFromFile(hic_file,
                                            type="homer",
                                            experiment_name = "HiC 100kb",
                                         description = "HiC 100kb resolution")
seqlengths(hic_data) <- c(chr15 = 103494974, chr14 = 125194864)
head(hic_data[,1:2])

如下图,GenomicInteractions对象类似Genomicranges对象也是有两部分组成:左边为interactions的两个序列坐标信息,右边为metadata,暂时只看两列信息。


head(hic_data[,1:2])

二、regions/bins相关统计操作

如上所述,interactions at a resolution of 100kb. 因此基本regions的宽度为100000

  • check that the anchors are of the expected size (100kb).
summary(width(regions(hic_data)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   89536  100000  100000   99991  100000  100000

Some anchors are shorter than 100kb due to the bin being at the end of a chromosome.

  • regions:all regions included in the object
regions(hic_data)
  • anchors: the first and second anchors of the interactions
anchors(hic_data)
anchors(hic_data,type="first")
anchorOne(hic_data)

anchors(hic_data,type="second") 
anchorTwo(hic_data)
  • 统计互为interaction的两个bins的距离
summary(calculateDistances(hic_data,method="midpoint"))
#    Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
#  100000   4600000  18700000  24875054  39200000 116600000        34

三、interactions相关统计操作/mcols

  • |分隔,右边的就是meta信息,如上图,第一列为interaction counts,第二列为interaction name,还有P值等,可以用$符取值,不过没有tab补全功能
mcols(hic_data)
mcols(hic_data)$counts
mcols(hic_data)$InteractionID
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
hic_data$p.value <- exp(hic_data$LogP)
plot(density(hic_data$p.value))
hic_data$FDR.Benjamini..based.on.3.68e.08.total.tests.
hic_data$fdr <- hic_data$FDR.Benjamini..based.on.3.68e.08.total.tests.
plot(density(hic_data$fdr))
hic_data$p.value & hic_data$fdr
  • counts相关统计
#mcols(hic_data)$counts
head(interactionCounts(hic_data))
#447000 reads totally
sum(interactionCounts(hic_data))
#23276 interactions in total
length(interactionCounts(hic_data))
#平均interactions覆盖reads数
mean(interactionCounts(hic_data))
hist & density
p1 <- plotCisTrans(hic_data) +
  labs(title=" ")
sum(is.trans(hic_data_subset))
p2 <- plotCisTrans(hic_data_subset) +
  labs(title=" ")
library(patchwork)
p1 + labs(title = "hic_data")| 
  p2 + labs(title = "hic_data_subset")

cis在同一染色体,trans在染色体。可以看到在示例数据中大部分都是cis-interaction


plotCisTrans
p3 <- plotCounts(hic_data, cut=30)
#cut可以避免极端值影响绘图
p4 <- plotCounts(hic_data_subset, cut=30)
p3 + labs(title = "hic_data")| 
  p4 + labs(title = "hic_data_subset")
plotCounts

四、Interactions Annotation

1、注释文件

  • One of the most powerful features of GenomicInteractions is that it allows you to annotate interactions by whether the anchors overlap genomic features of interest, such as promoters or enhancers.
    即注释哪些bins里有启动子序列或增强子序列
  • 启动子promoters序列可通过GenomicFeatures提供;
  • 增强子enhance序列 http://chromosome.sdsc.edu/mouse/download.html提供有老鼠的
    上述二者具体下载方式见开头链接,本次使用包里的示例文件
data("mm9_refseq_promoters")
data("thymus_enhancers")
mm9_refseq_promoters
thymus_enhancers
annotation.features <- list(promoter = mm9_refseq_promoters, enhancer = thymus_enh)
annotation.features
annotation.features

2、annotateInteractions注释

annotateInteractions(hic_data_subset, annotation.features)
head(regions(hic_data_subset))
head(regions(hic_data_subset)$promoter.id)
head(regions(hic_data_subset)$enhancer.id)

如上即为所有的bins注释了promoter与enhancer的重叠概况。
Each anchor may overlap multiple features of each type, so the columns containing feature names or IDs are stored as lists.

head(regions(hic_data_subset))
  • node.class列即为分布情况,
regions(hic_data_subset)$node.class
table(regions(hic_data_subset)$node.class)
#  distal enhancer promoter 
#     989      275      890

如上分为三大类,当某一个bin同时注释到enhancer与promoter时,结果也只能注释一类,按照创造annotation.features对象时的顺序给出一个。
Any anchors which are not annotated with any of the given features will be assigned the class “distal”

3、Interaction types

如上共有三种node,两两组合共有6种可能

plotInteractionAnnotations(hic_data_subset, legend = TRUE)
plotInteractionAnnotations
  • 统计感兴趣的Interaction types
    isInteractionType takes two character arguments which are annotated node classes and returns interactions between them.
    is.pp,is.pd etc. are bindings for common annotations: p-promoter;d-distal
length(hic_data_subset[isInteractionType(hic_data_subset, "promoter", "distal")])
# [1] 492
length(hic_data_subset[is.pd(hic_data_subset)])
# [1] 492
hic_data_ep <- hic_data_subset[isInteractionType(hic_data_subset, "promoter", "enhancer")]
max(interactionCounts(hic_data_ep))

most_counts <- hic_data_ep[which.max(interactionCounts(hic_data_ep))]
most_counts

min_pval <- hic_data_ep[which.min(hic_data_ep$p.value)]
min_pval

教程最后还介绍了在注释promoter,enhancer的基础上,利用Gviz包介绍了interactions可视化的方法,感觉还比较复杂,下次再专门学习下。

相关文章

网友评论

      本文标题:[R]bioconductor之GenomicInteracti

      本文链接:https://www.haomeiwen.com/subject/kuskmktx.html