【DoRothEA】单细胞转录因子分析

作者: jjjscuedu | 来源:发表于2023-12-25 16:24 被阅读0次

跟着Cell学单细胞转录组分析(十二):转录因子分析
SCENIC单细胞转录因子分析
10xGenomics单细胞转录组亚群细分策略
大麦转录因子分析
PySCENIC（一）：python版单细胞转录组转录因子分析
SCENIC单细胞转录因子分析原理
单细胞测序技术
单细胞转录组测序知识一隅
PySCENIC（二）：pyscenic单细胞转录组转录因子分析
Hemberg-lab单细胞转录组数据分析（八） - Scate

今天学习另外一个转录因子活性预测工具：DoRothEA。很多文章用的也是这个工具，看起来比pySCENIC好像快很多。其实，单细胞转录因子分析本质上是计算一种specific的得分，DoRothEA计算的是 Viper 得分。

======安装=====

官网地址是：

https://saezlab.github.io/dorothea/

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("dorothea")

或者用devtools安装：
devtools::install_github("saezlab/dorothea")

还需要一个依赖包：
BiocManager::install("viper")

=====例子测试====

还是用经典的pbmc的例子。

加载需要的库：
library(Seurat)
library(dorothea)
library(tidyverse)
library(viper)

把pbmc的数据load进来，分析和前面的很多例子一样

pbmc <- readRDS("pbmc.rds")
str(pbmc@meta.data)

下面，加载这个包自带的human的数据库

## We read Dorothea Regulons for Human:
dorothea_regulon_human <- get(data("dorothea_hs", package = "dorothea"))

## We obtain the regulons based on interactions with confidence level A, B and C
regulon <- dorothea_regulon_human %>% dplyr::filter(confidence %in% c("A","B","C"))

然后计算TF的活性，通过使用函数 run_viper() 在 DoRothEA 的regulons上运行 VIPER 以获得 TFs activity。在 seurat 对象的情况下，该函数返回相同的 seurat 对象，其中包含一个名为 dorothea 的assay，其中包含slot数据中的 TFs activity。

pbmc <- run_viper(pbmc, regulon,options = list(method = "scale", minsize = 4, 
                  eset.filter = FALSE, cores = 1, verbose = FALSE))
Assays(pbmc)

image.png

可以看到，这个pbmc数据集里面的约三千个细胞都有自己的266个转录因子的活性得分。

image.png

下面，我们试试看FindAllMarkers函数获取各个单细胞亚群特异性的转录因子。

DefaultAssay(object = pbmc) <- "dorothea"
table(Idents(pbmc))

image.png

pbmc.markers <- FindAllMarkers(object = pbmc, 
                              only.pos = TRUE, 
                              min.pct = 0.25, 
                              thresh.use = 0.25)                                                          
DT::datatable(pbmc.markers)
write.csv(pbmc.markers,file='pbmc.markers.csv')

下面，我们尝试热图展示：

library(dplyr) 
pbmc.markers$fc = pbmc.markers$pct.1 - pbmc.markers$pct.2
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(10, fc)
pbmc@assays$dorothea@data[1:4,1:4]
top10 =top10[top10$fc > 0.3,] 
pbmc <- ScaleData(pbmc)
DoHeatmap(pbmc,top10$gene,size=3,slot = 'scale.data')

从热图可以看出，其实还是有一大部分TF在不同的细胞类型中特异性表达的。

image.png

然后，我们尝试利用这个包来找下特异性比较好的TF regulons。官网有详细的文档介绍，比如如何选择TF activity per cell population，根据 previously computed VIPER scores on DoRothEA’s regulons.

首先也是获取Viper得分矩阵，根据不同细胞亚群进行归纳汇总：

viper_scores_df <- GetAssayData(pbmc, slot = "scale.data", assay = "dorothea") %>%
data.frame(check.names = F) %>% t()
viper_scores_df[1:4,1:4]

image.png

## We create a data frame containing the cells and their clusters
CellsClusters <- data.frame(cell = names(Idents(pbmc)), 
                            cell_type = as.character(Idents(pbmc)),
                            check.names = F)
head(CellsClusters)

## We create a data frame with the Viper score per cell and its clusters
viper_scores_clusters <- viper_scores_df  %>%
  data.frame() %>% 
  rownames_to_column("cell") %>%
  gather(tf, activity, -cell) %>%
  inner_join(CellsClusters)
  
  
## We summarize the Viper scores by cellpopulation
summarized_viper_scores <- viper_scores_clusters %>% 
  group_by(tf, cell_type) %>%
  summarise(avg = mean(activity),
            std = sd(activity))
            
# For visualization purposes, we select the 20 most variable TFs across clusters according to our scores.
head(summarized_viper_scores)


## We select the 20 most variable TFs. (20*9 populations = 180)
highly_variable_tfs <- summarized_viper_scores %>%
  group_by(tf) %>%
  mutate(var = var(avg))  %>%
  ungroup() %>%
  top_n(180, var) %>%
  distinct(tf)
highly_variable_tfs

## We prepare the data for the plot
summarized_viper_scores_df <- summarized_viper_scores %>%
  semi_join(highly_variable_tfs, by = "tf") %>%
  dplyr::select(-std) %>%   
  spread(tf, avg) %>%
  data.frame(row.names = 1, check.names = FALSE)

最后，整理好的数据如下所示：

summarized_viper_scores_df[1:4,1:4]

image.png

最终数据是挑选的top 20个TF，它们在不同的单细胞亚群里面的变化最大。这样就可以可视化这些细胞类型特异表达的TF regulon了。

有了数据，就很容易可视化：

palette_length = 100
my_color = colorRampPalette(c("Darkblue", "white","red"))(palette_length)
my_breaks <- c(seq(min(summarized_viper_scores_df), 0, 
                   length.out=ceiling(palette_length/2) + 1),
               seq(max(summarized_viper_scores_df)/palette_length, 
                   max(summarized_viper_scores_df), 
                   length.out=floor(palette_length/2)))
library(pheatmap)
viper_hmap <- pheatmap(t(summarized_viper_scores_df),fontsize=14, 
                       fontsize_row = 10, 
                       color=my_color, breaks = my_breaks, 
                       main = "DoRothEA (ABC)", angle_col = 45,
                       treeheight_col = 0,  border_color = NA)

image.png