美文网首页
【DoRothEA】单细胞转录因子分析

【DoRothEA】单细胞转录因子分析

作者: jjjscuedu | 来源:发表于2023-12-25 16:24 被阅读0次

    今天学习另外一个转录因子活性预测工具:DoRothEA。很多文章用的也是这个工具,看起来比pySCENIC好像快很多。其实,单细胞转录因子分析本质上是计算一种specific的得分,DoRothEA计算的是 Viper 得分。

    ======安装=====

    官网地址是:

    https://saezlab.github.io/dorothea/

    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("dorothea")
    
    或者用devtools安装:
    devtools::install_github("saezlab/dorothea")
    
    还需要一个依赖包:
    BiocManager::install("viper")
    
    

    =====例子测试====

    还是用经典的pbmc的例子。

    加载需要的库:
    library(Seurat)
    library(dorothea)
    library(tidyverse)
    library(viper)
    
    把pbmc的数据load进来,分析和前面的很多例子一样
    
    pbmc <- readRDS("pbmc.rds")
    str(pbmc@meta.data)
    
    

    下面,加载这个包自带的human的数据库

    ## We read Dorothea Regulons for Human:
    dorothea_regulon_human <- get(data("dorothea_hs", package = "dorothea"))
    
    ## We obtain the regulons based on interactions with confidence level A, B and C
    regulon <- dorothea_regulon_human %>% dplyr::filter(confidence %in% c("A","B","C"))
    
    

    然后计算TF的活性,通过使用函数 run_viper() 在 DoRothEA 的regulons上运行 VIPER 以获得 TFs activity。 在 seurat 对象的情况下,该函数返回相同的 seurat 对象,其中包含一个名为 dorothea 的assay,其中包含slot数据中的 TFs activity。

    pbmc <- run_viper(pbmc, regulon,options = list(method = "scale", minsize = 4, 
                      eset.filter = FALSE, cores = 1, verbose = FALSE))
    Assays(pbmc)
    
    image.png

    可以看到,这个pbmc数据集里面的约 三千个细胞都有自己的266个转录因子的活性得分。

    image.png

    下面,我们试试看FindAllMarkers函数获取各个单细胞亚群特异性的转录因子。

    DefaultAssay(object = pbmc) <- "dorothea"
    table(Idents(pbmc))
    
    image.png
    pbmc.markers <- FindAllMarkers(object = pbmc, 
                                  only.pos = TRUE, 
                                  min.pct = 0.25, 
                                  thresh.use = 0.25)                                                          
    DT::datatable(pbmc.markers)
    write.csv(pbmc.markers,file='pbmc.markers.csv')
    

    下面,我们尝试热图展示:

    library(dplyr) 
    pbmc.markers$fc = pbmc.markers$pct.1 - pbmc.markers$pct.2
    top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(10, fc)
    pbmc@assays$dorothea@data[1:4,1:4]
    top10 =top10[top10$fc > 0.3,] 
    pbmc <- ScaleData(pbmc)
    DoHeatmap(pbmc,top10$gene,size=3,slot = 'scale.data')
    

    从热图可以看出,其实还是有一大部分TF在不同的细胞类型中特异性表达的。

    image.png

    然后,我们尝试利用这个包来找下特异性比较好的TF regulons。官网有详细的文档介绍,比如如何选择TF activity per cell population,根据 previously computed VIPER scores on DoRothEA’s regulons.

    首先也是获取Viper得分矩阵 ,根据不同细胞亚群进行归纳汇总 :

    viper_scores_df <- GetAssayData(pbmc, slot = "scale.data", assay = "dorothea") %>%
    data.frame(check.names = F) %>% t()
    viper_scores_df[1:4,1:4]
    
    
    image.png
    ## We create a data frame containing the cells and their clusters
    CellsClusters <- data.frame(cell = names(Idents(pbmc)), 
                                cell_type = as.character(Idents(pbmc)),
                                check.names = F)
    head(CellsClusters)
    
    ## We create a data frame with the Viper score per cell and its clusters
    viper_scores_clusters <- viper_scores_df  %>%
      data.frame() %>% 
      rownames_to_column("cell") %>%
      gather(tf, activity, -cell) %>%
      inner_join(CellsClusters)
      
      
    ## We summarize the Viper scores by cellpopulation
    summarized_viper_scores <- viper_scores_clusters %>% 
      group_by(tf, cell_type) %>%
      summarise(avg = mean(activity),
                std = sd(activity))
                
    # For visualization purposes, we select the 20 most variable TFs across clusters according to our scores.
    head(summarized_viper_scores)
    
    
    ## We select the 20 most variable TFs. (20*9 populations = 180)
    highly_variable_tfs <- summarized_viper_scores %>%
      group_by(tf) %>%
      mutate(var = var(avg))  %>%
      ungroup() %>%
      top_n(180, var) %>%
      distinct(tf)
    highly_variable_tfs
    
    ## We prepare the data for the plot
    summarized_viper_scores_df <- summarized_viper_scores %>%
      semi_join(highly_variable_tfs, by = "tf") %>%
      dplyr::select(-std) %>%   
      spread(tf, avg) %>%
      data.frame(row.names = 1, check.names = FALSE) 
    
                      
    

    最后,整理好的数据如下所示:

    summarized_viper_scores_df[1:4,1:4]
    
    image.png

    最终数据是挑选的top 20个TF,它们在不同的单细胞亚群里面的变化最大。这样就可以可视化这些细胞类型特异表达的TF regulon了。

    有了数据,就很容易可视化:

    palette_length = 100
    my_color = colorRampPalette(c("Darkblue", "white","red"))(palette_length)
    my_breaks <- c(seq(min(summarized_viper_scores_df), 0, 
                       length.out=ceiling(palette_length/2) + 1),
                   seq(max(summarized_viper_scores_df)/palette_length, 
                       max(summarized_viper_scores_df), 
                       length.out=floor(palette_length/2)))
    library(pheatmap)
    viper_hmap <- pheatmap(t(summarized_viper_scores_df),fontsize=14, 
                           fontsize_row = 10, 
                           color=my_color, breaks = my_breaks, 
                           main = "DoRothEA (ABC)", angle_col = 45,
                           treeheight_col = 0,  border_color = NA) 
    
    
    image.png

    相关文章

      网友评论

          本文标题:【DoRothEA】单细胞转录因子分析

          本文链接:https://www.haomeiwen.com/subject/wlofndtx.html