Cerebro: a handy Shiny tool for exploring single-cell data

Author: TOP生物信息 | Published 2022-03-02 19:47


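Cerebro covers most of the common single-cell transcriptomics plots and is easy to use, which makes it a good fit for people who are not comfortable with bioinformatics but still want to explore their data. For analysts, a web tool like this also lowers the cost of communicating with collaborators who do not do analysis themselves.
The tool was published in Bioinformatics in 2019 under the title "Cerebro: interactive visualization of scRNA-seq data".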

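Specifically, Cerebro's main features are:

Interactive display of dimensionality reduction results
Display of differentially expressed genes
Display of pathway enrichment results
Display of gene and gene set scores
Display of pseudotime (trajectory) analysis results, based on Monocle2
Export of plots as PDF
Export of tables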

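Using Cerebro is straightforward. You first need a reasonably complete Seurat object, ideally with cell annotation already done. The code that produces such an object is 1.seurat.R, which this post does not walk through. On the WeChat official account, the end of this post provides a link to all the code and example data used here.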

Let's first look at how to generate the .crb file that the Shiny tool needs.

1. Install and load the R package

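cerebroApp is the companion R package for Cerebro. Its main job is to prepare Cerebro's input file, and it also wraps some basic analyses, such as finding differentially expressed genes and running enrichment analysis, each in a single call. The detailed tutorial is at https://romanhaa.github.io/cerebroApp/articles/cerebroApp_workflow_Seurat.html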

BiocManager::install('romanhaa/cerebroApp')
library(cerebroApp)

2. Load the Seurat object and run the basic analyses

testEC <- readRDS("testEC.rds")

2.1 Compute the share of mitochondrial and ribosomal gene expression

testEC <- addPercentMtRibo(testEC,organism = 'hg',gene_nomenclature = 'name')

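After this call the metadata data frame has two additional columns. Conveniently, the mitochondrial and ribosomal gene sets that were used are also stored in: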

testEC@misc$gene_lists$mitochondrial_genes
testEC@misc$gene_lists$ribosomal_genes

This was the first time I learned that a Seurat object also has a misc slot, which can hold useful gene sets and data frames.
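To make the two new metadata columns concrete, here is a minimal base-R sketch of the underlying arithmetic on a toy count matrix. It is an illustration only: the gene names, the `^MT-` and `^RP[SL]` name patterns, and the variable names are assumptions made for this example, whereas cerebroApp uses its own gene lists (the ones stored in the misc slot above).

```r
# Toy UMI count matrix: genes in rows, cells in columns (made-up numbers).
counts <- matrix(
  c(10,  0,  5,   # MT-CO1
     5,  5,  0,   # MT-ND1
    20, 10, 15,   # RPL3
    65, 85, 80),  # ACTB
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "RPL3", "ACTB"),
                  c("cell1", "cell2", "cell3"))
)

# Assumption for this sketch: pick gene sets by name pattern.
mito_genes <- grep("^MT-", rownames(counts), value = TRUE)
ribo_genes <- grep("^RP[SL]", rownames(counts), value = TRUE)

# Per cell: UMIs from the gene set divided by all UMIs of that cell.
total_umi <- colSums(counts)
frac_mt   <- colSums(counts[mito_genes, , drop = FALSE]) / total_umi
frac_ribo <- colSums(counts[ribo_genes, , drop = FALSE]) / total_umi

# The two values per cell are what end up as extra metadata columns.
meta <- data.frame(cell = colnames(counts), frac_mt, frac_ribo)
print(meta)
```

Cells with an unusually high mitochondrial share are the ones typically flagged during quality control, which is why it is handy to have this column available in the viewer.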

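2.2 Get the most expressed genes
"Most expressed" here is defined in terms of total UMI counts: within a given group, it is the sum of a gene's UMIs divided by the sum of UMIs over all genes.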

testEC <- getMostExpressedGenes(testEC,assay = 'RNA',groups = c('sample','maintype'))

For example, grouping by maintype finds, for each cell type, the genes with the highest expression. These genes are stored as data frames:

testEC@misc$most_expressed_genes$sample
testEC@misc$most_expressed_genes$maintype
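The definition above (a gene's UMI sum within a group divided by the group's total UMI sum) can be sketched in base R as follows. The matrix, the gene names, and the group labels are invented for illustration, and this is not the package's own code.

```r
# Toy UMI counts: 3 genes x 4 cells (made-up numbers).
counts <- matrix(
  c(8, 2, 1, 1,   # GENE_A
    1, 3, 6, 2,   # GENE_B
    1, 5, 3, 7),  # GENE_C
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), paste0("cell", 1:4))
)

# Hypothetical grouping variable, e.g. a cell type label per cell.
group <- c("T", "T", "B", "B")

# For each group: per-gene UMI sum divided by the UMI sum over all genes.
share_by_group <- sapply(split(seq_along(group), group), function(idx) {
  gene_sums <- rowSums(counts[, idx, drop = FALSE])
  gene_sums / sum(gene_sums)
})

# Gene with the largest share in each group.
top_gene <- apply(share_by_group, 2, function(x) names(which.max(x)))

print(share_by_group)
print(top_gene)
```

Because the share is computed on pooled UMIs, a gene can rank highly when a few cells express it very strongly, so the table is best read together with the marker gene results.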

2.3 Get marker genes
This step calls Seurat's FindAllMarkers.

testEC <- getMarkerGenes(
  testEC,
  assay = 'RNA',
  organism = 'hg',
  groups = c('sample','maintype'),
  name = 'cerebro_seurat',
  only_pos = TRUE,
  min_pct = 0.7,
  thresh_logFC = 0.25,
  thresh_p_val = 0.01,
  test = "wilcox"
)

The results are stored in:

testEC@misc$marker_genes$cerebro_seurat$sample
testEC@misc$marker_genes$cerebro_seurat$maintype

The last column here holds on_cell_surface information, which Seurat's own output does not seem to include.
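To give a feel for what the min_pct, thresh_logFC, thresh_p_val and test = "wilcox" arguments are filtering on, here is a simplified base-R sketch for a single gene. The expression values are invented, and the fold-change formula is a deliberately simple stand-in that is assumed for this example; Seurat's actual computation differs in its details.

```r
# Made-up expression values of one gene in the cluster of interest
# and in all remaining cells.
in_group  <- c(3, 4, 5, 2, 6, 4, 0, 5)
out_group <- c(0, 0, 1, 0, 2, 0, 0, 1)

# Fraction of cells expressing the gene in each set.
pct_in  <- mean(in_group > 0)
pct_out <- mean(out_group > 0)

# Simplified log2 fold change with a pseudocount of 1 (assumption).
log2_fc <- log2(mean(in_group) + 1) - log2(mean(out_group) + 1)

# Wilcoxon rank-sum test; exact = FALSE because the data contain ties.
p_val <- wilcox.test(in_group, out_group, exact = FALSE)$p.value

# Apply the same kind of thresholds as in the call above.
is_marker <- pct_in >= 0.7 && log2_fc >= 0.25 && p_val < 0.01

cat("pct_in:", pct_in, " log2FC:", round(log2_fc, 2),
    " p:", signif(p_val, 2), " marker:", is_marker, "\n")
```

With min_pct = 0.7 only genes detected in a large share of the cluster's cells pass, which is fairly strict and keeps the marker tables short.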

2.4 Pathway enrichment analysis

testEC <- getEnrichedPathways(
  testEC,
  marker_genes_input = 'cerebro_seurat',
  adj_p_cutoff = 0.01,
  max_terms = 100
)

The results are stored as data frames in:

testEC@misc$enriched_pathways$cerebro_seurat_enrichr$sample
testEC@misc$enriched_pathways$cerebro_seurat_enrichr$maintype
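As a rough illustration of what adj_p_cutoff and max_terms do to an enrichment table, the sketch below filters a toy data frame. This reflects my reading of the two arguments rather than the package's internals, and the column names and pathway names are hypothetical.

```r
# Toy enrichment table (hypothetical terms and adjusted p-values).
enrich <- data.frame(
  term  = c("pathway_A", "pathway_B", "pathway_C", "pathway_D"),
  adj_p = c(0.0001, 0.02, 0.004, 0.5),
  stringsAsFactors = FALSE
)

adj_p_cutoff <- 0.01  # keep terms at or below this adjusted p-value
max_terms    <- 2     # keep at most this many terms

kept <- enrich[enrich$adj_p <= adj_p_cutoff, ]  # significance filter
kept <- kept[order(kept$adj_p), ]               # most significant first
kept <- head(kept, max_terms)                   # cap the number of terms

print(kept)
```

A tighter cutoff such as 0.01 keeps the tables shown in Cerebro compact, at the cost of hiding borderline pathways.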

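This enrichment step queries a fairly comprehensive set of Enrichr databases, which is controlled by the databases argument of getEnrichedPathways.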

2.5 Gene set enrichment analysis is also possible
It only produces tables, though, not plots.

gmt_path <- "D:/hsy/bioinformatics/PTJ/019_EC_600/metabolic_pathways3.gmt"

testEC <- performGeneSetEnrichmentAnalysis(
  testEC,
  assay = 'RNA',
  GMT_file = gmt_path,
  groups = c('sample','maintype')
)

# The results are stored in:
# testEC@misc$enriched_pathways$cerebro_GSVA$sample
# testEC@misc$enriched_pathways$cerebro_GSVA$maintype
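The GMT_file argument expects a gene set file in GMT format: one gene set per line, tab-separated, with the set name first, a description second, and the member genes after that. The sketch below writes a tiny GMT file to a temporary location and parses it with base R, which is a quick way to sanity-check a custom file before passing it on. The set names and genes are made up for this example.

```r
# Write a tiny, made-up GMT file to a temporary location.
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c(
  "GLYCOLYSIS\tdemo set 1\tHK1\tPFKM\tPKM",
  "TCA_CYCLE\tdemo set 2\tCS\tIDH1"
), gmt_file)

# Each line: set name, description, then genes, all tab-separated.
fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)

# Drop the first two fields to get the genes; name the list by set name.
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- vapply(fields, `[`, character(1), 1)

str(gene_sets)
```

If the gene symbols in the file do not match the row names of the Seurat object (for example Ensembl IDs versus gene names), few genes will overlap, so comparing them up front is worthwhile.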

3. Export the `crb` file

exportFromSeurat(
  testEC,
  assay = 'RNA',
  slot = 'data',
  file = 'testEC.crb',
  experiment_name = 'EC',
  organism = 'hg',
  groups = c('sample','seurat_clusters','maintype'),
  nUMI = 'nCount_RNA',
  nGene = 'nFeature_RNA',
  add_all_meta_data = TRUE,
  verbose = FALSE
)

This step writes the .crb file to the working directory.
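A quick way to confirm the export worked is to check that the file exists before opening the viewer. The sketch below assumes the file name used above and that cerebroApp is installed; launchCerebro() is the package's function for starting the Shiny app, and the .crb file is then loaded through the app's interface. The guard keeps the app from starting in non-interactive sessions.

```r
# Path of the exported file, assuming the `file` argument used above.
crb_file <- file.path(getwd(), "testEC.crb")

if (file.exists(crb_file)) {
  size_mb <- round(file.size(crb_file) / 1024^2, 1)
  message("Found ", crb_file, " (", size_mb, " MB)")
} else {
  message("No .crb file at ", crb_file,
          "; check the working directory and the `file` argument.")
}

# Assumption: cerebroApp is installed. Only start the app interactively.
if (interactive() && requireNamespace("cerebroApp", quietly = TRUE)) {
  cerebroApp::launchCerebro()
}
```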