CytoTRACE: a helper for inferring the starting point of a pseudotime trajectory

Author: KS科研分享与服务 | Published 2023-07-24 09:49


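When running monocle, the hardest decision is often which cells to use as the starting point of the trajectory. If the starting point is well established, or you can infer it from the literature and your own biological background, that is fine. But what if you have nothing to go on? Here I introduce an R package, CytoTRACE, that can help infer the starting point for single-cell pseudotime analysis. Readers asked me to cover this package a while ago; because it is so simple I never got around to it, but I happened to need it recently, so here is a post on it!
CytoTRACE website: https://cytotrace.stanford.edu/
Package download: https://cytotrace.stanford.edu/CytoTRACE_0.3.3.tar.gz
For the underlying method, see the authors' original paper if you are interested. The first thing to cover is installation. The package has two functions: analysis of a single dataset, and analysis of multiple datasets. The second depends on Python packages, so on Windows, if the Python packages fail to install, that function cannot be used. Most analyses do not involve multiple datasets anyway, so in this example we will not bother with it; having the first function working is enough. The package has to be installed locally from the tarball downloaded at the link above: https://cytotrace.stanford.edu/CytoTRACE_0.3.3.tar.gz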


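setwd('D:/KS项目/公众号文章/CytoTRACE包推断拟时起点辅助')
# CytoTRACE package
# Website: https://cytotrace.stanford.edu/
# Package download: https://cytotrace.stanford.edu/CytoTRACE_0.3.3.tar.gz

# Installation
# Download the tarball and install it locally
# (assumes the tarball was saved to the working directory)
install.packages("CytoTRACE_0.3.3.tar.gz", repos = NULL, type = "source")

# Installation error
# ERROR: dependencies 'HiClimR', 'ccaPP', 'nnls' are not available for package 'CytoTRACE'
# * removing 'C:/Users/tq199/AppData/Local/R/win-library/4.2/CytoTRACE'
# Warning in install.packages :

# Install the missing packages with install.packages(), then install the tarball again
install.packages(c("HiClimR", "ccaPP", "nnls"))
library(CytoTRACE)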

# No non-system installation of Python could be found.
# Would you like to download and install Miniconda?
#   Miniconda is an open source environment management system for Python.
# See https://docs.conda.io/en/latest/miniconda.html for more details.

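# CytoTRACE depends on two Python packages, scanoramaCT and numpy, so the prompt above appears when the package is loaded with library().
# Setting up conda and a Python environment on Windows is too much trouble for a single package, so I answered no.
# As a result the multi-dataset analysis cannot be used, but that is fine: we usually work with a single dataset.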


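# You can of course try installing the Python packages, but it may not succeed
install.packages("reticulate")
library(reticulate)
conda_create("cytoTRACE", python_version = '3.7')
use_condaenv("cytoTRACE")
conda_install("cytoTRACE", "numpy")
# Assumption: scanoramaCT is distributed on PyPI rather than a conda channel, hence pip = TRUE
conda_install("cytoTRACE", "scanoramaCT", pip = TRUE)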
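
The install error shown earlier names three missing R dependencies. As a minimal sketch (the helper `missing_pkgs()` is hypothetical and not part of CytoTRACE), one way to see which of them still need installing before retrying the local install might be:

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The three dependencies named in the install error
deps <- c("HiClimR", "ccaPP", "nnls")
to_install <- missing_pkgs(deps)
print(to_install)

# Assumed follow-up, to be run manually (needs network access and the tarball):
# if (length(to_install) > 0) install.packages(to_install)
# install.packages("CytoTRACE_0.3.3.tar.gz", repos = NULL, type = "source")
```

An empty result means all three dependencies are already available and only the tarball install remains.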

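Let's test it with the mouse data we used earlier for monocle. CytoTRACE is simple to run: the input is the single-cell count matrix, and the visualization combines the result with the metadata.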

library(Seurat)
library(CytoTRACE)

mouse_data <- readRDS("D:/KS项目/公众号文章/monocle3拟时分析/mouse_data.rds")
# Raw count matrix (genes x cells) as a dense matrix
exp1 <- as.matrix(mouse_data@assays$RNA@counts)
# Keep genes detected in at least 5 cells
exp1 <- exp1[rowSums(exp1 > 0) >= 5, ]
results <- CytoTRACE(exp1, ncores = 1)
# Phenotype: character vector of cell types, named by cell barcode
phenot <- as.character(mouse_data$celltype)
names(phenot) <- rownames(mouse_data@meta.data)
# 2D embedding (UMAP) used for plotting
emb <- mouse_data@reductions[["umap"]]@cell.embeddings
plotCytoTRACE(results, phenotype = phenot, emb = emb, outputDir = './')
plotCytoGenes(results, numOfGenes = 30, outputDir = './')
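
To make the gene filter above concrete, here is a small sketch on a made-up count matrix (the numbers and gene/cell names are invented for illustration). It keeps genes detected in at least 5 cells and checks that the phenotype vector is named by the same cells as the matrix columns, which is how the phenotype is built in the code above:

```r
# Toy count matrix: 4 genes x 6 cells (made-up numbers, illustration only)
counts <- matrix(
  c(1, 0, 2, 3, 1, 4,   # geneA: detected in 5 cells -> kept
    0, 0, 0, 5, 0, 0,   # geneB: detected in 1 cell  -> dropped
    2, 2, 2, 2, 2, 2,   # geneC: detected in 6 cells -> kept
    0, 1, 0, 1, 0, 1),  # geneD: detected in 3 cells -> dropped
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:6))
)

# Number of cells in which each gene has a non-zero count
n_cells_detected <- rowSums(counts > 0)
filtered <- counts[n_cells_detected >= 5, , drop = FALSE]
print(rownames(filtered))

# Phenotype vector named by cell, mirroring `phenot` above
phenotype <- setNames(rep(c("typeA", "typeB"), each = 3), colnames(counts))
stopifnot(all(colnames(filtered) %in% names(phenotype)))
```

`rowSums(counts > 0)` gives the same counts as `apply(counts > 0, 1, sum)` and is usually faster on large matrices.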

image.png

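The results show that PMN0 and 7 have the highest CytoTRACE scores, which indicates the lowest degree of differentiation, so they are inferred to be the starting point of the cells in this dataset. CytoTRACE also writes out a file of CytoTRACE scores, which we can read back in to make our own plot (I labeled this one the wrong way round and did not redo it).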

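library(ggplot2)

cyto <- read.table("CytoTRACE_plot_table.txt")
head(cyto)
# Cells above the 75th percentile of the CytoTRACE score are labeled "high".
# Note: "high"/"low" here refer to the score, not to the degree of differentiation.
cutoff <- quantile(cyto$CytoTRACE, 0.75)
cyto$diff <- "low"
cyto$diff[cyto$CytoTRACE > cutoff] <- "high"
ggplot(cyto, aes(Component1, Component2, color = diff)) +
  geom_point(size = 1.5, alpha = 1.0)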
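
Because a high CytoTRACE score means a less differentiated cell, the "high"/"low" labels are easy to get backwards. A minimal sketch of the same 75th-percentile split on made-up scores, with labels that name the score level explicitly (the column name `score_group` and the label strings are my own choices, not CytoTRACE output):

```r
# Toy CytoTRACE scores for 8 cells (made-up values, evenly spaced in [0, 1])
toy <- data.frame(CytoTRACE = seq(0, 1, length.out = 8))

# 75th percentile of the score
cutoff <- quantile(toy$CytoTRACE, 0.75)

# Label by score level; high score = less differentiated
toy$score_group <- ifelse(
  toy$CytoTRACE > cutoff,
  "high score (less differentiated)",
  "low score (more differentiated)"
)
print(table(toy$score_group))
```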

image.png

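Often, of course, we have already finished the monocle analysis, for example up to the `orderCells` step. In that case we can apply the CytoTRACE analysis and visualization directly to the monocle object, which is more intuitive. First, look at the default monocle result. The pseudotime trajectory runs opposite to CytoTRACE. This matches what others have reported: in general, the default pseudotime direction from monocle2 comes out reversed, presumably because without a `root_state` monocle2 has no biological information about which end of the tree is the beginning.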

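library(monocle)
library(patchwork)  # assumption: patchwork supplies the `+` that combines the two plots

mouse_monocle <- readRDS("D:/KS项目/公众号文章/monocle2拟时结果个性化作图/mouse_monocle.rds")
p1 <- plot_cell_trajectory(mouse_monocle, color_by = 'Pseudotime')
p2 <- plot_cell_trajectory(mouse_monocle, color_by = 'celltype')
p1 + p2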

image.png

The rest of the analysis is the same as on the Seurat object above.

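# Per-cell table: DDRTree coordinates plus pseudotime, state and cell type
monocle_meta <- data.frame(t(mouse_monocle@reducedDimS), 
                         mouse_monocle$Pseudotime, 
                         mouse_monocle$State, 
                         mouse_monocle$celltype)
colnames(monocle_meta) <- c("C1", "C2", "Pseudotime", "State", "celltype")

# Phenotype vector named by cell, taken from the monocle object
phenot1 <- monocle_meta$celltype
phenot1 <- as.character(phenot1)
names(phenot1) <- rownames(monocle_meta)
# Use the monocle trajectory coordinates as the embedding
emb_monocle <- monocle_meta[,1:2]
# Make sure the output directory exists before plotting
dir.create('./monocle', showWarnings = FALSE)
plotCytoTRACE(results, phenotype = phenot1, emb = emb_monocle, outputDir = './monocle/')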
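
Once CytoTRACE scores and monocle2 states sit in the same per-cell table, the comparison above can be turned into a concrete choice of root. The sketch below uses made-up values and is only one possible heuristic (picking the state with the highest mean score is my assumption, not something CytoTRACE or monocle2 prescribes). A positive Spearman correlation between score and pseudotime suggests the default direction is reversed:

```r
# Toy per-cell table (made-up values): monocle2 State, Pseudotime, CytoTRACE score
cells <- data.frame(
  State      = factor(c(1, 1, 2, 2, 3, 3)),
  Pseudotime = c(0, 1, 4, 5, 8, 9),
  CytoTRACE  = c(0.10, 0.20, 0.50, 0.60, 0.90, 0.95)
)

# High score = less differentiated, so a correctly rooted trajectory
# should show a negative correlation between score and pseudotime.
rho <- cor(cells$CytoTRACE, cells$Pseudotime, method = "spearman")
print(rho)

# Heuristic (assumption): use the state with the highest mean score as root
mean_by_state <- tapply(cells$CytoTRACE, cells$State, mean)
root_state <- names(which.max(mean_by_state))
print(root_state)

# With a real monocle2 object one could then re-order the cells, e.g.:
# cds <- orderCells(cds, root_state = root_state)
```

In this toy table the correlation is positive, so the trajectory would be re-rooted at state 3.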

image.png

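Since many high-impact papers use CytoTRACE, let's check how reliable it is, using data from a Nature paper. That paper ran a monocle trajectory analysis but did not use CytoTRACE; the authors inferred the starting point from their biological background and the data. So we use their data to test their result. On the one hand this tests the Nature result; on the other it tests CytoTRACE.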

# Extract the subpopulations
load("D:/KS项目/公众号文章/CytoTRACE包推断拟时起点辅助/sce.RData")
sce_sub <- sce1[,sce1$cluster %in% c("YSMP","GMP","Myeloblast","Monocyte")]
# Run CytoTRACE directly on these cell types to see which starting point it infers
sce_subExp <- as.matrix(sce_sub@assays$RNA@counts)
results <- CytoTRACE(sce_subExp,ncores = 1)
sce_subphenot <- sce_sub$cluster
sce_subphenot <- as.character(sce_subphenot)
names(sce_subphenot) <- rownames(sce_sub@meta.data)
sce_subemb <- sce_sub@reductions[["umap"]]@cell.embeddings

# Make sure the output directory exists before plotting
dir.create('./test', showWarnings = FALSE)
plotCytoTRACE(results, phenotype = sce_subphenot, emb = sce_subemb, outputDir = './test/')

image.png

Compare with the figure from the original paper: the results agree!

image.png

(Reference: Deciphering human macrophage development at single-cell resolution)

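Wait, we are not done yet (bonus): after we released the "final" monocle2 version, I tested it and the analysis ran fine, yet many readers still ran into problems. So let's simply test it here: neither `orderCells` nor BEAM gives any trouble. For the upstream steps we use one function to run the whole analysis in a single call!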


library(monocle)

# One-call monocle2 pipeline: Seurat object in, ordered CellDataSet out
run_monocle2 <- function(
    inputobj,
    assay,
    slot
  ){

  requireNamespace("Seurat")
  # Expression matrix from the chosen assay/slot; drop genes with zero total counts
  data <- Seurat::GetAssayData(inputobj, assay = assay, slot = slot)
  data <- data[Matrix::rowSums(data) != 0,]
  # Cell metadata and gene annotation for the CellDataSet
  pd <- new("AnnotatedDataFrame", data = inputobj@meta.data)
  fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
  fd <- new("AnnotatedDataFrame", data = fData)
  monocds <- newCellDataSet(data,
                                phenoData = pd, 
                                featureData = fd,
                                expressionFamily=negbinomial.size())

  monocds <- estimateSizeFactors(monocds)
  monocds <- estimateDispersions(monocds)

  # Keep genes expressed in at least 50 cells
  monocds <- detectGenes(monocds, min_expr = 0.1)

  print(head(fData(monocds)))
  expressed_genes <- row.names(subset(fData(monocds), num_cells_expressed >= 50))
  monocds <- monocds[expressed_genes, ]

  # Ordering genes: sufficiently expressed and over-dispersed relative to the fit
  disp_table <- dispersionTable(monocds)
  unsup_clustering_genes <- subset(
    disp_table, mean_expression >= 0.05 &
      dispersion_empirical >= 2 * dispersion_fit
  )
  monocds <- setOrderingFilter(monocds, unsup_clustering_genes$gene_id)

  # Dimension reduction with DDRTree, then order cells along the trajectory
  monocds <- reduceDimension(
    monocds,
    max_components = 2,
    method = "DDRTree"
  )
  monocds <- orderCells(monocds)
  return(monocds)
}


cds = run_monocle2(sce_sub, assay = 'RNA', slot = 'counts')
plot_cell_trajectory(cds, color_by = 'cluster')
plot_cell_trajectory(cds, color_by = 'Pseudotime')

image.png

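The analysis runs without any problems. Because the parameters differ from those in the original paper, the results differ somewhat, but overall they agree. Next, let's run the BEAM analysis on the branch.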

# Branch-dependent genes at branch point 1, sorted by q-value
BEAM_res <- BEAM(cds, branch_point = 1, cores = 2)
BEAM_res <- BEAM_res[order(BEAM_res$qval),]
BEAM_res <- BEAM_res[,c("gene_short_name", "pval", "qval")]
# Heatmap of the most significant genes (qval < 1e-20)
plot_genes_branched_heatmap(cds[row.names(subset(BEAM_res, qval < 1e-20)),],
                            branch_point = 1,
                            num_clusters = 4,
                            cores = 1,
                            use_gene_short_name = TRUE,
                            show_rownames = TRUE)
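
The `qval < 1e-20` cutoff above is strict, and on another dataset it could leave no genes for the heatmap. A minimal sketch of a fallback on a made-up BEAM-like table (the helper `pick_genes()` and its `top_n` default are hypothetical, not part of monocle2):

```r
# Toy table shaped like the BEAM result above (made-up q-values)
beam_toy <- data.frame(
  gene_short_name = c("g1", "g2", "g3", "g4"),
  qval = c(1e-30, 1e-25, 1e-5, 0.2),
  row.names = c("g1", "g2", "g3", "g4")
)

# Hypothetical helper: genes below the cutoff, or the top_n genes
# by q-value if nothing passes the cutoff.
pick_genes <- function(res, cutoff = 1e-20, top_n = 50) {
  res <- res[order(res$qval), , drop = FALSE]
  hits <- rownames(res)[which(res$qval < cutoff)]
  if (length(hits) == 0) hits <- head(rownames(res), top_n)
  hits
}

strict_hits <- pick_genes(beam_toy)
fallback_hits <- pick_genes(beam_toy, cutoff = 1e-40, top_n = 3)
print(strict_hits)
print(fallback_hits)
```

The returned names could then be used to subset the CellDataSet in place of `row.names(subset(BEAM_res, qval < 1e-20))`.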

image.png

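In short, the analysis itself runs without errors. If you have installed the final fixed version of monocle2 and still get errors, check whether it is installed and used correctly, whether your input files are correct, or whether some R package versions are the problem. Finally, if you found this post useful, please leave a like before you go! Writing takes effort!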
