CytoTRACE2: Inferring Differentiation Potential from Single-Cell Transcriptomes, a Reference for the Pseudotime Starting Point

Author: KS科研分享与服务 | Published 2024-12-17 11:30

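We previously posted about using CytoTRACE to infer developmental potential ("CytoTRACE: Inferring the Pseudotime Starting Cell (Bonus at the End)"), which can also serve as a reference for choosing the starting point of a pseudotime trajectory. CytoTRACE2 was released earlier this year. Note that CytoTRACE2 is not an upgraded version of the CytoTRACE package but a separate tool.

We are covering it for two reasons. First, to keep up to date: CytoTRACE2 uses an improved algorithm that we can put to use. Second, some readers ran into errors with CytoTRACE2, so we wanted to check whether we would hit the same problems and, if so, fix them. CytoTRACE2 also comes in separate R and Python versions. The R version works without problems in most cases, but some readers need Python, so this post covers both. The method supports human and mouse data. For details, see the paper: Minji Kang, Jose Juan Almagro Armenteros, Gunsagar S. Gulati*, Rachel Gleyzer, Susanna Avagyan, Erin L. Brown, Wubing Zhang, Abul Usmani, Noah Earland, Zhenqin Wu, James Zou, Ryan C. Fields, David Y. Chen, Aadel A. Chaudhuri, Aaron M. Newman. bioRxiv 2024.03.19.585637; doi: https://doi.org/10.1101/2024.03.19.585637 (preprint)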

The predicted potency scores additionally provide a continuous measure of developmental potential, ranging from 0 (differentiated) to 1 (totipotent).
Underlying this method is a novel, interpretable deep learning framework trained and validated across 31 human and mouse scRNA-seq datasets encompassing 28 tissue types, collectively spanning the developmental spectrum.
This framework learns multivariate gene expression programs for each potency category and calibrates outputs across the full range of cellular ontogeny, facilitating direct cross-dataset comparison of developmental potential in an absolute space.
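To make the 0-to-1 scale concrete, here is a small Python sketch that maps a score onto six ordered potency categories (differentiated, unipotent, oligopotent, multipotent, pluripotent, totipotent). The helper `potency_category` is hypothetical, and it assumes the categories split the score range into equal-width bins of 1/6; please check the output description in the CytoTRACE2 README for the exact cut-offs the package uses.

```python
# Six ordered potency categories, least to most potent.
# Assumption (for illustration only): each category covers an equal 1/6 of the score range.
POTENCY_LEVELS = ["Differentiated", "Unipotent", "Oligopotent",
                  "Multipotent", "Pluripotent", "Totipotent"]


def potency_category(score: float) -> str:
    """Hypothetical helper: map a continuous score in [0, 1] to an ordered category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score}")
    # min() keeps score == 1.0 in the top bin instead of indexing past the end
    idx = min(int(score * len(POTENCY_LEVELS)), len(POTENCY_LEVELS) - 1)
    return POTENCY_LEVELS[idx]


for s in (0.05, 0.4, 0.9):
    print(s, potency_category(s))
# 0.05 Differentiated
# 0.4 Oligopotent
# 0.9 Totipotent
```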

R version: CytoTRACE2 official repository: https://github.com/digitalcytometry/cytotrace2

Load the data and install the R package. As before, we use the data from the Nature paper featured in our earlier post, for reference:


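### Load data and install packages
# sce1: the Seurat object from our earlier post (already loaded)
library(Seurat)
DimPlot(sce1, label = TRUE)
# keep the four clusters used in this example
sce_sub <- sce1[, sce1$cluster %in% c("YSMP","GMP","Myeloblast","Monocyte")]

# install CytoTRACE2 (the R package lives in the cytotrace2_r subdirectory)
devtools::install_github("digitalcytometry/cytotrace2", subdir = "cytotrace2_r")
library(CytoTRACE2)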

Running CytoTRACE2 is simple: the input can be either an expression matrix or a Seurat object. We compared the two layers and got the same results whether we used counts or data.

# Run: the main function is cytotrace2()
cytotrace2_sce <- cytotrace2(sce_sub,               # Seurat object
                             is_seurat = TRUE,
                             slot_type = "counts",   # "counts" and "data" both work
                             species = 'human')      # set the species; the default is mouse


class(cytotrace2_sce)
# [1] "Seurat"
# attr(,"package")
# [1] "SeuratObject"



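# Alternative: pass an expression matrix (genes x cells) instead of a Seurat object
# cytotrace2_res <- cytotrace2(sce_sub@assays$RNA$data,  # expression matrix (data layer)
#                              species = 'human')       # set the species; the default is mouse
#
# class(cytotrace2_res)
# [1] "data.frame"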
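Why would counts and data give the same result? One plausible explanation, assuming (as we read the preprint) that CytoTRACE2 works on genes ranked within each cell: Seurat's LogNormalize only rescales each cell by its library size (to 10,000) and applies log1p, both of which preserve the order of genes within a cell. The toy Python check below illustrates this with made-up counts; `within_cell_ranks` is our own helper, not part of CytoTRACE2.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up raw counts: 5 cells (rows) x 20 genes (columns)
counts = rng.poisson(lam=5.0, size=(5, 20)).astype(float)

# Seurat-style LogNormalize: divide by each cell's total, multiply by 10,000, then log1p
data = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)


def within_cell_ranks(mat: np.ndarray) -> np.ndarray:
    """Hypothetical helper: rank genes within each cell (row).

    Double argsort turns a sort order into ranks; the stable sort breaks ties
    (e.g. many zeros) by gene position, identically for both matrices.
    """
    order = np.argsort(mat, axis=1, kind="stable")
    return np.argsort(order, axis=1, kind="stable")


print(np.array_equal(within_cell_ranks(counts), within_cell_ranks(data)))  # True
```

The same reasoning would not carry over to per-gene scaled values such as ScaleData output, because z-scoring each gene separately can change the order of genes within a cell.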

Visualize the results:


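library(magrittr)  # provides %>% and set_rownames()
# annotation: one row per cell, row names = cell IDs, one column with the cell type
annotation <- data.frame(phenotype = sce_sub@meta.data$cluster) %>% 
  set_rownames(., colnames(sce_sub))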

# Plotting: plotData() builds several plots at once and returns them in a list; access each with $
plots <- plotData(cytotrace2_result = cytotrace2_sce, 
                  annotation = annotation, 
                  is_seurat = TRUE)


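# If these plots go into a paper, they can be polished further:
# they are ggplot objects, so styling is straightforward.
# For example, switch the theme of every plot except the last one in the list
library(ggplot2)
for (i in seq_len(length(plots) - 1)) {
  plots[[i]] <- plots[[i]] + theme_bw()
}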



# View the plots one by one (and save them as needed)
# #p1
# plots$CytoTRACE2_UMAP
# #p2
# plots$CytoTRACE2_Potency_UMAP
# #p3
# plots$CytoTRACE2_Relative_UMAP
# #p4
# plots$Phenotype_UMAP
# #p5
# plots$CytoTRACE2_Boxplot_byPheno

# For convenience, we combine them into one figure here
library(cowplot)
plot_grid(plots[[1]], plots[[3]], plots[[4]],
          plots[[5]], ncol = 2)  # ncol sets the number of columns in the grid

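The conclusions agree with those from the original CytoTRACE. Plots p1 to p5 show the differentiation potential of each cell type; in short, both the analysis and the visualization are very easy. Next, let's look at the Python version.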

Python version: CytoTRACE2 official repository: https://github.com/digitalcytometry/cytotrace2/tree/main/cytotrace2_python

First, install the CytoTRACE2 package from the terminal. Installation takes a while, roughly 30 minutes.


# Clone the repository, create the conda environment, and install the package
cd data_analysis/cytotrace2_py/
git clone https://github.com/digitalcytometry/cytotrace2
cd cytotrace2/cytotrace2_python
conda env create -f environment_py.yml
conda activate cytotrace2-py
pip install .

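The Python version of CytoTRACE2 takes a gene expression matrix (genes as rows, cells as columns) and a cell-type annotation file as input. If you start from a Seurat object, you can prepare both files in R: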

# Expression matrix: genes as rows, cells as columns (dense copy of the counts layer)
gene_exp <- as.matrix(GetAssayData(sce_sub, layer = "counts"))
write.table(gene_exp, file = "gene_exp.txt", sep = '\t', quote = F)

# Annotation: first column cell ID, second column cell type
cell_anno <- data.frame(cellid = rownames(sce_sub@meta.data),
                        celltype = sce_sub@meta.data$cluster)
write.table(cell_anno, file = "cell_anno.txt", sep = '\t', quote = F, row.names = F)
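Before switching to Python, it can help to sanity-check the two files. By default, `write.table` with row names writes a header line one field shorter than the data rows (there is no header cell above the gene names); pandas documents that when a file has one more data column than column names, it uses the first column as the row index. The sketch below uses tiny made-up text in place of `gene_exp.txt` and `cell_anno.txt`, reads both layouts, and checks that every cell in the matrix has an annotation. `check_inputs` is a hypothetical pre-flight helper; we are not claiming this is how cytotrace2 itself parses its input.

```python
import io

import pandas as pd

# Made-up stand-ins for the files written by write.table above:
# the expression header has no field above the gene-name column.
gene_exp_txt = "cell1\tcell2\tcell3\nGATA1\t0\t3\t1\nMPO\t5\t0\t2\n"
cell_anno_txt = "cellid\tcelltype\ncell1\tGMP\ncell2\tMonocyte\ncell3\tMyeloblast\n"

# Header one field short: pandas uses the first column (gene names) as the index
expr = pd.read_csv(io.StringIO(gene_exp_txt), sep="\t")
anno = pd.read_csv(io.StringIO(cell_anno_txt), sep="\t", index_col="cellid")


def check_inputs(expr: pd.DataFrame, anno: pd.DataFrame) -> None:
    """Hypothetical pre-flight check (not part of CytoTRACE2)."""
    missing = expr.columns.difference(anno.index)
    if len(missing) > 0:
        raise ValueError(f"{len(missing)} cells have no annotation, e.g. {list(missing[:3])}")
    if expr.index.duplicated().any():
        raise ValueError("duplicated gene names in the expression matrix")


check_inputs(expr, anno)
print(expr.shape)  # (2, 3): 2 genes x 3 cells
```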

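If your single-cell data were analyzed in Python, you can prepare these files with scanpy. Since we don't have such a dataset, we convert the demo Seurat object to h5ad for the demonstration: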


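getwd()
setwd("/home/tq_ziv/data_analysis/cytotrace2_py/")  # replace with your own working directory

# (sce_sub is the subset created in the R section above)
# sce_sub <- sce1[,sce1$cluster %in% c("YSMP","GMP","Myeloblast","Monocyte")]
# save(sce_sub, file = "sce_sub.RData")

# Convert the Seurat object to AnnData (.h5ad) with sceasy
library(sceasy)
library(reticulate)
use_condaenv('sceasy')  # conda environment with the Python dependencies sceasy needs
loompy <- reticulate::import('loompy')
sceasy::convertFormat(sce_sub, from = "seurat", to = "anndata", outFile = 'sce_sub.h5ad')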

import pandas as pd
import scanpy as sc

adata = sc.read_h5ad("./sce_sub.h5ad")
# AnnData stores cells x genes; CytoTRACE2 expects genes x cells, so transpose
expression_matrix = adata.to_df().T
expression_matrix.head()
expression_matrix.to_csv('expression_matrix.txt', sep="\t")
# Cell annotation: index = cell IDs, one column with the cluster label
cell_annotations = pd.DataFrame(data=adata.obs["cluster"])
cell_annotations
cell_annotations.to_csv('cell_annotations.txt', sep="\t")
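A quick way to confirm the orientation of the exported files: `adata.to_df()` returns cells as rows and genes as columns, so after `.T` the first line of the written file should list cell IDs. The self-contained sketch below uses a small hand-made DataFrame as a stand-in for `adata.to_df()` and `adata.obs` (no scanpy needed; all values are made up) and inspects exactly what `to_csv` writes.

```python
import io

import pandas as pd

# Stand-ins for adata.to_df() (cells x genes) and adata.obs (made-up values)
cells_by_genes = pd.DataFrame(
    [[0.0, 1.2, 3.4], [2.1, 0.0, 0.5]],
    index=["cellA", "cellB"],
    columns=["GATA1", "MPO", "CD14"],
)
obs = pd.DataFrame({"cluster": pd.Categorical(["GMP", "Monocyte"])},
                   index=["cellA", "cellB"])

# Transpose so genes become rows and cells become columns, then write as TSV
expr_buf = io.StringIO()
cells_by_genes.T.to_csv(expr_buf, sep="\t")
expr_lines = expr_buf.getvalue().splitlines()
print(expr_lines[0].split("\t"))     # ['', 'cellA', 'cellB']: empty cell above gene names
print(expr_lines[1].split("\t")[0])  # GATA1

# Annotation: index (cell ID) plus the cluster column
anno_buf = io.StringIO()
obs[["cluster"]].to_csv(anno_buf, sep="\t")
print(anno_buf.getvalue().splitlines())  # ['\tcluster', 'cellA\tGMP', 'cellB\tMonocyte']
```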

There are also two ways to run it. The first is from the terminal, similar to pyscenic:

# Run directly from the terminal
cytotrace2 --input-path gene_exp.txt --annotation-path cell_anno.txt --species human
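If you want to launch the terminal command from a Python script (for example, when looping over several samples), a minimal sketch with the standard library's `subprocess` could look like this. It uses only the three flags shown above; `cytotrace2_cmd` is a hypothetical helper, and the command is executed only when the `cytotrace2` executable is on PATH (that is, the `cytotrace2-py` environment is active) and both input files exist.

```python
import os
import shlex
import shutil
import subprocess


def cytotrace2_cmd(input_path: str, annotation_path: str, species: str) -> list:
    """Hypothetical helper: build the CLI call used in this post (three flags only)."""
    if species not in ("human", "mouse"):
        raise ValueError("species must be 'human' or 'mouse'")
    return ["cytotrace2", "--input-path", input_path,
            "--annotation-path", annotation_path, "--species", species]


cmd = cytotrace2_cmd("gene_exp.txt", "cell_anno.txt", "human")
print(shlex.join(cmd))
# cytotrace2 --input-path gene_exp.txt --annotation-path cell_anno.txt --species human

# Launch only if the executable is on PATH and both input files exist
inputs = ("gene_exp.txt", "cell_anno.txt")
if shutil.which("cytotrace2") and all(os.path.exists(p) for p in inputs):
    subprocess.run(cmd, check=True)
```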

The second is to call the function from within Python:


# Run inside Python
from cytotrace2_py.cytotrace2_py import *
exp_path = "./expression_matrix.txt"
annotation_path = "./cell_annotations.txt"
species = "human"

results = cytotrace2(exp_path,
                     annotation_path=annotation_path,
                     species=species)
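Since the point of this post is to use potency as a reference for the pseudotime starting point, one simple way to summarize the output is the median score per cell type, similar to what the by-phenotype boxplot shows. The sketch assumes the per-cell results are available as a pandas DataFrame indexed by cell ID with a `CytoTRACE2_Score` column; the scores and cluster names below are invented for illustration, and `rank_celltypes_by_potency` is our own helper, not a CytoTRACE2 function.

```python
import pandas as pd

# Invented per-cell scores standing in for the CytoTRACE2 output (indexed by cell ID)
results = pd.DataFrame(
    {"CytoTRACE2_Score": [0.62, 0.58, 0.41, 0.37, 0.12, 0.09]},
    index=[f"cell{i}" for i in range(1, 7)],
)
annotation = pd.Series(
    ["cluster_A", "cluster_A", "cluster_B", "cluster_B", "cluster_C", "cluster_C"],
    index=results.index, name="celltype",
)


def rank_celltypes_by_potency(results, annotation, score_col="CytoTRACE2_Score"):
    """Hypothetical helper: median score per cell type, sorted from most to least potent."""
    joined = results.join(annotation, how="inner")
    return joined.groupby("celltype")[score_col].median().sort_values(ascending=False)


ranking = rank_celltypes_by_potency(results, annotation)
print(ranking)
print("Candidate pseudotime root:", ranking.index[0])  # Candidate pseudotime root: cluster_A
```

The median keeps a handful of outlier cells from shifting the ranking; the top cell type is only a candidate root, to be checked against known biology.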

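The output is the same as in R, again five plots. Overall, we find the R version more comfortable to use. If the Python version feels too cumbersome or throws unexpected errors, we suggest converting your data to a Seurat object, or exporting the matrix and annotation files, and running the R version instead.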

If you found this post useful, give it a like before you go!