单细胞测序数据高级分析--细胞通讯分析

作者: 七哥就是我 | 来源:发表于2022-09-18 17:24 被阅读0次

单细胞测序数据高级分析--细胞通讯分析
单细胞数据细胞通讯分析软件NATMI
单细胞转录组差异表达分析——SCDE
【文献简报】Cumulus 为大规模单细胞RNA测序提供云数据分
MacOS下安装velocyto.R
实用干货|Seurat-空间转录组数据基本分析教学
左手靶向基因，右手组学结合 — 多维发展的10x Genomic
单细胞多组学新品预告 | 单细胞ATAC+单细胞转录联合检测
SCS【17】从单细胞转录组推断肿瘤的CNV和亚克隆 (copy
单细胞测序分析R包Seurat质量控制小提琴图QC VlnPlo

多细胞生物的细胞与细胞之间往往会通过细胞因子和膜蛋白等进行通讯，从而调节生命活动，保证生命体高效、有序的运作。其中，受体-配体介导的细胞间通讯对协调发育、分化和疾病等多种生物学过程至关重要。细胞通讯分析主要通过统计不同细胞类型中受体和配体的表达及配对情况，结合分子信息数据库推断不同细胞之间的相互作用。因此，利用单细胞转录组数据分析的细胞通讯，仅限于蛋白质配体-受体复合物介导的细胞间通讯，其分析的基础是基因表达数据和配体-受体数据库。例如转录组数据表明A细胞和B细胞分别表达了基因α和β，通过数据库查询α和β是配体-受体关系，则认为A-B通过α-β途径进行了通讯。

一、细胞通讯分析常用方法

基于单细胞数据进行的细胞通讯分析往往涉及到表达量矩阵，受体-配体表达，受体-配体互作强度等信息。常见的单细胞转录组细胞通讯分析工具有CellPhoneDB、celltalker和iTALK等，最权威的可能属CellPhoneDB，比较常用。接下来，本期内容将针对这三种常见的细胞通讯工具做一简单介绍。

1、CellPhoneDB

CellPhoneDB主要通过一种细胞类型的受体和另一种细胞类型的配体的表达信息，得到不同细胞类型之间的相互作用。CellPhoneDB独特之处是不仅包含了数据库注释的受体和配体，还提供了人工注释的参与细胞通讯的蛋白质家族，提供受体和配体的亚基结构，这是大多数据库和研究没有涉及的，这点对于准确研究多亚基复合物介导的细胞通讯很关键，可以对细胞间的通讯分子进行全面、系统的分析。CellPhoneDB数据库概况如下图所示：

图1 CellPhoneDB受体配体库，来自该数据库的对应文献

CellPhoneDB具体分析流程如图：

图2 CellPhoneDB工作流程图

2、Celltalker

Celltalker的分析采纳的数据来自Ramilowski et al (Nature Communications, 2015)的研究成果，并且允许用户自己添加配对信息。celltalker分析细胞通讯的特点是更倾向于分析样本之间具有差异的细胞通讯关系，其构建celltalker对象的函数也要求输入分组信息。为了保证分析的可靠性，它要求一个分组要有几个重复样本。
celltalker认定细胞进行通讯的前提是配体和受体的表达值在通讯的细胞之间具有一致性。

celltalker包的数据提取和图形调整相对复杂，有耐心的可以细调，本人不建议使用此包！

library(Seurat)
library(tidyverse)
library(celltalker)
scRNA <- readRDS("scRNA.rds")

##调整scRNA的meta.data，方便满足celltalker的数据要求
cell_meta = scRNA@meta.data[,1:9]
names(cell_meta)[which(names(cell_meta)=='orig.ident')] <- "sample.id"
names(cell_meta)[which(names(cell_meta)=='celltype_Monaco')] <- "cell.type"
cell_meta$sample.type <- cell_meta$sample.id
cell_meta[grep("^HNC[0-9]+PBMC", cell_meta$sample.type), "sample.type"] <- "HNC_PBMC"
cell_meta[grep("^HNC[0-9]+TIL", cell_meta$sample.type), "sample.type"] <- "HNC_TIL"
cell_meta[grep("^PBMC", cell_meta$sample.type), "sample.type"] <- "PBMC"
cell_meta[grep("^Tonsil", cell_meta$sample.type), "sample.type"] <- "Tonsil"
scRNA@meta.data <- cell_meta

##提取数据子集
cellsub = rownames(subset(cell_meta, sample.type=="HNC_TIL"|sample.type=="Tonsil"))
scRNA <- scRNA[,cellsub]

##鉴定scRNA中存在的配体受体
ligs <- as.character(unique(ramilowski_pairs$ligand))
recs <- as.character(unique(ramilowski_pairs$receptor))
ligs.present <- rownames(scRNA)[rownames(scRNA) %in% ligs]
recs.present <- rownames(scRNA)[rownames(scRNA) %in% recs]
genes.to.use <- union(ligs.present,recs.present)

##鉴定样本分组之间差异表达的配体受体
Idents(scRNA) <- "sample.type" 
markers <- FindAllMarkers(scRNA,assay="RNA",features=genes.to.use,only.pos=TRUE)
ligs.recs.use <- unique(markers$gene)

##保留差异表达的配体-受体列表
interactions.forward1 <- ramilowski_pairs[as.character(ramilowski_pairs$ligand) %in% ligs.recs.use,]
interactions.forward2 <- ramilowski_pairs[as.character(ramilowski_pairs$receptor) %in% ligs.recs.use,]
interact.for <- rbind(interactions.forward1,interactions.forward2)

##准备celltalker的输入文件
expr.mat <- GetAssayData(scRNA, slot="counts")
defined.clusters <- scRNA@meta.data$cell.type
defined.groups <- scRNA@meta.data$sample.type
defined.replicates <- scRNA@meta.data$sample.id

##重构数据，为每个样本分配相应的表达矩阵
reshaped.matrices <- reshape_matrices(count.matrix=expr.mat, clusters=defined.clusters,
    groups=defined.groups, replicates=defined.replicates, ligands.and.receptors=interact.for)

##分析表达一致性的配体和受体。
# cells.reqd=10：每个分组中至少有10个细胞表达了配体/受体
# freq.pos.reqd=0.5：至少有50％的样本表达的配体/受体
consistent.lig.recs <- create_lig_rec_tib(exp.tib=reshaped.matrices,
                                          clusters=defined.clusters,
                                          groups=defined.groups,
                                          replicates=defined.replicates,
                                          cells.reqd=10,
                                          freq.pos.reqd=0.5,
                                          ligands.and.receptors=interact.for)

##生成Celltalker对象，并绘制circos圈图
# freq.group.in.cluster: 只对细胞数>总细胞数5%的细胞类型分析
put.int<-putative_interactions(ligand.receptor.tibble=consistent.lig.recs,
                               clusters=defined.clusters,
                               groups=defined.groups,
                               freq.group.in.cluster=0.05,
                               ligands.and.receptors=interact.for)


##鉴定分组间特异出现的配体/受体并作图
unique.ints <- unique_interactions(put.int, group1="HNC_TIL", group2="Tonsil", interact.for)
#HNC_TIL组作图
HNC_TIL <- pull(unique.ints[1,2])[[1]]
circos.HNC_TIL <- pull(put.int[1,2])[[1]][HNC_TIL]
par(MAR=c(1,1,1,1))
circos_plot(interactions=circos.HNC_TIL, clusters=defined.clusters)
#Tonsil组作图
Tonsil <- pull(unique.ints[2,2])[[1]]
circos.Tonsil <- pull(put.int[2,2])[[1]][Tonsil]
circos_plot(interactions=circos.Tonsil, clusters=defined.clusters)

3、iTALK

iTALK这个包的应用比较简单，可以自定义定制的配体-受体数据库。默认数据库分析物种为人，如果我们做的事其他物种，可以匹配人的同源基因。它的优点还是集成了多种差异分析和可视化方法。将配体-受体注释为细胞因子、生长因子、免疫检查点和其他类。配体-受体分析的中间数据和最终结果都可以轻松导出，我没有泡跑自己的数据，也不常用这个包，所以以官方教程为主。感兴趣的可以去官方查看教程教程：https://github.com/Coolgenome/iTALK/tree/master/example

图3 iTALK工作流程，来自作者原图


# This example data is from 10x pbmc dataset. Samples are randomly selected from each cell type. And groups are randomly assigned to each sample to make the illustration.

library(iTALK)

# ====================准备数据===================
data<-read.table('example_data.txt',sep='\t',header=T,stringsAsFactors = F)

## highly expressed ligand-receptor pairs

# find top 50 percent highly expressed genes
highly_exprs_genes<-rawParse(data,top_genes=50,stats='mean')
# find the ligand-receptor pairs from highly expressed genes
comm_list<-c('growth factor','other','cytokine','checkpoint')
cell_col<-structure(c('#4a84ad','#4a1dc6','#e874bf','#b79eed', '#ff636b', '#52c63b','#9ef49a'),names=unique(data$cell_type))
par(mfrow=c(1,2))
res<-NULL
for(comm_type in comm_list){
    res_cat<-FindLR(highly_exprs_genes,datatype='mean count',comm_type=comm_type)
    res_cat<-res_cat[order(res_cat$cell_from_mean_exprs*res_cat$cell_to_mean_exprs,decreasing=T),]
    #plot by ligand category
    #overall network plot
    NetView(res_cat,col=cell_col,vertex.label.cex=1,arrow.width=1,edge.max.width=5)
    #top 20 ligand-receptor pairs
    LRPlot(res_cat[1:20,],datatype='mean count',
           cell_col=cell_col,
           link.arr.lwd=res_cat$cell_from_mean_exprs[1:20],
           link.arr.width=res_cat$cell_to_mean_exprs[1:20])
    title(comm_type)
    res<-rbind(res,res_cat)
}
res<-res[order(res$cell_from_mean_exprs*res$cell_to_mean_exprs,decreasing=T),][1:20,]
NetView(res,col=cell_col,vertex.label.cex=1,arrow.width=1,edge.max.width=5)
LRPlot(res[1:20,],datatype='mean count',cell_col=cell_col,link.arr.lwd=res$cell_from_mean_exprs[1:20],link.arr.width=res$cell_to_mean_exprs[1:20])

## significant ligand-receptor pairs between compare groups

# randomly assign the compare group to each sample
data<-data %>% mutate(compare_group=sample(2,nrow(data),replace=TRUE))
# find DEGenes of regulatory T cells and NK cells between these 2 groups
deg_t<-DEG(data %>% filter(cell_type=='regulatory_t'),method='Wilcox',contrast=c(2,1))
deg_nk<-DEG(data %>% filter(cell_type=='cd56_nk'),method='Wilcox',contrast=c(2,1))
# find significant ligand-receptor pairs and do the plotting
par(mfrow=c(1,2))
res<-NULL
for(comm_type in comm_list){
    res_cat<-FindLR(deg_t,deg_nk,datatype='DEG',comm_type=comm_type)
    res_cat<-res_cat[order(res_cat$cell_from_logFC*res_cat$cell_to_logFC,decreasing=T),]
    #plot by ligand category
    if(nrow(res_cat)==0){
        next
    }else if(nrow(res_cat>=20)){
        LRPlot(res_cat[1:20,],datatype='DEG',cell_col=cell_col,link.arr.lwd=res_cat$cell_from_logFC[1:20],link.arr.width=res_cat$cell_to_logFC[1:20])
    }else{
        LRPlot(res_cat,datatype='DEG',cell_col=cell_col,link.arr.lwd=res_cat$cell_from_logFC,link.arr.width=res_cat$cell_to_logFC)
    }
    NetView(res_cat,col=cell_col,vertex.label.cex=1,arrow.width=1,edge.max.width=5)
    title(comm_type)
    res<-rbind(res,res_cat)
}
if(is.null(res)){
    print('No significant pairs found')
}else if(nrow(res)>=20){
    res<-res[order(res$cell_from_logFC*res$cell_to_logFC,decreasing=T),][1:20,]
    NetView(res,col=cell_col,vertex.label.cex=1,arrow.width=1,edge.max.width=5)
    LRPlot(res[1:20,],datatype='DEG',cell_col=cell_col,link.arr.lwd=res$cell_from_logFC[1:20],link.arr.width=res$cell_to_logFC[1:20])
}else{
    NetView(res,col=cell_col,vertex.label.cex=1,arrow.width=1,edge.max.width=5)
    LRPlot(res,datatype='DEG',cell_col=cell_col,link.arr.lwd=res$cell_from_logFC,link.arr.width=res$cell_to_logFC)
}
# I just randomly assigned the compare group to samples which has no biological difference for showing how to use the package.
# So there should be no significant genes to be expected.

图4 结果可视化，来自文章原图

参考文献
[1] Efremova, M., Vento-Tormo, M., Teichmann, S. A., & Vento-Tormo, R. (2020). CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nature Protocols. doi:10.1038/s41596-020-0292-x
[2] Ramilowski, J. A., Goldberg, T., Harshbarger, J., Kloppmann, E., Lizio, M., Satagopam, V. P., … Forrest, A. R. R. (2015). A draft network of ligand–receptor-mediated multicellular signalling in human. Nature Communications, 6(1).doi:10.1038/ncomms8866
[3]iTALK: an R Package to Characterize and Illustrate Intercellular Communication

单细胞测序数据高级分析--细胞通讯分析
多细胞生物的细胞与细胞之间往往会通过细胞因子和膜蛋白等进行通讯，从而调节生命活动，保证生命体高效、有序的运作。其中...
单细胞数据细胞通讯分析软件NATMI
单细胞数据分析中，细胞通讯是最重要的一环当然，已经有了很多的软件用于单细胞数据的通讯分析，比如cellphoneD...
单细胞转录组差异表达分析——SCDE
哎呀，终究还是绕不过差异表达分析。单细胞测序，测得还是转录组数据，和bluk data的差别就是单细胞测序存在许多...
【文献简报】Cumulus 为大规模单细胞RNA测序提供云数据分
单细胞数量越来越多，上云分析成了必不可少的道路，单细胞大数据分析是未来的趋势，大规模平行的单细胞和单核RNA测序为...
MacOS下安装velocyto.R
拟时序分析是单细胞测序高级分析之一，其中引用量最多的一是Monocle，其次就是velocyto。velocyto...
实用干货|Seurat-空间转录组数据基本分析教学
上一篇文章对用Seurat分析单细胞数据进行了比较详细的介绍（实用干货|Seurat-单细胞转录组测序分析R包教学...
左手靶向基因，右手组学结合 — 多维发展的10x Genomic
单细胞转录组测序允许在单细胞水平直接分析基因表达，并在单细胞水平分析细胞内群体异质性、定义细胞类型、细胞状态和细胞...
单细胞多组学新品预告 | 单细胞ATAC+单细胞转录联合检测
随着单细胞测序技术的发展，单细胞转录组测序以及单细胞表观基因组测序，极大的提高了研究者们从异质性样本中分析细胞类型...
SCS【17】从单细胞转录组推断肿瘤的CNV和亚克隆 (copy
该文章使用桓峰基因公众号里面单细胞分析教程即可实现，有需要类似思路的老师可以联系我们！单细胞测序数据分析教程整理如...
单细胞测序分析R包Seurat质量控制小提琴图QC VlnPlo
单细胞测序分析预处理结束，拿到原始count或者TPM文件后，我们到了下游分析处理环节。 Seurat包是单细胞入...