Compiled: March 18, 2021
原文链接:https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html
这篇教程主要介绍了多模态单细胞数据的WNN分析工作框架,分为以下三个步骤:
- 每一个模态数据的单独预处理和降维
- 学习细胞特异的模态权重,构建WNN图用于整合多个模态。
- 对WNN图的下游分析(如可视化,聚类)
我们将WNN分析应用于两种单细胞多模技术:CITE-seq和10x multiome。我们基于两种形态来定义细胞状态,而不是基于任何一种个体形态。
note:此教程需要基于Seurat4.0以上的版本
1.1 数据介绍
数据来自:Stuart, Butler et al, Cell 2019,来自于骨髓的30,672个单细胞以及配套的25个抗体。这个数据包含两个assays,RNA和antibody-derived tags (ADT)。
数据封装在bmcite包中,最好是晚上的时间段去install这个data,网速会比较快,然后保存。
# 加载包
library(Seurat)
library(SeuratData)
library(cowplot)
library(dplyr)
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")
saveRDS(bm,file = "bm.rds") # 保存
1.2 数据预处理和降维
首先,我们分别对两个assays进行预处理和降维。这里我们使用标准的标准化,当然你也可以使用SCT转换。
DefaultAssay(bm) <- 'RNA'
bm <- NormalizeData(bm) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA()
DefaultAssay(bm) <- 'ADT'
# we will use all ADT features for dimensional reduction
# we set a dimensional reduction name to avoid overwriting the
VariableFeatures(bm) <- rownames(bm[["ADT"]])
bm <- NormalizeData(bm, normalization.method = 'CLR', margin = 2) %>%
ScaleData() %>% RunPCA(reduction.name = 'apca')
基于RNA和蛋白权重加合,我们计算每一个细胞的最近邻居。在另一个功能中,我们计算细胞特异的模块权重和多模态邻居,在这个数据集中大约耗时两分钟。随后定义每一个模块的维度(与单细胞数据分析中的PCs数类似)
# Identify multimodal neighbors. These will be stored in the neighbors slot,
# and can be accessed using bm[['weighted.nn']]
# The WNN graph can be accessed at bm[["wknn"]],
# and the SNN graph used for clustering at bm[["wsnn"]]
# Cell-specific modality weights can be accessed at bm$RNA.weight
bm <- FindMultiModalNeighbors(
bm, reduction.list = list("pca", "apca"),
dims.list = list(1:30, 1:18), modality.weight.name = "RNA.weight"
)
1.3 下游分析:聚类和可视化
我们现在可以使用这些结果进行下游分析了,比如可视化和聚类。比如,我么可以基于RNA和蛋白的权重加合数据来进行UMAP可视化,聚类,细胞注释。
bm <- RunUMAP(bm, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
bm <- FindClusters(bm, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)
# 可视化
p1 <- DimPlot(bm, reduction = 'wnn.umap', label = TRUE, repel = TRUE, label.size = 2.5) + NoLegend()
p2 <- DimPlot(bm, reduction = 'wnn.umap', group.by = 'celltype.l2', label = TRUE, repel = TRUE, label.size = 2.5) + NoLegend()
p1 + p2

我们也可以分别单独只基于RNA或蛋白的数据进行UMAP然后比较。
bm <- RunUMAP(bm, reduction = 'pca', dims = 1:30, assay = 'RNA', reduction.name = 'rna.umap', reduction.key = 'rnaUMAP_')
bm <- RunUMAP(bm, reduction = 'apca', dims = 1:18, assay = 'ADT', reduction.name = 'adt.umap', reduction.key = 'adtUMAP_')
p3 <- DimPlot(bm, reduction = 'rna.umap', group.by = 'celltype.l2', label = TRUE,
repel = TRUE, label.size = 2.5) + NoLegend()
p4 <- DimPlot(bm, reduction = 'adt.umap', group.by = 'celltype.l2', label = TRUE,
repel = TRUE, label.size = 2.5) + NoLegend()
p3 + p4

我们可以基于多模态的UMAP进行经典基因和蛋白marker的可视化,为细胞注释提供辅助。
p5 <- FeaturePlot(bm, features = c("adt_CD45RA","adt_CD16","adt_CD161"),
reduction = 'wnn.umap', max.cutoff = 2,
cols = c("lightgrey","darkgreen"), ncol = 3)
p6 <- FeaturePlot(bm, features = c("rna_TRDC","rna_MPO","rna_AVP"),
reduction = 'wnn.umap', max.cutoff = 3, ncol = 3)
p5 / p6

最后,我们可以对模态权重(modality weights )进行可视化。RNA权重最高的群代表祖细胞(progenitor cells),蛋白权重最高的群代表T细胞。这与我们的生物学预期是一致的,因为抗体组不包含可以区分不同祖群体的标记。
VlnPlot(bm, features = "RNA.weight", group.by = 'celltype.l2', sort = TRUE, pt.size = 0.1) + NoLegend()

网友评论