Overview
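Because of the sparsity observed in single-cell data (e.g. RNA-seq, ATAC-seq), visualizations of cell features (e.g. genes, peaks) are often degraded and unclear, especially when they are overlaid on clustering results to annotate cell types. Nebulosa is an R package for visualizing single-cell data based on kernel density estimation. It aims to recover the signal from dropped-out features by incorporating the similarity between cells, which amounts to a "convolution" of the cell features.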
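To make the idea of an expression-weighted kernel density concrete, here is a minimal base-R sketch. It is a toy, hand-rolled Gaussian kernel estimate and not Nebulosa's actual implementation; the simulated embedding, the expression vector, the bandwidth `h`, and the helper name `weighted_density` are all assumptions made for illustration.

```r
# Toy illustration only: a hand-rolled weighted Gaussian KDE, not Nebulosa's code.
set.seed(1)

# Hypothetical 2-D embedding: two well-separated groups of 50 cells each
emb <- rbind(
  cbind(rnorm(50, mean = -3), rnorm(50, mean = 0)),
  cbind(rnorm(50, mean = 3), rnorm(50, mean = 0))
)

# Hypothetical sparse expression: only group 1 expresses the gene,
# and roughly 70% of those cells are dropouts (zero counts)
expr <- c(rpois(50, lambda = 3) * rbinom(50, 1, 0.3), rep(0, 50))

# Weighted Gaussian kernel density evaluated at each cell's own position
weighted_density <- function(emb, w, h = 1) {
  d2 <- as.matrix(dist(emb))^2                 # squared pairwise distances
  k <- exp(-d2 / (2 * h^2)) / (2 * pi * h^2)   # Gaussian kernel, bandwidth h
  as.vector(k %*% w) / sum(w)                  # expression values act as weights
}

dens <- weighted_density(emb, expr)

# Dropout cells in group 1 borrow signal from expressing neighbours,
# while group 2 cells stay close to zero
zero_in_group1 <- which(expr[1:50] == 0)
mean(dens[zero_in_group1])
mean(dens[51:100])
```

Cells with zero counts that sit next to expressing cells receive a clearly positive density, which is the "signal recovery" effect described above.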
Import the necessary packages
library("Nebulosa")
library("Seurat")
library("BiocFileCache")
Data pre-processing
bfc <- BiocFileCache(ask = FALSE)
data_file <- bfcrpath(bfc, file.path(
"https://s3-us-west-2.amazonaws.com/10x.files/samples/cell",
"pbmc3k",
"pbmc3k_filtered_gene_bc_matrices.tar.gz"
))
untar(data_file, exdir = tempdir())
# read the gene expression matrix
data <- Read10X(data.dir = file.path(tempdir(),
"filtered_gene_bc_matrices",
"hg19"
))
# create a Seurat object
pbmc <- CreateSeuratObject(
counts = data,
project = "pbmc3k",
min.cells = 3,
min.features = 200
)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA < 2500 & percent.mt < 5)
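The effect of these quality-control thresholds can be checked on a small table without loading Seurat. The sketch below applies the same cut-offs as the `CreateSeuratObject()` and `subset()` calls above to a hypothetical metadata data frame; the six cells and their values are invented for illustration.

```r
# Hypothetical per-cell QC metadata (values invented for illustration)
meta <- data.frame(
  cell = paste0("cell", 1:6),
  nFeature_RNA = c(150, 800, 1200, 2600, 1900, 950),
  percent.mt = c(1.2, 3.4, 6.1, 2.0, 4.9, 0.8)
)

# Mirror the thresholds used above:
#   min.features = 200  -> at least 200 detected features
#   nFeature_RNA < 2500 -> drop cells with unusually many features
#   percent.mt < 5      -> drop cells with high mitochondrial content
keep <- meta$nFeature_RNA >= 200 &
  meta$nFeature_RNA < 2500 &
  meta$percent.mt < 5

kept <- meta[keep, ]
kept$cell
```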
Data normalization, Dimensionality reduction, and Clustering
pbmc <- SCTransform(pbmc, verbose = FALSE)
pbmc <- RunPCA(pbmc)
pbmc <- RunUMAP(pbmc, dims = 1:30)
pbmc <- FindNeighbors(pbmc, dims = 1:30)
pbmc <- FindClusters(pbmc)
Visualize data with Nebulosa (the key function is plot_density)
plot_density(pbmc, "CD4") # Figure 1
FeaturePlot(pbmc, "CD4") # Figure 2
FeaturePlot(pbmc, "CD4", order = TRUE) # Figure 3
DimPlot(pbmc, label = TRUE, repel = TRUE) # Figure 4
plot_density(pbmc, "CD3D") # Figure 5
![](https://img.haomeiwen.com/i24212842/e1feef17d3a76d2a.png)
![](https://img.haomeiwen.com/i24212842/41486f6d978d621a.png)
![](https://img.haomeiwen.com/i24212842/beff2e182e3afeed.png)
![](https://img.haomeiwen.com/i24212842/b1b84bccb2d8b8f2.png)
![](https://img.haomeiwen.com/i24212842/53c1d2a679fe74fd.png)
Based on the results above, plotting CD3D as well makes it easy to identify clusters 0 and 2 as CD4+ T cells.
Multi-feature visualization (the main parameter is joint)
p3 <- plot_density(pbmc, c("CD8A", "CCR7"))
p3 + plot_layout(ncol = 1)
p4 <- plot_density(pbmc, c("CD8A", "CCR7"), joint = TRUE)
p4 + plot_layout(ncol = 1)
p_list <- plot_density(pbmc, c("CD8A", "CCR7"), joint = TRUE, combine = FALSE)
p_list[[length(p_list)]]
![](https://img.haomeiwen.com/i24212842/07e3f8c7a5a56f6e.png)
![](https://img.haomeiwen.com/i24212842/e719fc4cdfc329af.png)
![](https://img.haomeiwen.com/i24212842/35ace3254d9f858c.png)
Based on the results above and a comparison with the clustering results, we can easily identify that naive CD8+ T cells correspond to cluster 8.
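The reason a joint plot isolates a single population can be sketched in base R. The example assumes, as the Nebulosa documentation describes, that the joint density is obtained by multiplying the per-feature densities. Everything else here is hypothetical: the three simulated groups, the two toy genes, and the `weighted_density` helper, which is a hand-rolled Gaussian kernel estimate rather than the package's own code.

```r
# Toy illustration of a joint density as the product of two per-gene densities.
set.seed(2)

# Hypothetical embedding with three well-separated groups of 40 cells
emb <- rbind(
  cbind(rnorm(40, mean = -4), rnorm(40, mean = 0)),  # group A
  cbind(rnorm(40, mean = 0), rnorm(40, mean = 4)),   # group B
  cbind(rnorm(40, mean = 4), rnorm(40, mean = 0))    # group C
)
group <- rep(c("A", "B", "C"), each = 40)

# gene1 is expressed in groups A and B, gene2 in groups B and C
gene1 <- c(rpois(40, 2), rpois(40, 2), rep(0, 40))
gene2 <- c(rep(0, 40), rpois(40, 2), rpois(40, 2))

# Expression-weighted Gaussian kernel density at each cell's position
weighted_density <- function(emb, w, h = 1) {
  d2 <- as.matrix(dist(emb))^2
  k <- exp(-d2 / (2 * h^2)) / (2 * pi * h^2)
  as.vector(k %*% w) / sum(w)
}

d1 <- weighted_density(emb, gene1)
d2 <- weighted_density(emb, gene2)

# Assumed joint density: element-wise product of the individual densities
joint <- d1 * d2

# Mean joint density per group; only the group expressing both genes stands out
joint_by_group <- tapply(joint, group, mean)
joint_by_group
```

Because the product is close to zero wherever either gene is absent, only the group that co-expresses both toy genes keeps a high joint density. In the PBMC example, the same logic lets CD8A and CCR7 together highlight a single cluster.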
Conclusions
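In summary, Nebulosa density plots can be used to recover the signal from dropped-out genes and improve their visualization in two-dimensional space. We recommend Nebulosa particularly for dropped-out genes. For well-expressed genes, visualizing the gene expression directly may be preferable. We encourage users to combine Nebulosa with the core visualization methods from the Seurat and Bioconductor environments, as well as other visualization methods, to draw more informed conclusions about their data.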