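10x Genomics single-cell transcriptome
Clustering is the core of 10x Genomics single-cell transcriptome analysis, but clustering single-cell data still faces many difficulties and challenges; see the paper "Challenges in unsupervised clustering of single-cell RNA-seq data". The clustering and subcluster re-analysis described here use Seurat, the most commonly used package for 10x Genomics single-cell transcriptome data.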
Hi,
Determining the "right" set of clusters for a single-cell dataset is a challenging problem and often requires interpretation from a biological viewpoint. As mentioned in #819, this article provides a good review on single cell clustering.
Source: satijalab, "Further subdivisions within clusters" (#1192)
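Strategies for subdividing clusters in 10x Genomics single-cell transcriptome data
Two strategies are currently common:
- Adjust the clustering resolution
- Extract the cells of a cluster and re-cluster them from scratch
Adjusting the clustering resolution
Subdividing clusters by adjusting the clustering resolution is described in the official Guided Clustering Tutorial: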
Further subdivisions within cell types
If you perturb some of our parameter choices above (for example, setting resolution=0.8 or changing the number of PCs), you might see the CD4 T cells subdivide into two groups. You can explore this subdivision to find markers separating the two T cell subsets. However, before reclustering (which will overwrite object@ident), we can stash our renamed identities to be easily recovered later.
# First lets stash our identities for later
pbmc <- StashIdent(object = pbmc, save.name = "ClusterNames_0.6")
# Note that if you set save.snn=T above, you don't need to recalculate the
# SNN, and can simply put: pbmc <- FindClusters(pbmc,resolution = 0.8)
pbmc <- FindClusters(object = pbmc, reduction.type = "pca", dims.use = 1:10,
    resolution = 0.8, print.output = FALSE)
## Warning in BuildSNN(object = object, genes.use = genes.use, reduction.type
## = reduction.type, : Build parameters exactly match those of already
## computed and stored SNN. To force recalculation, set force.recalc to TRUE.
# Demonstration of how to plot two tSNE plots side by side, and how to color
# points based on different criteria
plot1 <- TSNEPlot(object = pbmc, do.return = TRUE, no.legend = TRUE, do.label = TRUE)
plot2 <- TSNEPlot(object = pbmc, do.return = TRUE, group.by = "ClusterNames_0.6",
    no.legend = TRUE, do.label = TRUE)
plot_grid(plot1, plot2)
Figure: clusters subdivided by adjusting the resolution
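One way to check which clusters actually split after the resolution is raised is to cross-tabulate the stashed identities against the new ones. The sketch below is only an illustration in base R: the labels are made-up toy values, not output from a real Seurat object. With the tutorial's object, the two vectors would presumably come from the stashed ClusterNames_0.6 metadata column and the current identities.

```r
# Toy cluster labels for eight cells at two resolutions (made-up values,
# standing in for the stashed and the new identities of a real object).
old_ident <- c("CD4 T", "CD4 T", "CD4 T", "CD4 T", "B", "B", "NK", "NK")
new_ident <- c("0", "0", "1", "1", "2", "2", "3", "3")

# Rows: old clusters; columns: new clusters; entries: number of cells.
split_table <- table(old = old_ident, new = new_ident)
print(split_table)

# An old cluster was subdivided if its cells land in more than one new cluster.
n_new_per_old <- rowSums(split_table > 0)
subdivided <- names(n_new_per_old)[n_new_per_old > 1]
print(subdivided)
```

In this toy case only the "CD4 T" cluster is reported as subdivided, which mirrors the situation described in the tutorial text.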
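Extracting the cells of a cluster and re-clustering from scratch
This approach uses the expression matrix and the clustering file to pull out the expression matrix of all cells in one cluster, and then reruns the whole analysis on it. Extracting the expression matrix requires two files:
- A CSV file of cells and their cluster IDs
- The expression matrix file for all cells
CSV file of cells and their cluster IDs:
It has two columns: the first is the cell barcode, the second is the cluster ID.
Figure: CSV file of cells and their cluster IDs
Expression matrix file:
The first row is the header and the first column holds the gene names; every other column corresponds to one cell barcode. Each row gives the expression of one gene across all cells, and each number is the expression level of that gene in that cell.
Figure: expression matrix file
The actual extraction script will be covered in a separate article and is not described here.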
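For orientation only, the extraction step can be sketched in base R as follows. This is not the author's script. It assumes that both files are comma-separated, that the cluster file has a header row, and that the layouts match the descriptions above. The helper name extract_cluster_matrix, the file names, and the toy values are all hypothetical.

```r
# Hypothetical helper: keep only the cells assigned to the requested clusters.
# Assumes the two layouts described above: a two-column cluster CSV with a
# header row (barcode, cluster ID) and a gene-by-cell expression CSV.
extract_cluster_matrix <- function(cluster_file, matrix_file, clusters) {
  assignments <- read.csv(cluster_file, stringsAsFactors = FALSE)
  # check.names = FALSE keeps barcodes such as "AAAC-1" unchanged.
  expr <- read.csv(matrix_file, row.names = 1, check.names = FALSE)
  keep <- assignments[[1]][assignments[[2]] %in% clusters]
  # Guard against barcodes missing from the matrix; drop = FALSE keeps a
  # data frame even when a single cell is selected.
  expr[, intersect(colnames(expr), keep), drop = FALSE]
}

# Toy input files (made-up values) written to a temporary directory.
cluster_file <- file.path(tempdir(), "clusters.csv")
matrix_file <- file.path(tempdir(), "expression.csv")
write.csv(data.frame(barcode = c("AAAC-1", "AAAG-1", "AACT-1"),
                     cluster = c(0, 1, 0)),
          cluster_file, row.names = FALSE)
toy <- data.frame("AAAC-1" = c(3, 0), "AAAG-1" = c(1, 5), "AACT-1" = c(0, 2),
                  row.names = c("GeneA", "GeneB"), check.names = FALSE)
write.csv(toy, matrix_file)

# Extract cluster 0 and save it as the input for a fresh analysis run.
sub_matrix <- extract_cluster_matrix(cluster_file, matrix_file, clusters = 0)
print(sub_matrix)
write.csv(sub_matrix, file.path(tempdir(), "cluster0_expression.csv"))
```

Passing several IDs, for example clusters = c(0, 1), extracts more than one cluster at once.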
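After extracting the expression file, rerun the analysis pipeline on it to obtain new clustering results.
Figure: result of re-clustering from scratch
Special note:
The above mainly applies to subdividing clusters within a single sample. If a comparative differential analysis is involved, you still need to extract the expression matrix or the S4 object first and then redo the clustering and the differential analysis. This is the reply from the official GitHub repository: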
You can certainly subset your data, and recalculate Variable Genes, scale, run PCA, and cluster.
Note that you can set ident.use = c(0, 1) to subset two clusters.
Source: satijalab, "Re-clustering of given clusters" (#752)