10x Genomics single-cell transcriptome

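Clustering is the core of 10x Genomics single-cell transcriptome analysis, but clustering single-cell transcriptome data still faces many difficulties and challenges; see the paper "Challenges in unsupervised clustering of single-cell RNA-seq data" for details. The clustering and sub-cluster reanalysis described here use Seurat, the package most commonly used for 10x Genomics single-cell transcriptome data.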

Hi,
Determining the "right" set of clusters for a single-cell dataset is a challenging problem and often requires interpretation from a biological viewpoint. As mentioned in #819, this article provides a good review on single cell clustering.
satijalab Further subdivisions within clusters #1192

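Strategies for subdividing clusters in 10x Genomics single-cell transcriptome data

Two strategies are currently in common use:

Adjust the clustering resolution

Extract the cells of a cluster and re-cluster them from scratch

Adjusting the clustering resolution

Subdividing clusters by adjusting the clustering resolution is described in the official Guided Clustering Tutorial: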

Further subdivisions within cell types
If you perturb some of our parameter choices above (for example, setting resolution=0.8 or changing the number of PCs), you might see the CD4 T cells subdivide into two groups. You can explore this subdivision to find markers separating the two T cell subsets. However, before reclustering (which will overwrite object@ident), we can stash our renamed identities to be easily recovered later.

# First lets stash our identities for later
pbmc <- StashIdent(object = pbmc, save.name = "ClusterNames_0.6")

# Note that if you set save.snn=T above, you don't need to recalculate the
# SNN, and can simply put: pbmc <- FindClusters(pbmc,resolution = 0.8)
pbmc <- FindClusters(object = pbmc, reduction.type = "pca", dims.use = 1:10, 
    resolution = 0.8, print.output = FALSE)
## Warning in BuildSNN(object = object, genes.use = genes.use, reduction.type
## = reduction.type, : Build parameters exactly match those of already
## computed and stored SNN. To force recalculation, set force.recalc to TRUE.
# Demonstration of how to plot two tSNE plots side by side, and how to color
# points based on different criteria
plot1 <- TSNEPlot(object = pbmc, do.return = TRUE, no.legend = TRUE, do.label = TRUE)
plot2 <- TSNEPlot(object = pbmc, do.return = TRUE, group.by = "ClusterNames_0.6", 
    no.legend = TRUE, do.label = TRUE)
plot_grid(plot1, plot2)

Figure: Subdividing clusters by adjusting the resolution
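After re-clustering at a new resolution, it can help to see which of the stashed groups actually split. The following is a minimal sketch in base R, not part of the official tutorial: `compare_clusterings` is a hypothetical helper name, and the toy labels are made up for illustration.

```r
# Cross-tabulate two cluster assignments for the same cells.
# `old` and `new` are label vectors given in the same cell order.
compare_clusterings <- function(old, new) {
  stopifnot(length(old) == length(new))
  table(old = old, new = new)
}

# Toy labels (invented for illustration): the "CD4 T" group splits in two.
old_ident <- c("CD4 T", "CD4 T", "CD4 T", "CD4 T", "B", "B")
new_ident <- c("0", "0", "5", "5", "1", "1")
tab <- compare_clusterings(old_ident, new_ident)
print(tab)

# With a Seurat v2 object such as the one in the tutorial code, the call
# might look like this (assumes the stashed column "ClusterNames_0.6"):
# compare_clusterings(pbmc@meta.data$ClusterNames_0.6, pbmc@ident)
```

A row whose counts are spread over several columns indicates an old group that the new resolution has subdivided.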

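Extracting the cells of a cluster and re-clustering them from scratch

This approach uses the expression matrix and the clustering file to pull out the expression matrix of all cells in one cluster, and then reruns the whole analysis on that subset. Extracting the expression matrix requires two files:

A CSV file of cells and their cluster IDs

The expression matrix file for all cells

CSV file of cells and their cluster IDs:

It has two columns: the first is the cell barcode and the second is the cluster ID.

Figure: CSV file of cells and their cluster IDs

Expression matrix file

The first row is the header and the first column holds the gene names; every other column corresponds to one cell barcode. Each row gives the expression of one gene across all cells, and each number is the expression level of that gene in that cell.

Figure: Expression matrix file

The extraction script itself will be covered in a separate article and is not described here.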
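As a rough illustration only (this is not the script referred to above), the extraction step could be sketched in base R as follows. The helper name `extract_cluster_matrix`, the toy barcodes and values, and the file names in the comments are all hypothetical; the sketch also assumes the expression matrix is stored as a CSV, which the text does not state.

```r
# Extract the expression sub-matrix for the cells of one cluster.
# expr:     data.frame; first column = gene names, other columns = cell barcodes
# clusters: data.frame; first column = cell barcode, second column = cluster ID
extract_cluster_matrix <- function(expr, clusters, cluster_id) {
  barcodes <- as.character(clusters[[1]][clusters[[2]] == cluster_id])
  missing <- setdiff(barcodes, colnames(expr))
  if (length(missing) > 0) {
    stop("Barcodes not found in expression matrix: ",
         paste(head(missing), collapse = ", "))
  }
  # Keep the gene-name column plus the selected cells.
  expr[, c(colnames(expr)[1], barcodes), drop = FALSE]
}

# Toy inputs (invented for illustration).
clusters <- data.frame(
  Barcode = c("AAAC-1", "AAAG-1", "AACT-1", "AAGG-1"),
  Cluster = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)
expr <- data.frame(
  Gene = c("CD3E", "MS4A1"),
  `AAAC-1` = c(5, 0),
  `AAAG-1` = c(0, 7),
  `AACT-1` = c(4, 0),
  `AAGG-1` = c(1, 1),
  check.names = FALSE,   # keep "-" in barcode column names
  stringsAsFactors = FALSE
)

sub <- extract_cluster_matrix(expr, clusters, cluster_id = 1)
print(sub)

# With real files (hypothetical names), reading and writing might look like:
# clusters <- read.csv("clusters.csv", stringsAsFactors = FALSE)
# expr <- read.csv("expression_matrix.csv", check.names = FALSE,
#                  stringsAsFactors = FALSE)
# write.csv(sub, "cluster1_matrix.csv", row.names = FALSE)
```

Passing `check.names = FALSE` matters here because barcodes usually contain a `-`, which R would otherwise rewrite when building column names.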

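After extracting the expression file, rerun the analysis through the pipeline to obtain the new clustering results.

Figure: Result of re-clustering from scratch

Special note:

The above mainly applies to cluster subdivision within a single sample. If a comparative differential analysis is involved, you still need to extract the expression matrix or the S4 object and then redo the clustering and the differential analysis. Here is the reply from the official GitHub: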

You can certainly subset your data, and recalculate Variable Genes, scale, run PCA, and cluster.
Note that you can set ident.use = c(0, 1) to subset two clusters.
satijalab Re-clustering of given clusters #752
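The steps named in that reply can be sketched as a small wrapper. This is an assumption-laden sketch rather than the Seurat authors' recommended code: it assumes the Seurat v2 API used in the tutorial excerpt above, `recluster_subset` is a hypothetical helper name, and the `dims` and `resolution` defaults are illustrative values that would need tuning for a real subset.

```r
# Sketch only: assumes Seurat v2 (SubsetData, FindVariableGenes, ScaleData,
# RunPCA, FindClusters). Defining the function does not require Seurat;
# calling it does.
recluster_subset <- function(object, clusters, dims = 1:10, resolution = 0.6) {
  # Keep only the cells whose current identity is in `clusters`.
  sub <- Seurat::SubsetData(object = object, ident.use = clusters)
  # Recompute variable genes, scaling and PCA on the subset alone.
  sub <- Seurat::FindVariableGenes(object = sub, do.plot = FALSE)
  sub <- Seurat::ScaleData(object = sub)
  sub <- Seurat::RunPCA(object = sub, do.print = FALSE)
  # Re-cluster; force.recalc avoids reusing an SNN graph from the parent object.
  sub <- Seurat::FindClusters(object = sub, reduction.type = "pca",
                              dims.use = dims, resolution = resolution,
                              print.output = FALSE, force.recalc = TRUE)
  sub
}

# Example call (not run here), subsetting clusters 0 and 1 as in the reply:
# tcells <- recluster_subset(pbmc, clusters = c(0, 1))
```

In Seurat v3 and later several of these functions were renamed (for example, `FindVariableGenes` became `FindVariableFeatures`), so the sketch would need adapting for newer versions.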