Literature Review(1): A short bu

Author: Bio_Infor | Published: 2021-08-03 12:57

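Another heavyweight group has recently published in Nature Protocols. The paper, titled "Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods", is mainly about how to do cell type annotation well for single-cell data, covering three steps: automatic annotation, manual annotation, and final verification. What caught my eye, though, was a box in the paper that lays out the authors' understanding of t-SNE and UMAP. The original text follows: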

An scRNA-seq data set is typically visualized as a 2D scatter plot where cells (points) with similar transcriptomes are placed near each other. This 2D representation is projected from a higher dimensional space where each cell is described by the expression of thousands of genes, each of which is considered a separate dimension. The three most popular projection methods used for scRNA-seq data are t-SNE, UMAP and PCA.

t-SNE (Fig. 6c) is a nonlinear projection that preserves local groups of similar cells, while equalizing the density of cells within each group. The scale of a ‘local group’ is controlled by the ‘perplexity’ parameter, with higher values creating larger local groups. t-SNE effectively visualizes distinct robust clusters, making it easy to observe discrete cell types; however, global relationships between cell types are not maintained, and thus cluster-to-cluster relationships cannot be inferred and may be misleading. Cell subtypes can be combined into one large cluster or split into distinct plot regions depending on the perplexity.
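A note of my own, not from the paper: the way perplexity sets the scale of a "local group" can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the t-SNE implementation of any particular library. It assumes the standard formulation in which each cell gets a Gaussian kernel whose width sigma is tuned by binary search until the perplexity (2 raised to the entropy of the neighbor distribution) matches the requested value. The function names and the toy distances are hypothetical.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def neighbor_probs(sq_dists, perplexity, tol=1e-6, max_iter=200):
    """Neighbor probabilities for ONE cell, given squared distances to the others.

    Binary-searches the Gaussian precision beta = 1 / (2 * sigma^2) until the
    perplexity 2^H of the distribution matches the target. Returns (probs, sigma).
    """
    target = math.log2(perplexity)
    lo, hi = 0.0, math.inf
    beta = 1.0
    d_min = min(sq_dists)  # subtracted for numerical stability; cancels on normalization
    probs = []
    for _ in range(max_iter):
        weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
        total = sum(weights)
        probs = [w / total for w in weights]
        h = entropy_bits(probs)
        if abs(h - target) < tol:
            break
        if h > target:
            # Distribution too flat: narrow the kernel by raising beta.
            lo = beta
            beta = beta * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
        else:
            # Distribution too peaked: widen the kernel by lowering beta.
            hi = beta
            beta = (lo + hi) / 2.0
    sigma = math.sqrt(1.0 / (2.0 * beta))
    return probs, sigma

# Toy setup (an assumption): one cell with 50 other cells at distances 1, 2, ..., 50.
sq_dists = [float(k * k) for k in range(1, 51)]

probs_low, sigma_low = neighbor_probs(sq_dists, perplexity=5)
probs_high, sigma_high = neighbor_probs(sq_dists, perplexity=30)

for name, probs, sigma in [("perplexity=5", probs_low, sigma_low),
                           ("perplexity=30", probs_high, sigma_high)]:
    n_effective = sum(1 for p in probs if p > 0.01)
    print(f"{name}: sigma={sigma:.2f}, neighbors with p > 0.01: {n_effective}")
```

A higher perplexity forces a wider kernel, so more cells count as neighbors of the same cell. That is consistent with the statement above that subtypes can merge or split depending on this parameter.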

UMAP (Extended Data Fig. 1) is a nonlinear projection method that differentiates discrete cell clusters [20]. UMAP is typically regarded as better for visualizing global relationships and gradients than t-SNE, although these differences are probably due to default parameters. UMAP is often less computationally intensive to run than t-SNE.

PCA (Fig. 6b) performs a linear transformation of normalized and scaled scRNA-seq data, to identify independent principal components (PCs) that capture major axes of variation in the data, which could represent biological factors, like cell types and states, or technical factors. PCs are ranked in decreasing order of variance, and typically the first two PCs are used to visualize the data, but more can be considered to detect more subtle expression patterns between cells. PCA can be useful for visualizing cell gradients and states.
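Another note of my own, not from the paper: the claim that PCs are ranked in decreasing order of variance can be checked on toy data. The sketch below is a pure-Python illustration that uses power iteration with deflation on the covariance matrix. It is one simple way to obtain the leading components, assumed here for readability; real analyses would rely on an established library. The toy "expression matrix" (300 cells, 3 genes with very different spreads) and all function names are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy data is reproducible

def center(rows):
    """Subtract each column (gene) mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def mat_vec(m, v):
    return [sum(m[a][b] * v[b] for b in range(len(v))) for a in range(len(m))]

def top_component(cov, iters=200):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    v = [rng.gauss(0.0, 1.0) for _ in cov]
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, variance

def pca(rows, n_components=2):
    """Return (components, variances, centered rows)."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        v, var = top_component(cov)
        components.append(v)
        variances.append(var)
        # Deflation: remove the variance already explained by this component.
        cov = [[cov[a][b] - var * v[a] * v[b] for b in range(d)] for a in range(d)]
    return components, variances, centered

# Toy data (an assumption): 300 cells, 3 independent genes with std 5, 2 and 0.5.
cells = [[rng.gauss(0, 5.0), rng.gauss(0, 2.0), rng.gauss(0, 0.5)]
         for _ in range(300)]

components, variances, centered = pca(cells, n_components=2)

# 2D coordinates for plotting: project each centered cell onto PC1 and PC2.
coords = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
          for row in centered]

print("variance along PC1, PC2:", [round(v, 2) for v in variances])
print("first cell in PC space:", [round(x, 2) for x in coords[0]])
```

Because the projection is linear, distances along PC1 and PC2 keep a direct meaning in terms of the original gene axes. That is not the case for the t-SNE and UMAP coordinates discussed in the same box.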

Although these methods visually group similar cells and help visualize clusters, they do not define clusters and, therefore, are not clustering algorithms. Cell-clustering algorithm output is typically visualized as colors on the plot, and these colors may or may not correspond to patterns observed in the 2D plot.

The figures referenced in the box (Fig. 6b, Fig. 6c and Extended Data Fig. 1) are as follows:

[Figures: Fig. 6b (PCA), Fig. 6c (t-SNE), Extended Data Fig. 1 (UMAP)]







