scater
Package homepage: Single-Cell Analysis Toolkit for Gene Expression Data in R.
The current version is 1.18.6 (Revised: February 4, 2020).
4 Dimensionality reduction
4.1 Principal components analysis(PCA)
example_sce <- runPCA(example_sce)
str(reducedDim(example_sce, "PCA"))
## num [1:3005, 1:50] -15.4 -15 -17.2 -16.9 -18.4 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:3005] "1772071015_C02" "1772071017_G12" "1772071017_A05" "1772071014_B06" ...
## ..$ : chr [1:50] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "varExplained")= num [1:50] 478 112.8 51.1 47 33.2 ...
## - attr(*, "percentVar")= num [1:50] 39.72 9.38 4.25 3.9 2.76 ...
## - attr(*, "rotation")= num [1:500, 1:50] 0.1471 0.1146 0.1084 0.0958 0.0953 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:500] "Plp1" "Trf" "Mal" "Apod" ...
## .. ..$ : chr [1:50] "PC1" "PC2" "PC3" "PC4" ...
By default, runPCA() uses the top 500 genes with the highest variances to compute the first PCs. This can be tuned by specifying subset_row to pass in an explicit set of genes of interest, and by using ncomponents to determine the number of components to compute. The name argument can also be used to change the name of the result in the reducedDims slot.
example_sce <- runPCA(example_sce, name="PCA2",
    subset_row=rownames(example_sce)[1:1000],
    ncomponents=25)
str(reducedDim(example_sce, "PCA2"))
## num [1:3005, 1:25] -20 -21 -23 -23.7 -21.5 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:3005] "1772071015_C02" "1772071017_G12" "1772071017_A05" "1772071014_B06" ...
## ..$ : chr [1:25] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "varExplained")= num [1:25] 153 35 23.5 11.6 10.8 ...
## - attr(*, "percentVar")= num [1:25] 22.3 5.11 3.42 1.69 1.58 ...
## - attr(*, "rotation")= num [1:1000, 1:25] -2.24e-04 -8.52e-05 -2.43e-02 -5.92e-04 -6.35e-03 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:1000] "Tspan12" "Tshz1" "Fnbp1l" "Adamts15" ...
## .. ..$ : chr [1:25] "PC1" "PC2" "PC3" "PC4" ...
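The default gene selection described above can be sketched in base R. This is only an illustration under stated assumptions, not scater's own implementation: the matrix logexpr is simulated, the variable names are made up for this example, and prcomp() stands in for whatever PCA backend runPCA() uses. The sketch ranks genes by variance, keeps the top 500, runs an unscaled PCA on the cells-by-genes matrix, and expresses the variance of each PC as a percentage of the total variance of the selected genes.

```r
set.seed(42)

# Simulated log-expression matrix (genes in rows, cells in columns).
# All names and sizes here are hypothetical, chosen only for illustration.
n_genes <- 2000
n_cells <- 100
logexpr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("cell", seq_len(n_cells))))
# Inflate the variance of the first 50 genes so they are clearly "highly variable".
logexpr[1:50, ] <- logexpr[1:50, ] * 5

# Rank genes by variance and keep the 500 most variable ones.
gene_var <- apply(logexpr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:500]

# PCA on cells (rows) by selected genes (columns), centred but not scaled.
n_pcs <- 50
pca <- prcomp(t(logexpr[top_genes, ]), center = TRUE, scale. = FALSE, rank. = n_pcs)
pcs <- pca$x

# Variance per PC, as a percentage of the total variance of the selected genes.
var_explained <- pca$sdev[seq_len(n_pcs)]^2
percent_var <- 100 * var_explained / sum(gene_var[top_genes])

dim(pcs)
head(percent_var)
```

Passing subset_row to runPCA() corresponds to replacing top_genes with an explicit gene set, and ncomponents corresponds to n_pcs in this sketch.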
4.2 Other dimensionality reduction methods
For t-SNE, we strongly recommend generating plots with several different random seeds and perplexity values, to check that any conclusions are robust across visualizations.
# Perplexity of 10 just chosen here arbitrarily.
set.seed(1000)
example_sce <- runTSNE(example_sce, perplexity=10)
head(reducedDim(example_sce, "TSNE"))
## [,1] [,2]
## 1772071015_C02 -51.11740 -11.311505
## 1772071017_G12 -54.09988 -10.315597
## 1772071017_A05 -50.74363 -11.246481
## 1772071014_B06 -53.65486 -10.098620
## 1772067065_H06 -53.25346 -9.751226
## 1772071017_E02 -54.22199 -9.824003
A more common pattern involves using the pre-existing PCA results as input into the t-SNE algorithm.
set.seed(1000)
example_sce <- runTSNE(example_sce, perplexity=50,
    dimred="PCA", n_dimred=10)
head(reducedDim(example_sce, "TSNE"))
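The advice about trying several seeds and perplexities, combined with reusing the PCA results, might look like the sketch below. It makes several assumptions: scater, scuttle and Rtsne are installed, a small simulated dataset from mockSCE() replaces the real data, and the seed values, perplexity values and result names (TSNE_s1_p5 and so on) are arbitrary choices for this example. Each run is stored under its own name so that the embeddings can be compared side by side.

```r
library(scuttle)
library(scater)

# Small simulated dataset standing in for example_sce (an assumption of this sketch).
set.seed(1)
sce <- mockSCE()
sce <- logNormCounts(sce)
sce <- runPCA(sce)

# Arbitrary values, chosen only to illustrate the loop.
seeds <- c(1, 100)
perplexities <- c(5, 20)

for (seed in seeds) {
    for (perp in perplexities) {
        set.seed(seed)
        # Reuse the first 10 PCs as input and store each run under a distinct name.
        sce <- runTSNE(sce, perplexity = perp, dimred = "PCA", n_dimred = 10,
                       name = paste0("TSNE_s", seed, "_p", perp))
    }
}

reducedDimNames(sce)
```

Each stored result can then be drawn with plotReducedDim(sce, dimred = "TSNE_s1_p5") and compared with the others.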
The same can be done for UMAP via the runUMAP() function.
example_sce <- runUMAP(example_sce)
head(reducedDim(example_sce, "UMAP"))
## [,1] [,2]
## 1772071015_C02 -12.04324 -2.072196
## 1772071017_G12 -12.11942 -2.135925
## 1772071017_A05 -11.93046 -2.005134
## 1772071014_B06 -12.08696 -2.123133
## 1772067065_H06 -12.11039 -2.148298
## 1772071017_E02 -12.10755 -2.138869
4.3 Visualizing reduced dimensions
plotReducedDim(example_sce, dimred = "PCA", colour_by = "level1class")
plotTSNE(example_sce, colour_by = "Snap25")
plotPCA(example_sce, colour_by="Mog")
example_sce <- runPCA(example_sce, ncomponents=20)
plotPCA(example_sce, ncomponents = 4, colour_by = "level1class")
(Figure: this plot differs slightly from the one in the official tutorial.)
5 Utilities for custom visualization
ggcells(example_sce, mapping=aes(x=level1class, y=Snap25)) +
    geom_boxplot() +
    facet_wrap(~tissue)
ggcells(example_sce, mapping=aes(x=TSNE.1, y=TSNE.2, colour=Snap25)) +
    geom_point() +
    stat_density_2d() +
    facet_wrap(~tissue) +
    scale_colour_distiller(direction=1)
ggcells(example_sce, mapping=aes(x=sizeFactor, y=Actb)) +
    geom_point() +
    geom_smooth()
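The ggcells() calls above refer to column metadata (level1class, sizeFactor), gene names (Snap25, Actb) and reduced-dimension coordinates (TSNE.1) as if they were columns of one table. A rough way to see what such a table looks like is to build a per-cell data frame with makePerCellDF() from scuttle and pass it to ggplot() directly. The sketch below assumes scater, scuttle and ggplot2 are installed, uses simulated data from mockSCE() instead of the real dataset, and picks the first gene arbitrarily; it approximates what ggcells() does and is not a copy of its internals.

```r
library(scuttle)
library(scater)
library(ggplot2)

# Simulated dataset standing in for example_sce (an assumption of this sketch).
set.seed(1)
sce <- mockSCE()
sce <- logNormCounts(sce)
sce <- runPCA(sce, ncomponents = 5)

# Arbitrary gene chosen for illustration.
gene <- rownames(sce)[1]

# One row per cell: colData fields, the requested gene, and reduced dimensions
# (named PCA.1, PCA.2, ... after the reducedDim entry).
df <- makePerCellDF(sce, features = gene)
head(df[, c("Treatment", "sizeFactor", gene, "PCA.1", "PCA.2")])

# The same kind of plot that ggcells() would produce, built by hand.
p <- ggplot(df, aes(x = PCA.1, y = PCA.2, colour = .data[[gene]])) +
    geom_point()
p
```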
table(rowData(example_sce))
# endogenous mito
# 19972 34
colnames(example_sce) <- make.names(colnames(example_sce))
ggfeatures(example_sce, mapping=aes(x=featureType, y=X1772062111_E06)) +
    geom_violin()
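The make.names() call before ggfeatures() is needed because the cell names start with a digit, so they are not syntactically valid R names and cannot be written directly inside aes(). A small base-R sketch of the conversion, using two cell names taken from the output above:

```r
# Cell barcodes that begin with a digit are not valid R variable names.
cell_ids <- c("1772071015_C02", "1772062111_E06")

# make.names() prepends "X" so the names become syntactically valid.
syntactic <- make.names(cell_ids)
syntactic
# [1] "X1772071015_C02" "X1772062111_E06"
```

This is why the column is referred to as X1772062111_E06 in the ggfeatures() call, and why the colnames shown when printing example_sce carry an X prefix.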
example_sce
# class: SingleCellExperiment
# dim: 20006 3005
# metadata(0):
# assays(2): counts logcounts
# rownames(20006): Tspan12 Tshz1 ... mt-Rnr1 mt-Nd4l
# rowData names(1): featureType
# colnames(3005): X1772071015_C02 X1772071017_G12 ... X1772066098_A12 X1772058148_F03
# colData names(27): tissue group # ... total sizeFactor
# reducedDimNames(4): PCA PCA2 TSNE UMAP
# altExpNames(2): ERCC repeat
altExpNames(example_sce)
# [1] "ERCC" "repeat"