scater
包官网地址:Single-Cell Analysis Toolkit for Gene Expression Data in R.
目前版本为1.18.6 (Revised: February 4, 2020).
1 Introduction
scater
is based on the SingleCellExperiment
class (from the SingleCellExperiment package), and thus is interoperable with many other Bioconductor packages such as scran, scuttle and iSEE.
library(scRNAseq)
example_sce <- ZeiselBrainData() # 加载scRNAseq包里的经典数据集
example_sce
## class: SingleCellExperiment
## dim: 20006 3005
## metadata(0):
## assays(1): counts
## rownames(20006): Tspan12 Tshz1 ... mt-Rnr1 mt-Nd4l
## rowData names(1): featureType
## colnames(3005): 1772071015_C02 1772071017_G12 ... 1772066098_A12
## 1772058148_F03
## colData names(10): tissue group # ... level1class level2class
## reducedDimNames(0):
## altExpNames(2): ERCC repeat
Explore the data
table(example_sce@colData$level1class)
# astrocytes_ependymal endothelial-mural interneurons microglia
# 224 235 290 98
# oligodendrocytes pyramidal CA1 pyramidal SS
# 820 939 399
table(example_sce@colData$level2class)
# (none) Astro1 Astro2 CA1Pyr1 CA1Pyr2 CA1PyrInt CA2Pyr2 Choroid
# 189 68 61 380 447 49 41 10
# ClauPyr Epend Int1 Int10 Int11 Int12 Int13 Int14
# 5 20 12 21 10 21 15 22
# Int15 Int16 Int2 Int3 Int4 Int5 Int6 Int7
# 18 20 24 10 15 20 22 23
# Int8 Int9 Mgl1 Mgl2 Oligo1 Oligo2 Oligo3 Oligo4
# 26 11 17 16 45 98 87 106
# Oligo5 Oligo6 Peric Pvm1 Pvm2 S1PyrDL S1PyrL23 S1PyrL4
# 125 359 21 32 33 81 74 26
# S1PyrL5 S1PyrL5a S1PyrL6 S1PyrL6b SubPyr Vend1 Vend2 Vsmc
# 16 28 39 21 22 32 105 62
2 Diagnostic plots for quality control
library(scater)
example_sce <- addPerCellQC(example_sce,
subsets=list(Mito=grep("mt-", rownames(example_sce))))
print(colnames(example_sce@colData))
# [1] "tissue" "group #" "total mRNA mol"
# [4] "well" "sex" "age"
# [7] "diameter" "cell_id" "level1class"
# [10] "level2class" "sum" "detected"
# [13] "percent_top_50" "percent_top_100" "percent_top_200"
# [16] "percent_top_500" "subsets_Mito_sum" "subsets_Mito_detected"
# [19] "subsets_Mito_percent" "altexps_ERCC_sum" "altexps_ERCC_detected"
# [22] "altexps_ERCC_percent" "altexps_repeat_sum" "altexps_repeat_detected"
# [25] "altexps_repeat_percent" "total" "sizeFactor"
Metadata variables can be plotted against each other using the plotColData() function:
plotColData(example_sce, x = "sum", y="detected", colour_by="tissue")
image.png
plotColData(example_sce, x = "sum", y="subsets_Mito_percent",
other_fields="tissue") + facet_wrap(~tissue)
image.png
# plotHighestExprs(example_sce, exprs_values = "counts") #这一步计算量较大,应保存为pdf,不然会报错
p1 <- plotHighestExprs(example_sce, exprs_values = "counts")
ggsave("plotHighestExprs.pdf", plot = p1, width = 15, height = 18)
image.png
Each row in the plot below corresponds to a gene; each bar corresponds to the expression of a gene in a single cell; and the circle indicates the median expression of each gene, with which genes are sorted.
Variable-level metrics are computed by the getVarianceExplained()
function (after normalization, see below). This calculates the percentage of variance of each gene’s expression that is explained by each variable in the colData
of the SingleCellExperiment
object. We can then use this to determine which experimental factors are contributing most to the variance in expression. This is useful for diagnosing batch effects or to quickly verify that a treatment has an effect.
# Computing variance explained on the log-counts,
# so that the statistics reflect changes in relative expression.
example_sce <- logNormCounts(example_sce)
vars <- getVarianceExplained(example_sce,
variables=c("tissue", "total mRNA mol", "sex", "age"))
head(vars)
## tissue total mRNA mol sex age
## Tspan12 0.02207262 0.074086504 0.146344996 0.09472155
## Tshz1 3.36083014 0.003846487 0.001079356 0.31262288
## Fnbp1l 0.43597185 0.421086301 0.003071630 0.64964174
## Adamts15 0.54233888 0.005348505 0.030821621 0.01393787
## Cldn12 0.03506751 0.309128294 0.008341408 0.02363737
## Rxfp1 0.18559637 0.016290703 0.055646799 0.02128006
plotExplanatoryVariables(vars)
image.png
3 Visualizing expression values
plotExpression
function can be particularly useful for further examination of features identified from differential expression testing, pseudotime analysis or other analyses. By default, it uses expression values in the "logcounts" assay, but this can be changed through the exprs_values argument.
plotExpression(example_sce, rownames(example_sce)[1:6], x = "level1class")
image.png
continuous covariates will generate a scatter plot in each panel:
plotExpression(example_sce, rownames(example_sce)[1:6],
x = rownames(example_sce)[10])
image.png
The points can also be coloured, shaped or resized by the column metadata or expression values.
plotExpression(example_sce, rownames(example_sce)[1:6],
x = "level1class", colour_by="tissue")
image.png
Directly plotting the gene expression without any x or other visual parameters:
plotExpression(example_sce, rownames(example_sce)[1:6])
image.png
网友评论